Turing-RL introduces a reinforcement learning method using an LLM judge to evaluate how indistinguishable generated responses are from real user inputs. It outperforms baseline methods in both LLM and human evaluations across chat and Reddit forum domains, demonstrating that optimizing for indistinguishability improves user simulator performance.
Turing-RL: Learning User Simulators with Turing Rewards
from English