arXiv cs.CL · 7d ago · research

Turing-RL: Learning User Simulators with Turing Rewards

Turing-RL is a reinforcement learning method for training user simulators in which an LLM judge scores how indistinguishable generated responses are from real user inputs, and that score serves as the reward. It outperforms baseline methods in both LLM and human evaluations across chat and Reddit forum domains, indicating that optimizing for indistinguishability improves user simulator performance.

Importance 3/3 · New feature vs. leaders · New harness with differentiators · arXiv cs.CL · OpenAI · Anthropic · Google DeepMind · AI agents · Evaluation & benchmarks · Reasoning models
