Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

Researchers propose Psy-CoT, a psychology-grounded chain-of-thought framework that decomposes pre-response reasoning into Interaction Perception, Psychological Empathy, and Logical Construction to improve character fidelity. To address gradient misalignment in reinforcement learning, they introduce Role-Aware Policy Optimization (RAPO), which uses profile-token mutual information to weight gradients asymmetrically.
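
To make the three-stage decomposition concrete, here is a minimal sketch of how a pre-response reasoning prompt could be assembled. Only the three stage names come from the summary above. The function name `build_psy_cot_prompt`, the one-line stage instructions, and the prompt layout are hypothetical illustrations, not the authors' template.

```python
from typing import List, Tuple

# Stage names are from the summary; the instruction text beside each one
# is an assumption made for illustration only.
PSY_COT_STAGES: List[Tuple[str, str]] = [
    ("Interaction Perception", "Note what the other speaker just said and the current situation."),
    ("Psychological Empathy", "Infer how the character would feel, given the profile."),
    ("Logical Construction", "Plan a reply consistent with the profile and the two steps above."),
]


def build_psy_cot_prompt(profile: str, dialogue: str) -> str:
    """Assemble a prompt that asks for staged reasoning before the reply (hypothetical layout)."""
    lines = [
        "Character profile:",
        profile,
        "",
        "Dialogue so far:",
        dialogue,
        "",
        "Before replying, reason through these steps in order:",
    ]
    # Number the stages so the model is asked to follow them sequentially.
    for index, (name, instruction) in enumerate(PSY_COT_STAGES, start=1):
        lines.append(f"{index}. {name}: {instruction}")
    lines.append("")
    lines.append("Then give the in-character reply.")
    return "\n".join(lines)
```

Calling `build_psy_cot_prompt("A stoic ship captain.", "Sailor: The storm is getting worse!")` returns a single string with the profile first, the dialogue second, and the three numbered stages last, so the reasoning is conditioned on the profile rather than on the dialogue alone.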

Through three explicit reasoning steps, Psy-CoT pushes models to reason dynamically from the character profile rather than mimic surface patterns.
RAPO amplifies role-specific tokens under positive advantage and attenuates them under negative advantage to prevent reward hacking.
Experiments on CoSER, CharacterBench, and CharacterEval show Psy-CoT outperforms existing role-playing CoT methods.
RAPO consistently surpasses GRPO across multiple model scales in the reported evaluations.
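
The asymmetric weighting described for RAPO can be sketched as below. This is an illustration under stated assumptions, not the paper's formulation: the summary does not give the exact mutual-information estimator or weighting function, so the sketch assumes a pointwise proxy (the token's log-probability with the profile in context minus its log-probability without it), a linear weight `1 ± alpha * score`, and a hypothetical strength parameter `alpha`. All function names are invented for the example.

```python
from typing import List


def role_specificity(logp_with_profile: List[float],
                     logp_without_profile: List[float]) -> List[float]:
    """Per-token role-specificity score in [0, 1] (assumed pointwise MI proxy)."""
    # Tokens that become more likely once the profile is visible get a positive score;
    # tokens that do not are clipped to 0.
    pmi = [max(a - b, 0.0) for a, b in zip(logp_with_profile, logp_without_profile)]
    top = max(pmi, default=0.0)
    return [p / top if top > 0 else 0.0 for p in pmi]


def rapo_token_weights(scores: List[float], advantage: float,
                       alpha: float = 0.5) -> List[float]:
    """Asymmetric weights: amplify role-specific tokens when the advantage is
    positive, attenuate them when it is negative (assumed linear form)."""
    if advantage > 0:
        return [1.0 + alpha * s for s in scores]
    if advantage < 0:
        return [1.0 - alpha * s for s in scores]
    return [1.0] * len(scores)


def weighted_token_advantages(scores: List[float], advantage: float,
                              alpha: float = 0.5) -> List[float]:
    """Token-level advantages that would scale each token's policy-gradient term."""
    return [w * advantage for w in rapo_token_weights(scores, advantage, alpha)]
```

With scores `[0.0, 1.0, 0.5]` and `alpha = 0.5`, a positive advantage yields weights `[1.0, 1.5, 1.25]` and a negative advantage yields `[1.0, 0.5, 0.75]`. Under these assumptions, generic tokens are treated exactly as in a group-relative baseline such as GRPO, while role-specific tokens gain more credit in good responses and take less blame in bad ones.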

The authors consider this work important because it addresses two problems, the poor out-of-distribution generalization of supervised fine-tuning and the accumulation of reward hacking in LLM-based reward models, and thereby leads to more faithful character portrayal.