arXiv cs.CL · 3h ago

Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

Researchers propose Psy-CoT, a psychology-grounded chain-of-thought framework that decomposes pre-response reasoning into three components (Interaction Perception, Psychological Empathy, and Logical Construction) to improve character fidelity. To address gradient misalignment during reinforcement learning, they introduce Role-Aware Policy Optimization (RAPO), which weights gradients asymmetrically based on profile-token mutual information.

arXiv cs.CL · 4h ago

The Riddle Riddle: Testing Flexible Reasoning in Large Language Models and Humans

A study introduces the "riddle riddle" paradigm to determine whether large language models (LLMs) rely on flexible reasoning or pattern matching, revealing that humans and LLMs fail in opposite directions. In experiments involving nine state-of-the-art LLMs and 100 human participants, LLMs performed significantly worse on riddle riddles than on genuine riddles, while humans showed the reverse trend.

arXiv cs.CL · 4h ago

HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

Researchers introduce HarmVideoBench, a multi-layered diagnostic benchmark designed to evaluate whether large vision-language models understand harmful videos beyond superficial cues. The benchmark addresses limitations of existing work by incorporating explanatory rationales and assessing three hierarchical dimensions of harm: Observable Evidence, Clip-Internal Meaning, and Beyond-Clip Reasoning.