Researchers propose MIThinker, a lightweight thinking model that generates therapeutic thoughts to guide Motivational Interviewing (MI) counseling agents in strategy selection and response generation. To address the lack of annotated thought data, they introduce AugR1-MI, an automated pipeline that reverse-engineers counselors' thoughts from observed responses.
- MIThinker is trained in two stages: supervised fine-tuning followed by reinforcement learning.
- The AugR1-MI pipeline reverse-engineers counselor thoughts from observed responses to overcome data scarcity.
- MindfulMI, the agent leveraging MIThinker, achieves MI competency comparable to state-of-the-art systems.
- The system requires an order of magnitude less computation than existing solutions.
The authors consider the approach important because it improves theory-of-mind assessment and strategy alignment while significantly reducing the computation needed for effective counseling agents.
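The reverse-engineering idea behind AugR1-MI can be illustrated with a minimal sketch. The summary does not describe the pipeline's actual prompts, models, or data format, so everything here is an assumption made for illustration: the `Turn` and `ThoughtExample` types, the prompt wording, and the `infer_thought` callable (a stand-in for whatever model produces the thought) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    context: str   # dialogue history up to the counselor's reply
    response: str  # the counselor reply actually observed in the data


@dataclass
class ThoughtExample:
    context: str
    thought: str   # reconstructed reasoning, not a human annotation
    response: str


def build_prompt(turn: Turn) -> str:
    # Hypothetical prompt wording; the paper's real prompt is not given here.
    return (
        "Dialogue so far:\n" + turn.context + "\n\n"
        "Counselor reply:\n" + turn.response + "\n\n"
        "Describe the reasoning that could have led to this reply."
    )


def augment(turns: List[Turn],
            infer_thought: Callable[[str], str]) -> List[ThoughtExample]:
    # Work backwards from each observed reply to a plausible thought,
    # yielding (context, thought, response) triples for later training.
    return [
        ThoughtExample(t.context, infer_thought(build_prompt(t)).strip(), t.response)
        for t in turns
    ]


def fake_model(prompt: str) -> str:
    # Placeholder for a real language model call (assumption).
    return " The client sounds ambivalent; reflect before advising. "


sample = Turn(
    context="Client: I want to quit smoking but I can't.",
    response="It sounds like part of you is ready for a change.",
)
examples = augment([sample], fake_model)
```

Passing the model in as a callable keeps the sketch independent of any particular inference library; in practice the resulting triples would feed a training stage such as the supervised fine-tuning mentioned above.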