arXiv cs.LG · 9d ago · research

PACT: Small Language Model Deliberation for Reactive Reinforcement Learning

PACT combines a reactive RL policy with a 2B-parameter Small Language Model (SLM) to generate and validate action plans. If the SLM's plan is verified in simulation, it is executed directly in place of the RL policy's actions, and the policy needs no retraining. PACT outperforms baselines on three increasingly difficult FrozenLake environments.
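The verify-then-execute idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the grid layout, the deterministic transitions, and the names `verify_plan`, `choose_actions`, `propose_plan`, and `rl_policy` are all hypothetical, and falling back to the RL policy when verification fails is an assumption that the summary implies but does not state. The SLM and the trained policy are stood in for by plain callables.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical 4x4 deterministic grid in the FrozenLake style:
# S = start, F = frozen (safe), H = hole, G = goal.
GRID = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]
# Action encoding as in Gymnasium's FrozenLake: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

State = Tuple[int, int]


def step(state: State, action: int) -> State:
    """Deterministic transition; moving off the grid leaves the agent at the edge."""
    dr, dc = MOVES[action]
    r = min(max(state[0] + dr, 0), len(GRID) - 1)
    c = min(max(state[1] + dc, 0), len(GRID[0]) - 1)
    return (r, c)


def verify_plan(start: State, plan: List[int]) -> bool:
    """Roll the plan out in simulation; accept only if it reaches G without entering H."""
    state = start
    for action in plan:
        if action not in MOVES:
            return False  # malformed plan, e.g. an unparseable model output
        state = step(state, action)
        cell = GRID[state[0]][state[1]]
        if cell == "H":
            return False
        if cell == "G":
            return True
    return False  # plan ended before reaching the goal


def choose_actions(
    start: State,
    propose_plan: Callable[[State], Optional[List[int]]],  # stand-in for the SLM
    rl_policy: Callable[[State], int],  # stand-in for the trained reactive policy
    max_steps: int = 50,
) -> Tuple[str, List[int]]:
    """Return the verified plan if there is one, else the RL policy's rollout (assumed fallback)."""
    plan = propose_plan(start)
    if plan is not None and verify_plan(start, plan):
        return "plan", plan  # verified plan bypasses the policy entirely
    state, actions = start, []
    for _ in range(max_steps):
        action = rl_policy(state)
        actions.append(action)
        state = step(state, action)
        if GRID[state[0]][state[1]] in "HG":
            break  # episode ends at a hole or the goal
    return "policy", actions
```

In this sketch the policy is only ever called, never updated, which mirrors the summary's point that the plan path requires no retraining. A real FrozenLake setup may use slippery (stochastic) transitions, in which case a single simulated rollout would not be sufficient verification.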

Importance 2/3 · New harness with differentiators · arXiv cs.LG · OpenAI · Google DeepMind · Meta AI · AI agents · Reasoning models · Training methods
