PACT: Small Language Model Deliberation for Reactive Reinforcement Learning
PACT combines a reactive reinforcement learning (RL) policy with a 2B-parameter Small Language Model (SLM) to generate and validate action plans. If the SLM's plan is verified as safe, feasible, and complete, it is executed directly, bypassing the RL policy. PACT outperforms baselines on three increasingly difficult FrozenLake environments.
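
The gating rule described above can be illustrated with a minimal sketch. This is not the PACT implementation: the map, the action encoding, the function names `verify_plan` and `choose_controller`, and the concrete definitions of the three checks are all assumptions made for illustration. Here "feasible" means every move stays on the grid, "safe" means no hole is entered, and "complete" means the plan ends on the goal, all under deterministic (non-slippery) transitions. Falling back to the RL policy when a check fails is likewise assumed.

```python
from typing import Dict, List, Tuple

# Hypothetical 4x4 map in the usual FrozenLake notation:
# S = start, F = frozen (walkable), H = hole, G = goal.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]

# Action encoding assumed here: 0 = left, 1 = down, 2 = right, 3 = up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def verify_plan(grid: List[str], start: Tuple[int, int],
                plan: List[int]) -> Dict[str, bool]:
    """Simulate a plan with deterministic moves and report the three checks."""
    row, col = start
    for action in plan:
        if action not in MOVES:
            # Unknown action: treated as infeasible in this sketch.
            return {"safe": True, "feasible": False, "complete": False}
        d_row, d_col = MOVES[action]
        row, col = row + d_row, col + d_col
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            # The move would leave the grid.
            return {"safe": True, "feasible": False, "complete": False}
        if grid[row][col] == "H":
            # The move would enter a hole.
            return {"safe": False, "feasible": True, "complete": False}
    return {"safe": True, "feasible": True,
            "complete": grid[row][col] == "G"}


def choose_controller(grid: List[str], start: Tuple[int, int],
                      plan: List[int]) -> str:
    """Execute the plan only if all checks pass; otherwise defer to the RL policy."""
    checks = verify_plan(grid, start, plan)
    return "slm_plan" if all(checks.values()) else "rl_policy"
```

For example, on this map the plan `[1, 1, 2, 2, 1, 2]` from the start cell `(0, 0)` passes all three checks, whereas `[2, 1]` enters a hole and would be handed back to the RL policy.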