PACT: Small Language Model Deliberation for Reactive Reinforcement Learning
PACT combines a reactive reinforcement learning (RL) policy with a 2B-parameter Small Language Model (SLM) that generates action plans, which are then validated in simulation. If a plan is verified, PACT executes it directly and bypasses the RL policy, with no retraining of that policy. PACT outperforms baselines on three FrozenLake environments of increasing difficulty.
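The plan-then-verify control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PACT implementation. It uses a hand-written, deterministic (non-slippery) model of the standard 4x4 FrozenLake map instead of a real environment library. The callables `propose_plan` (a stand-in for the SLM) and `rl_policy` (a stand-in for the trained RL agent) are hypothetical. The sketch also assumes that control falls back to the RL policy whenever verification fails.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]

# Standard 4x4 FrozenLake layout: S=start, F=frozen, H=hole, G=goal.
# Assumption: moves are deterministic (no slipping).
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
MOVES = {"L": (0, -1), "D": (1, 0), "R": (0, 1), "U": (-1, 0)}


def find_start(grid: List[str]) -> State:
    for r, row in enumerate(grid):
        c = row.find("S")
        if c != -1:
            return (r, c)
    raise ValueError("grid has no start cell")


def step(grid: List[str], state: State, action: str) -> Tuple[State, bool, bool]:
    """One deterministic transition; returns (next_state, episode_done, reached_goal)."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
        nr, nc = r, c  # moving off the grid leaves the agent in place
    cell = grid[nr][nc]
    return (nr, nc), cell in "HG", cell == "G"


def verify_plan(grid: List[str], start: State, plan: List[str]) -> bool:
    """Roll the plan out in simulation; accept it only if it reaches the goal."""
    state = start
    for action in plan:
        state, done, success = step(grid, state, action)
        if done:
            return success
    return False  # plan ended without reaching the goal


def choose_actions(
    grid: List[str],
    propose_plan: Callable[[List[str], State], Optional[List[str]]],  # hypothetical SLM stand-in
    rl_policy: Callable[[State], str],  # hypothetical RL policy stand-in
    max_steps: int = 100,
) -> Tuple[List[str], str]:
    """Return (actions taken, source), where source is "slm" or "rl"."""
    start = find_start(grid)
    plan = propose_plan(grid, start)
    if plan is not None and verify_plan(grid, start, plan):
        return plan, "slm"  # verified plan: bypass the RL policy entirely
    # Assumed fallback: the reactive policy picks one action per step.
    state, actions = start, []
    for _ in range(max_steps):
        action = rl_policy(state)
        actions.append(action)
        state, done, _ = step(grid, state, action)
        if done:
            break
    return actions, "rl"
```

For example, a stand-in planner returning `list("DDRDRR")` passes verification and is returned with source `"slm"`. A planner returning `list("RD")` falls into the hole at (1, 1) during simulation, so the RL stand-in acts instead. Verifying before acting means an incorrect SLM plan costs only a simulated rollout, never a failed real episode.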