Researchers introduce PolicyGuard, a sub-agent verifier designed to improve policy adherence in LLM agents by reasoning over the full dialogue context rather than relying on external checks of individual arguments. This approach addresses the limitations of prior safeguarding methods that often underestimate the need for conversation-specific remediation and explicit user confirmation.
- PolicyGuard shares the agent's view of the dialogue, reasons over policies in context, and provides actionable feedback for the next turn.
- On the tau^2-BENCH airline domain, with models from three vendors (GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Pro), it improves pass^4 by +12.0, +6.0, and +12.0 percentage points, respectively.
- Per-call analyses show that PolicyGuard achieves higher policy-violation recall than argument-level guards while blocking roughly half as often.
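
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a pass^k score can be computed, assuming the reported metric is the pass^k reliability measure used by the tau-bench family (the chance that all k independent trials of a task succeed, averaged over tasks) with k = 4. The function names and the trial data are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate the probability that k trials drawn from the observed
    trials of one task all succeed: C(successes, k) / C(trials, k)."""
    if k < 1 or k > num_trials:
        raise ValueError("k must be between 1 and the number of trials")
    return comb(num_successes, k) / comb(num_trials, k)


def benchmark_pass_hat_k(results: list[list[bool]], k: int) -> float:
    """Average the per-task estimate over all tasks.

    results[i] holds the success flags of the trials run for task i.
    """
    per_task = [pass_hat_k(len(trials), sum(trials), k) for trials in results]
    return sum(per_task) / len(per_task)


# Illustrative (made-up) outcomes: 3 tasks, 4 trials each.
example = [
    [True, True, True, True],    # always succeeds, contributes 1.0
    [True, True, True, False],   # one failure, contributes 0.0 for k = 4
    [False, False, True, True],  # contributes 0.0 for k = 4
]
score = benchmark_pass_hat_k(example, k=4)
```

With four trials per task, a task only counts toward pass^4 if every trial succeeds, which is why the metric rewards consistent policy adherence rather than occasional success.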
Because real-world workflows unfold across many turns, grounding verification in the ongoing dialogue helps keep the agent's actions compliant with organizational policies.
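
The contrast between argument-level checks and dialogue-grounded verification can be sketched as follows. This is an illustrative toy, not the paper's implementation: the `Turn`, `ToolCall`, and `Verdict` types, the tool names, and the single "explicit confirmation" rule are all assumptions, and a simple string check stands in for the LLM reasoning that a real verifier would perform over the policy text.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "agent"
    text: str


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Verdict:
    allow: bool
    feedback: str  # actionable guidance for the agent's next turn


# Hypothetical policy: state-changing tools require an explicit "yes"
# from the user in reply to the agent's most recent message.
STATE_CHANGING = {"cancel_reservation", "update_reservation"}


def argument_level_guard(call: ToolCall) -> Verdict:
    """Sees only the call's arguments, so it cannot tell whether
    the user ever confirmed the action."""
    if not call.args.get("reservation_id"):
        return Verdict(False, "Missing reservation_id.")
    return Verdict(True, "")


def dialogue_level_guard(dialogue: list[Turn], call: ToolCall) -> Verdict:
    """Sees the same dialogue as the agent and checks the policy in context."""
    if call.name not in STATE_CHANGING:
        return Verdict(True, "")
    confirmed = (
        len(dialogue) >= 2
        and dialogue[-2].role == "agent"
        and dialogue[-1].role == "user"
        and dialogue[-1].text.strip().lower().startswith("yes")
    )
    if confirmed:
        return Verdict(True, "")
    return Verdict(
        False,
        f"Before calling {call.name}, summarize the change and ask the "
        "user for an explicit yes.",
    )


call = ToolCall("cancel_reservation", {"reservation_id": "ABC123"})
unconfirmed = [Turn("user", "Please cancel my reservation ABC123.")]
confirmed = unconfirmed + [
    Turn("agent", "I will cancel ABC123. Do you confirm? (yes/no)"),
    Turn("user", "Yes, go ahead."),
]

arg_verdict = argument_level_guard(call)
early_verdict = dialogue_level_guard(unconfirmed, call)
late_verdict = dialogue_level_guard(confirmed, call)
```

In this toy setup the argument-level guard allows the call in both situations because the arguments are well formed, while the dialogue-level guard blocks the unconfirmed call and returns feedback the agent can act on in its next turn.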