A new framework enables secure, probabilistic policy enforcement for AI agents in ambiguous environments. It uses distributionally robust optimization to compute rigorous upper bounds on policy violation probabilities without assuming that predicates are independent. The method outperforms prior approaches on terminal and tool-calling agent benchmarks, improving the security-utility trade-off.
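To illustrate why dropping the independence assumption matters, here is a minimal sketch using the classical Fréchet bounds. This is not the paper's distributionally robust optimization procedure. It is a simpler, well-known stand-in that also gives upper bounds valid under any dependence between predicates. The function names and the marginal probabilities are hypothetical and chosen only for illustration.

```python
from math import prod

def and_upper(probs):
    """Frechet upper bound on P(all predicates hold).

    Valid for any joint distribution with these marginals.
    Requires a non-empty list of probabilities in [0, 1].
    """
    return min(probs)

def or_upper(probs):
    """Frechet (union) upper bound on P(at least one predicate holds)."""
    return min(1.0, sum(probs))

def and_independent(probs):
    """P(all predicates hold) if the predicates were independent."""
    return prod(probs)

# Hypothetical marginals for two risk predicates, e.g.
# "command touches a sensitive path" and "command sends data over the network".
marginals = [0.3, 0.4]

sound_bound = and_upper(marginals)        # holds under any dependence
naive_value = and_independent(marginals)  # only correct under independence

print(f"dependence-free upper bound: {sound_bound:.2f}")
print(f"independence estimate:       {naive_value:.2f}")
```

When the predicates are positively correlated, the true violation probability can exceed the independence estimate, so only the dependence-free bound is safe to enforce against.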
arXiv cs.AI · 6d ago · research
Efficient and Sound Probabilistic Verification for AI Agents
Importance 3/3
Beats a top-lab benchmark
New harness with differentiators
OpenAI
Google DeepMind
Mistral AI
AI agents
Evaluation & benchmarks
Safety & alignment
Benchmarks
| Benchmark | Model | Score |
|---|---|---|
| Terminal-Bench | our approach | — |