AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability
The authors present AdversaBench, an end-to-end red-teaming pipeline that generates hard inputs for large language models using five structured mutation operators and confirms failures through a three-judge panel with a meta-judge tiebreaker.
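
To make the confirmation step concrete, here is a minimal sketch of how a three-judge panel with a meta-judge tiebreaker might be wired together. The summary does not say how a tie arises among three judges, so this sketch assumes each judge may return "fail", "pass", or "unsure", and that the meta-judge is consulted only when neither "fail" nor "pass" reaches a strict majority. The names `confirm_failure`, `Judge`, and `MetaJudge`, and the label set itself, are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical label set: a judge says the response is a failure ("fail"),
# acceptable ("pass"), or declines to decide ("unsure").
Verdict = str
Judge = Callable[[str, str], Verdict]
MetaJudge = Callable[[str, str, List[Verdict]], Verdict]


def confirm_failure(
    prompt: str,
    response: str,
    judges: Sequence[Judge],
    meta_judge: MetaJudge,
) -> bool:
    """Return True if the panel confirms that `response` is a failure.

    Assumed rule (not confirmed by the source): a strict majority of
    "fail" or "pass" votes decides; otherwise the meta-judge breaks the tie.
    """
    votes = [judge(prompt, response) for judge in judges]
    counts = Counter(votes)
    majority = len(judges) // 2 + 1  # 2 of 3 for a three-judge panel

    if counts["fail"] >= majority:
        return True
    if counts["pass"] >= majority:
        return False
    # No strict majority (e.g. one "fail", one "pass", one "unsure"):
    # defer to the meta-judge, which also sees the individual votes.
    return meta_judge(prompt, response, votes) == "fail"


# Toy stand-ins for LLM-backed judges, for illustration only.
def always_fail(prompt: str, response: str) -> Verdict:
    return "fail"


def always_pass(prompt: str, response: str) -> Verdict:
    return "pass"


def always_unsure(prompt: str, response: str) -> Verdict:
    return "unsure"


def meta_says_fail(prompt: str, response: str, votes: List[Verdict]) -> Verdict:
    return "fail"
```

In this sketch the meta-judge is called only on split panels, which would keep the extra judging cost proportional to the disagreement rate; whether AdversaBench follows the same policy is not stated in the summary.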