Researchers have introduced PHANTOM, a large-scale, open-source dataset of 47,524 pre-generated adversarial attacks for evaluating the safety and robustness of vision-language models (VLMs). It consolidates and extends prior benchmarks, covering 10 high-level categories and 55 subcategories of harmful intent, and aims to lower the computational barrier to adversarial research.
- The dataset comprises 47,524 adversarial samples generated using state-of-the-art attack strategies from recent literature.
- It covers 10 high-level categories and 55 subcategories of harmful intents, consolidating 7,826 intents from established sources.
- The resource is designed to help researchers systematically evaluate VLM robustness, fine-tune attack-generation models, and stress-test defensive guardrails.
By releasing this dataset, the authors aim to make evaluations of VLM safety more reproducible and comparable, and to make adversarial data accessible to the broader research community.
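
As a rough illustration of what a systematic robustness evaluation over such a dataset might look like, the sketch below computes an attack success rate per harm category. It is not taken from the PHANTOM release: the record fields `category` and `attack_succeeded` and the function name are hypothetical, and the step that decides whether a model's response counts as a successful attack is assumed to have run already.

```python
from collections import defaultdict


def attack_success_rate_by_category(records):
    """Return the fraction of successful attacks for each category.

    `records` is an iterable of dicts with two hypothetical keys:
      - "category": name of the harm category the sample belongs to
      - "attack_succeeded": True if the model produced a harmful response
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        category = record["category"]
        totals[category] += 1
        if record["attack_succeeded"]:
            successes[category] += 1
    # Every category in `totals` has at least one sample, so no division by zero.
    return {category: successes[category] / totals[category] for category in totals}


# Toy records for illustration only; these are not real PHANTOM samples.
toy_records = [
    {"category": "fraud", "attack_succeeded": True},
    {"category": "fraud", "attack_succeeded": False},
    {"category": "privacy", "attack_succeeded": True},
]
print(attack_success_rate_by_category(toy_records))
```

Reporting the rate per category rather than as a single aggregate makes it easier to see where a model or guardrail is weakest, which is one plausible use of a dataset with a fixed category taxonomy.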