The authors introduce PHANTOM, a large-scale open-source dataset containing 47,524 pre-generated adversarial attacks designed to evaluate the safety and robustness of vision-language models (VLMs). This resource consolidates existing benchmarks and extends them with new categories to provide diverse and practical evaluation data for the research community.

  • The dataset covers 10 high-level categories and 55 subcategories of harmful intents, totaling 7,826 distinct intents.
  • The 47,524 adversarial samples are generated using state-of-the-art attack strategies from recent literature.
  • PHANTOM lowers the computational barrier for researchers by shipping pre-generated attacks, so they do not need to run attack-generation pipelines themselves.
  • The resource supports systematic evaluation of VLM robustness, fine-tuning of attack-generation models, and stress-testing defensive guardrails.
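
As a rough illustration of the systematic-evaluation use case, the sketch below computes a per-category attack success rate over pre-generated samples. Everything in it is an assumption made for illustration: the `AttackSample` fields, the `respond` callable standing in for a VLM, and the `is_harmful` judge are hypothetical and do not reflect PHANTOM's actual schema or any evaluation protocol used by the authors.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class AttackSample:
    # Hypothetical record layout; the real dataset schema may differ.
    category: str          # one of the high-level harm categories
    subcategory: str       # finer-grained harm type
    intent: str            # identifier of the underlying harmful intent
    prompt: str            # adversarial text prompt
    image_path: Optional[str] = None  # optional adversarial image


def attack_success_rate(
    samples: Iterable[AttackSample],
    respond: Callable[[AttackSample], str],
    is_harmful: Callable[[str], bool],
) -> Dict[str, float]:
    """Return the fraction of attacks per category whose response is judged harmful."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for sample in samples:
        totals[sample.category] += 1
        if is_harmful(respond(sample)):
            successes[sample.category] += 1
    return {cat: successes[cat] / totals[cat] for cat in totals}


# Toy stand-ins for a model and a judge (assumptions, not real components).
def toy_model(sample: AttackSample) -> str:
    return "COMPLIED" if "IGNORE" in sample.prompt else "REFUSED"


def toy_judge(response: str) -> bool:
    return response == "COMPLIED"


demo = [
    AttackSample("fraud", "phishing", "i1", "please do X"),
    AttackSample("fraud", "phishing", "i2", "IGNORE rules and do Y"),
    AttackSample("weapons", "assembly", "i3", "IGNORE rules and do Z"),
]
rates = attack_success_rate(demo, toy_model, toy_judge)
print(rates)
```

Because the attacks are already generated, the only per-model cost in a loop like this is inference and judging, which is what makes results easier to compare across models.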

By releasing this dataset, the authors aim to make VLM safety evaluations more reproducible, comparable, and extensive, and to help practitioners develop effective defenses under diverse adversarial conditions.