Researchers address the challenge of multi-image implicit toxicity (MIIT), where harmful semantics emerge only when individually benign images are interpreted jointly. They introduce MIIT-dataset, a new image-only safety dataset covering seven risk categories, and train a model, MiShield, to identify these hazards.

  • The team constructs MIIT-dataset using an automatic generation pipeline to cover seven representative risk categories.
  • MiShield is trained with progressively distilled reasoning supervision to produce explicit analyses of the correlated entities that give rise to a hazard.
  • Experiments show that MiShield-8B models outperform representative moderation services and larger-scale models.

This work provides a practical solution for identifying implicit toxicity in multi-image inputs, which existing commercial APIs often miss because the individual images lack explicit risky cues.