MiShield detects multi-image implicit toxicity by analyzing correlated entities

Researchers address the challenge of multi-image implicit toxicity (MIIT), where harmful semantics emerge only when benign images are interpreted jointly. They introduce MIIT-dataset, a new image-only safety dataset covering seven risk categories, and train MiShield to identify these hazards.

The team builds MIIT-dataset with an automatic generation pipeline that spans seven representative risk categories.
MiShield is trained with progressively distilled reasoning supervision so that it produces an explicit analysis of the correlated entities that cause each hazard.
Experiments show that MiShield-8B models outperform representative moderation services and larger-scale models.

This work provides a practical way to identify implicit toxicity in multi-image inputs, which existing commercial APIs often miss because the images lack explicit risky cues.
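
To make the failure mode concrete, here is a toy sketch of why checking images one at a time can miss a hazard that only appears when they are read together. It is not MiShield's method or the MIIT-dataset pipeline: the entity labels, the `EXPLICIT_RISKY` and `RISKY_PAIRS` tables, and both function names are hypothetical, and each image is assumed to be already reduced to a set of detected entity labels.

```python
from itertools import combinations

# Hypothetical lookup tables, for illustration only.
EXPLICIT_RISKY = {"explicit_item"}                 # risky on its own
RISKY_PAIRS = {frozenset({"item_a", "item_b"})}    # risky only in combination


def per_image_flags(images):
    """Return one bool per image: True if it shows an explicitly risky entity."""
    return [bool(entities & EXPLICIT_RISKY) for entities in images]


def joint_hazards(images):
    """Return (image_i, entity, image_j, entity) for risky pairs split across images."""
    found = []
    for (i, a), (j, b) in combinations(enumerate(images), 2):
        for x in sorted(a):
            for y in sorted(b):
                if frozenset({x, y}) in RISKY_PAIRS:
                    found.append((i, x, j, y))
    return found


# Two images that each look benign in isolation (assumed entity labels).
images = [{"item_a", "table"}, {"item_b", "window"}]

print(per_image_flags(images))  # no image is flagged individually
print(joint_hazards(images))    # the pair across images 0 and 1 is reported
```

In this sketch the per-image check passes every image, while the joint check reports the correlated entities and the images they come from, which is the kind of explicit analysis the summary attributes to MiShield. A real system would need learned visual reasoning rather than a fixed lookup table.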