HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models
Researchers introduce HarmVideoBench, a multi-layered diagnostic benchmark that evaluates how well large vision-language models understand harmful videos beyond superficial cues. The benchmark addresses limitations of existing work by incorporating explanatory rationales and by assessing three hierarchical dimensions of harm: Observable Evidence, Clip-Internal Meaning, and Beyond-Clip Reasoning.