arXiv cs.CL · 6h ago

HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

Researchers introduce HarmVideoBench, a multi-layered diagnostic benchmark that evaluates whether large vision-language models understand harmful videos beyond superficial cues. It addresses limitations of prior work by including explanatory rationales and by assessing three hierarchical dimensions of harm: Observable Evidence, Clip-Internal Meaning, and Beyond-Clip Reasoning.