DMV-Bench: Diagnosing Long-Horizon Multimodal Agents' Visual Memory with Incidental Cue Injection
Researchers introduce DMV-Bench, which they describe as the first interactive benchmark for evaluating the visual memory of multimodal agents in controlled environments. They also propose DualMem, an architecture that maintains visual and verbal memory in parallel, and report that it significantly outperforms existing systems on DMV-Bench.