MMGist: A Comprehensive Multimodal Benchmark for 2027

MMGist is a curated multimodal benchmark of 7,262 items, designed to address flaws in existing vision-language benchmarks. It reduces evaluation size by 69% and improves cross-model discrimination by 78%, while preserving model rankings (Spearman correlation of 0.98). The benchmark highlights visual logic as a key weakness, and it emphasizes three properties of sound evaluation: visual dependency, discriminative power, and reliability.
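
The ranking-preservation claim rests on Spearman rank correlation, which compares the order in which two evaluations place the same models rather than their raw scores. The following is a minimal sketch of that computation, not the authors' evaluation code. The function names `average_ranks` and `spearman` and the five model scores are hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical accuracies for five models (illustrative, not MMGist results).
full_scores = [71.2, 64.5, 58.9, 55.0, 49.3]    # on a full benchmark pool
subset_scores = [68.0, 60.1, 61.4, 50.2, 44.7]  # on a reduced subset

rho = spearman(full_scores, subset_scores)
print(round(rho, 2))  # 0.9: one adjacent pair of models swaps places
```

In this toy case a single swap between two neighboring models out of five lowers the correlation to 0.9. With so few models, a value near 0.98 cannot be reached by any single swap, so a figure that high generally reflects a larger set of models whose order is almost entirely preserved.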