Multimodal Chain-of-Thought: Capabilities and Limitations
Multimodal Chain-of-Thought (CoT) reasoning improves performance on mathematical and scientific reasoning tasks but degrades performance on perception tasks such as visual grounding and object counting. Models exhibit a 'Look Light, Think Heavy' pattern: visual reflection diminishes while verbal reflection increases, which points to a persistent bottleneck in visual reasoning.