OCR-VLMs Read Devanagari? Benchmark and Post-Correction Study

A study benchmarks ten OCR systems on Devanagari text, revealing that specialized OCR vision-language models are fragile under degradation and that strong English performance does not predict Indic script accuracy.

On clean rendered text, all ten systems cluster within chrF++ 91 to 98, but specialized models like DeepSeek-OCR suffer catastrophic repetition failures under degradation.
On real printed scans, nine of the ten systems collapse significantly, with EasyOCR dropping from chrF++ 93.6 to 58.3 and olmOCR-7B falling to 40.5.
Gemini 2.5 Flash leads at chrF++ 86.3, followed by Claude Opus 4.7 at 82.2, while the open Qwen3-VL-8B (75.2) outperforms GPT-5.5 (58.5).
A byte-level post-corrector improves the output of the OCR engine it was trained on, but the gains do not transfer to other OCR systems.
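
The scores above are reported in chrF++, a character n-gram F-score extended with word n-grams. As a rough illustration of what such a number measures, here is a minimal sketch of a simplified chrF-style score in pure Python. It is not the authors' evaluation code: the function names `char_ngrams` and `simple_chrf` are hypothetical, the word n-gram component of chrF++ is omitted, and the defaults (character orders 1 to 6, beta of 2) are assumed from common chrF practice, not taken from the study.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams (by Unicode code point) after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score on a 0-100 scale.

    Illustration only: averages a per-order F-beta over character n-gram
    orders 1..max_n and leaves out the word n-grams that chrF++ adds.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        f_beta = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
        scores.append(f_beta)
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    reference = "नमस्ते दुनिया"
    ocr_output = "नमस्त दुनया"  # hypothetical OCR output with two dropped vowel signs
    print(round(simple_chrf(ocr_output, reference), 1))
```

Because the score works on code points, a dropped Devanagari vowel sign or virama lowers every n-gram order that spans it, which is one reason character-level metrics suit this script better than word-level accuracy. For reported numbers, an established implementation of chrF++ should be used in place of this sketch.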

The authors release the benchmark, code, and models to address the lack of characterization for Indic scripts in current OCR research.

Benchmark	Model	chrF++
Real printed scans	Gemini 2.5 Flash	86.3
Real printed scans	Claude Opus 4.7	82.2
Real printed scans	Qwen3-VL-8B	75.2
Real printed scans	GPT-5.5	58.5
Real printed scans	EasyOCR	58.3
Real printed scans	olmOCR-7B	40.5
