Training-free graph framework infers reading order in complex document layouts

A training-free, graph-based framework has been proposed to infer reading order in complex historical manuscripts, such as the Glossa Ordinaria layout, where text streams are spatially interleaved. The method constructs a directed candidate-transition graph from OCR lines and recovers the global order with a max-regret inference rule, which avoids the failures of greedy selection.
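The summary does not spell out the max-regret rule, so the following is only a minimal sketch of one plausible reading. It assumes the node whose best outgoing edge beats its runner-up by the widest margin is committed first, that a missing runner-up counts as a score of 0.0, and that scores lie in [0, 1]. The names max_regret_order and greedy_walk and the toy scores are hypothetical and are not taken from the source.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def _reaches(succ: Dict[str, str], start: str, target: str) -> bool:
    """Follow committed successor links from `start`; True if `target` is hit."""
    node = start
    while node in succ:
        node = succ[node]
        if node == target:
            return True
    return False


def max_regret_order(nodes: List[str], scores: Dict[Edge, float]) -> List[List[str]]:
    """Commit one successor edge per round, highest-regret node first.

    Assumption: regret = best available score minus the runner-up score,
    with a missing runner-up treated as 0.0.
    """
    succ: Dict[str, str] = {}
    taken = set()  # nodes that already have a predecessor
    while True:
        pick = None  # (regret, best_score, source, target)
        for u in nodes:
            if u in succ:
                continue
            cands = sorted(
                (
                    (s, v)
                    for (a, v), s in scores.items()
                    if a == u and v != u and v not in taken
                    and not _reaches(succ, v, u)  # reject edges that close a cycle
                ),
                reverse=True,
            )
            if not cands:
                continue
            runner_up = cands[1][0] if len(cands) > 1 else 0.0
            regret = cands[0][0] - runner_up
            if pick is None or (regret, cands[0][0]) > pick[:2]:
                pick = (regret, cands[0][0], u, cands[0][1])
        if pick is None:
            break
        _, _, u, v = pick
        succ[u] = v
        taken.add(v)
    # Stitch committed edges into chains, starting from nodes with no predecessor.
    chains: List[List[str]] = []
    for head in nodes:
        if head in taken:
            continue
        chain = [head]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        chains.append(chain)
    return chains


def greedy_walk(start: str, scores: Dict[Edge, float]) -> List[str]:
    """Baseline for contrast: always step to the best-scoring unvisited successor."""
    order = [start]
    while True:
        cands = [(s, v) for (a, v), s in scores.items()
                 if a == order[-1] and v not in order]
        if not cands:
            return order
        order.append(max(cands)[1])


# Toy graph (hypothetical scores); the intended order is A, B, C, D.
demo_nodes = ["A", "B", "C", "D"]
demo_scores = {
    ("A", "B"): 0.5, ("A", "C"): 0.6,
    ("B", "C"): 0.9, ("B", "D"): 0.05,
    ("C", "D"): 0.9, ("C", "B"): 0.1,
    ("D", "B"): 0.1,
}
print(greedy_walk("A", demo_scores))
print(max_regret_order(demo_nodes, demo_scores))
```

In this toy graph the greedy walk takes the slightly better edge A to C and strands B at the end, while the regret-ordered version commits B to C first because B has the most to lose from a wrong choice. On a sparse candidate graph the function may return several chains rather than a single order.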

Edges are scored by a weighted ensemble of causal language model conditional likelihood and BERT next-sentence prediction (NSP).
On synthetic Glossa Ordinaria layouts, the method recovers 95% of ground-truth successor edges on average, compared to 50% for recursive XY-cut.
On a 140-page multi-column subset of OmniDocBench, it achieves 88% macro edge accuracy versus 75% for XY-cut and 25% for LayoutReader.
The approach is close to mirror-invariant: its accuracy changes by less than 1 percentage point under page reflections, whereas that of LayoutReader-T changes by up to 8 points.
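The edge scoring and the edge-accuracy figures above can be illustrated with a small sketch. It is not the authors' implementation. It assumes the language-model term is a mean per-token log-likelihood mapped to [0, 1] by exponentiation, that the NSP term is already a probability, and that "macro" means an unweighted mean over pages. All function names and the weight parameter are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def ensemble_edge_score(mean_token_logprob: float, nsp_prob: float,
                        weight: float = 0.5) -> float:
    """Weighted mix of an LM likelihood term and an NSP probability.

    Assumption: exp(mean token log-likelihood) serves as the LM term so that
    both components lie in [0, 1] before mixing.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    lm_term = math.exp(mean_token_logprob)
    return weight * lm_term + (1.0 - weight) * nsp_prob


def successor_edges(order: Sequence[str]) -> Set[Tuple[str, str]]:
    """Consecutive (line, next line) pairs implied by a reading order."""
    return set(zip(order, order[1:]))


def edge_accuracy(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of ground-truth successor edges present in the prediction."""
    gt = successor_edges(truth)
    if not gt:
        return 1.0  # a page with fewer than two lines has nothing to get wrong
    return len(gt & successor_edges(pred)) / len(gt)


def macro_edge_accuracy(pages: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Unweighted mean of per-page edge accuracy over (pred, truth) pairs."""
    return sum(edge_accuracy(p, t) for p, t in pages) / len(pages)


truth = ["A", "B", "C", "D"]
pages = [(truth, truth), (["A", "C", "D", "B"], truth)]
print(ensemble_edge_score(math.log(0.5), 0.9, weight=0.5))
print(macro_edge_accuracy(pages))
```

Macro averaging gives every page equal weight regardless of how many lines it contains, so a short page with one wrong edge moves the score as much as a long page with many.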

The framework targets a bottleneck in digitizing complex layouts, where canonical methods such as XY-cut suffer cascading failures and LayoutReader baselines transfer poorly because of granularity mismatches.