A Pāninian Foundation for Indic Language Processing

The article argues that natural language processing infrastructure for the billion-plus speakers of Indic languages is fragmented because it lacks shared structural foundations. It proposes using the morphosyntactic architecture formalized in Pānini's Astādhyāyī as a unifying computational framework, with the aim of improving both the accuracy and the data efficiency of Indic language systems.

The field currently organizes its tools around individual languages, overlooking the deep regularity that Indic languages share through their long convergence with Sanskrit.
A Pāninian framework can merge these disparate per-language resources into a single high-resource metalanguage that serves as a shared foundation.
The authors propose a four-part benchmark suite to render this shared architecture explicit and measurable.
The work also raises the question of whether neural models trained on these languages come to represent Pānini's categories on their own, without being taught them explicitly.
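The summary does not say how resources would be merged. One concrete reading, offered here only as an assumption, is to map each language's surface case markers onto Pānini's shared kāraka roles (kartṛ, karman, karaṇa, sampradāna, apādāna, adhikaraṇa). Annotations from different languages then become evidence for the same categories. The Python sketch below shows that pooling step. The marker table, the romanized marker spellings, and the function `pool_by_karaka` are hypothetical, deliberately simplified illustrations, not an analysis from the article.

```python
from collections import Counter

# Hypothetical, deliberately simplified table: (language code, romanized case marker)
# -> set of candidate Paninian karaka roles. A marker can be ambiguous,
# e.g. Hindi "se" can mark an instrument (karana) or a source (apadana).
MARKER_TO_KARAKA = {
    "hi": {  # Hindi
        "ne": {"karta"},
        "ko": {"karma", "sampradana"},
        "se": {"karana", "apadana"},
        "mein": {"adhikarana"},
        "par": {"adhikarana"},
    },
    "bn": {  # Bengali
        "ke": {"karma", "sampradana"},
        "diye": {"karana"},
        "theke": {"apadana"},
        "te": {"adhikarana"},
    },
}


def pool_by_karaka(annotations):
    """Pool (language, marker) observations into counts over shared karaka roles.

    An ambiguous marker adds one count to each of its candidate roles;
    unknown languages or markers are skipped.
    """
    counts = Counter()
    for lang, marker in annotations:
        for role in MARKER_TO_KARAKA.get(lang, {}).get(marker, set()):
            counts[role] += 1
    return counts


corpus = [("hi", "ne"), ("hi", "se"), ("bn", "theke"),
          ("bn", "diye"), ("hi", "mein"), ("bn", "te")]
print(sorted(pool_by_karaka(corpus).items()))
# -> [('adhikarana', 2), ('apadana', 2), ('karana', 2), ('karta', 1)]
```

Even in this toy version, Hindi and Bengali observations add to the same role counts. That is the sense in which several low-resource languages could behave like one larger resource. A real system would need context-sensitive disambiguation rather than a fixed lookup table.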

This approach aims to make Indic language systems more transferable and data-efficient by providing a unified computational architecture that the field has so far lacked.