L3Cube-MahaPOS is a gold-standard part-of-speech tagging dataset for Marathi, consisting of 32,354 manually annotated sentences drawn from news text. It is annotated with a 16-tag Universal Dependencies scheme, and six model families are benchmarked on it, with MahaBERT-v2 reaching 88.67% token-level accuracy and 81.67% macro-F1 over 15 tag classes.
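The two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the toy tag sequences, and the choice to average F1 over every tag that appears in either the gold or the predicted sequence are assumptions made for illustration.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 scores.

    Assumption: the average runs over all tags seen in gold or pred.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it did not belong
            fn[g] += 1  # missed an occurrence of gold tag g
    tags = set(gold) | set(pred)
    scores = []
    for tag in tags:
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[tag] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with hypothetical tags (not taken from the dataset).
gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "VERB", "ADJ", "ADJ"]
print(token_accuracy(gold, pred))
print(macro_f1(gold, pred))
```

Macro-F1 weights every tag equally, so rare tags pull the score down more than they affect token-level accuracy. That is one plausible reason for a gap between the two figures, though the source does not say so.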
L3Cube-MahaPOS: Marathi POS Tagging Dataset and BERT Models