arXiv cs.CL · 22h ago · research

L3Cube-MahaPOS: Marathi POS Tagging Dataset and BERT Models

L3Cube-MahaPOS introduces a gold-standard part-of-speech tagging dataset for Marathi, consisting of 32,354 manually annotated sentences from news text. The dataset uses a 16-tag Universal Dependencies scheme, and the paper benchmarks six model families on it. MahaBERT-v2 reaches 88.67% token-level accuracy and 81.67% macro-F1 over 15 tag classes.
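The two reported metrics answer different questions: token-level accuracy counts every token equally, while macro-F1 averages per-tag F1 so that rare tags weigh as much as frequent ones. The sketch below illustrates that difference on made-up data. It is not the paper's evaluation code; the function names, the toy tag sequences, and the choice to average only over tags that occur in the gold labels are all assumptions made for illustration.

```python
from collections import defaultdict


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over tags present in the gold labels.

    Assumption: tags that never occur in gold are left out of the average.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p, but it was wrong
            fn[g] += 1  # gold tag g was missed
    scores = []
    for tag in sorted(set(gold)):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores.append(2 * tp[tag] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical flattened tag sequences (not taken from the dataset).
gold = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "PUNCT"]
pred = ["NOUN", "VERB", "NOUN", "NOUN", "NOUN", "PUNCT"]

print(round(token_accuracy(gold, pred), 3))  # 0.833
print(round(macro_f1(gold, pred), 3))        # 0.714
```

In this toy case a single error on the only ADJ token lowers accuracy by one sixth but drives the ADJ F1 to zero, which is one way a macro-F1 score can sit well below token-level accuracy.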

Benchmarks

Benchmark	Model	Score
L3Cube-MahaPOS (token-level accuracy)	MahaBERT-v2	88.67%
L3Cube-MahaPOS (macro-F1, 15 tag classes)	MahaBERT-v2	81.67%
