Researchers propose YOMI-Bench, a benchmark for evaluating the kanji reading and phonological understanding of large language models (LLMs) in Japanese. Because a single kanji can have several possible readings, the correct reading often cannot be inferred from the surface text alone, and the benchmark is designed to target this difficulty.
- YOMI-Bench consists of four tasks specifically designed to evaluate kanji reading performance.
- The evaluation assessed one multilingual open LLM, four Japanese-specific open LLMs, and five commercial LLMs.
- Results show that even Japanese-specific models perform poorly at kanji reading.
- Commercial models also perform poorly on generation tasks that require taking kanji readings into account.
The study concludes that current LLMs struggle with the linguistic characteristics of Japanese kanji, in particular the existence of multiple readings per character, and that their phonological understanding needs to improve.
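To make the multiple-readings problem concrete, the following sketch lists a few common words containing the kanji 生, each read differently, and scores predicted readings by exact match. This is only an illustration under stated assumptions: the word list, the `exact_match_accuracy` helper, and the "naive" predictions are hypothetical and are not taken from YOMI-Bench, whose actual task formats and metrics are not described here.

```python
# Hypothetical illustration (not YOMI-Bench data): the same kanji 生
# takes a different reading depending on the word it appears in.
GOLD_READINGS: dict[str, str] = {
    "生活": "せいかつ",      # 生 read as せい
    "一生": "いっしょう",    # 生 read as しょう
    "生きる": "いきる",      # 生 read as い
    "生まれる": "うまれる",  # 生 read as う
    "生ビール": "なまビール",  # 生 read as なま
}


def exact_match_accuracy(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Return the fraction of gold words whose predicted reading matches exactly.

    Hypothetical helper; a missing prediction counts as wrong.
    """
    if not gold:
        return 0.0
    correct = sum(1 for word, reading in gold.items() if predictions.get(word) == reading)
    return correct / len(gold)


# A hypothetical predictor that always reads 生 as せい, ignoring context.
NAIVE_PREDICTIONS: dict[str, str] = {
    "生活": "せいかつ",
    "一生": "いっせい",
    "生きる": "せいきる",
    "生まれる": "せいまれる",
    "生ビール": "せいビール",
}

if __name__ == "__main__":
    print(exact_match_accuracy(NAIVE_PREDICTIONS, GOLD_READINGS))
```

Reading 生 the same way everywhere is correct for only one of the five words here, which is the kind of context-dependence that surface-level text alone does not resolve.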