YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese

Researchers propose YOMI-Bench, a benchmark for evaluating how well large language models read kanji and understand phonology in Japanese. It targets a core difficulty of the language. A single kanji can have several readings, so the correct reading often cannot be inferred from the surface text alone.
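A toy example makes this ambiguity concrete. The sketch below is a minimal illustration that assumes a small, hypothetical hand-written lexicon, not data from YOMI-Bench. Each word is stored as aligned (surface, reading) segments. The helper names `char_readings` and `word_reading` are also hypothetical. The example shows that the character 生 maps to four different readings when taken alone, while the word containing it determines which reading applies.

```python
from collections import defaultdict

# Hypothetical toy lexicon, hand-written for illustration (not YOMI-Bench data).
# Each word is split into aligned (surface, reading) segments so we can see
# which reading each kanji takes in that context.
LEXICON = {
    "学生": [("学", "がく"), ("生", "せい")],          # gakusei, "student"
    "一生": [("一", "いっ"), ("生", "しょう")],        # isshō, "lifetime"
    "生ビール": [("生", "なま"), ("ビール", "ビール")],  # nama bīru, "draft beer"
    "生きる": [("生", "い"), ("きる", "きる")],         # ikiru, "to live"
}


def char_readings(lexicon):
    """Collect every reading each single character takes across the lexicon."""
    table = defaultdict(set)
    for segments in lexicon.values():
        for surface, reading in segments:
            if len(surface) == 1:  # keep only single-character segments
                table[surface].add(reading)
    return table


def word_reading(word):
    """Read a whole word by concatenating its context-specific segment readings."""
    return "".join(reading for _, reading in LEXICON[word])


if __name__ == "__main__":
    table = char_readings(LEXICON)
    # Character-level view: 生 alone is ambiguous.
    print("生:", sorted(table["生"]))
    # Word-level view: each word fixes the reading of 生.
    for word in LEXICON:
        print(word, "->", word_reading(word))
```

Looking only at the character table leaves four candidates for 生. The word-level entries resolve each case. This is the kind of context-dependent choice the benchmark probes.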

YOMI-Bench consists of four tasks specifically designed to evaluate kanji reading performance.
The authors evaluated ten models: one multilingual open LLM, four Japanese-specific open LLMs, and five commercial LLMs.
Results show that even the Japanese-specific models perform poorly at kanji reading.
The commercial models also perform poorly on generation tasks that require attention to kanji readings.

The study concludes that current LLMs struggle with a defining linguistic property of Japanese kanji: their readings depend on context. This points to a need for better phonological understanding in these models.