Researchers propose YOMI-Bench, a benchmark designed to evaluate the kanji reading and phonological understanding capabilities of large language models in Japanese. The benchmark targets a specific difficulty: a single kanji can have several possible readings, so the correct reading often cannot be inferred from the surface text alone.
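
To make the ambiguity concrete, the following is a minimal sketch, not the benchmark's actual data or metric. The word list, the `NAIVE_PREDICTIONS` table, and the `exact_match_accuracy` function are all hypothetical names introduced here for illustration. It shows how one kanji (生) takes a different reading in each of several common words, and how a guesser that always picks a single reading scores under a simple exact-match check.

```python
# Hypothetical example data: common words containing the kanji 生, each with
# its standard reading in hiragana. This is NOT taken from YOMI-Bench.
GOLD_READINGS = {
    "学生": "がくせい",
    "一生": "いっしょう",
    "生きる": "いきる",
    "生まれる": "うまれる",
    "芝生": "しばふ",
}

# Hypothetical output of a naive guesser that always reads 生 as "せい".
NAIVE_PREDICTIONS = {
    "学生": "がくせい",
    "一生": "いっせい",
    "生きる": "せいきる",
    "生まれる": "せいまれる",
    "芝生": "しばせい",
}


def exact_match_accuracy(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Return the fraction of gold words whose predicted reading matches exactly.

    Exact match is an assumed scoring rule for this sketch; the benchmark's
    own metrics are not described in the summary above.
    """
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(
        1 for word, reading in gold.items() if predictions.get(word) == reading
    )
    return correct / len(gold)


if __name__ == "__main__":
    # Only 学生 is read correctly, so this prints 0.2.
    print(exact_match_accuracy(NAIVE_PREDICTIONS, GOLD_READINGS))
```

The single-reading guesser gets only one of the five words right, which illustrates why surface text alone is not enough to choose a reading.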

  • YOMI-Bench consists of four tasks specifically designed to evaluate kanji reading performance.
  • The evaluation covered one multilingual open LLM, four Japanese-specific open LLMs, and five commercial LLMs.
  • Results show that even Japanese-specific models exhibit low performance in kanji reading.
  • Commercial models also perform poorly on generation tasks requiring consideration of kanji readings.

The study highlights that current LLMs struggle with the linguistic characteristics of Japanese kanji, indicating a need for improved phonological understanding.