Evaluating Japanese Dialect Robustness Across Speech and Text-based Large Language Models

This study investigates the dialectal robustness of large language models (LLMs) and speech language models (SLMs), using Japanese dialects as a test case. Although LLM-based dialogue systems have advanced considerably, dialectal variation remains a significant challenge, particularly for spoken input. To allow fair comparisons across model types, the study defines robustness as the ratio of a model's performance on dialectal inputs to its performance on standard inputs. Experiments show that the robustness of an SLM correlates directly with the robustness of the text-based LLM it is built on. The study also finds that both training on dialectal data and fine-tuning the speech encoder improve SLM robustness. These findings clarify how the capabilities of the base LLM shape SLM performance and identify effective strategies for improving dialect comprehension.
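The robustness measure described above is a simple ratio, R = S_dialect / S_standard. The following is a minimal sketch that assumes each model is summarized by one scalar, higher-is-better score (for example, accuracy) on dialectal inputs and one on standard inputs. It also adds a Pearson correlation helper that illustrates how robustness values for SLMs could be related to those of their base LLMs. The function names `robustness` and `pearson_r` are hypothetical, and the use of Pearson's r is an assumption for illustration. The abstract does not specify which correlation measure the study uses.

```python
import math
from typing import Sequence


def robustness(dialect_score: float, standard_score: float) -> float:
    """Ratio of performance on dialectal input to performance on standard input.

    1.0 means no degradation on dialects; lower values mean a larger drop.
    Assumes both scores use the same higher-is-better scale (e.g., accuracy).
    """
    if standard_score <= 0:
        raise ValueError("standard_score must be positive to form a ratio")
    return dialect_score / standard_score


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper: e.g., xs = robustness of each base text LLM,
    ys = robustness of the SLM built on it (same order in both lists).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalized; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Dividing by the standard-input score normalizes away differences in absolute capability. A weaker model that loses little on dialects can therefore score as more robust than a stronger model that degrades sharply. This normalization is what makes it possible to compare text LLMs and SLMs on one scale.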