Empirical Study of Medical LLM Adaptation in French QA

This study compares continual pretraining (CPT), supervised fine-tuning (SFT), and their combination (CPT+SFT) for adapting LLMs to French medical question answering (QA). On multiple-choice QA, CPT+SFT performs best, but its gains over SFT alone are minimal and often not statistically significant, which makes SFT the cost-effective default. On open-ended QA, CPT improves metrics while SFT degrades generation quality; LLM-based evaluations favor instruction tuning and CPT+SFT. Cross-lingual results show that the French adaptation transfers effectively to English benchmarks.
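
To make concrete what it means for a multiple-choice gain to be small and not significant, the sketch below runs a paired significance test on per-question correctness for two systems. It is only an illustration: the summary does not say which test the study used, so the exact McNemar test is an assumption, and the function name `mcnemar_exact` and the toy correctness lists are hypothetical.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired 0/1 correctness lists.

    correct_a[i] and correct_b[i] say whether system A and system B
    answered question i correctly. Only questions where the two systems
    disagree carry information about which one is better.
    Returns (a_only, b_only, p_value).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same questions")

    # Discordant pairs: exactly one of the two systems is correct.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        # The systems never disagree, so there is no evidence of a difference.
        return a_only, b_only, 1.0

    # Under the null hypothesis each discordant pair favors A or B with
    # probability 0.5, so the smaller count follows Binomial(n, 0.5).
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical toy data, not results from the study:
    # the two systems disagree on 5 of 20 questions.
    sft = [1] * 10 + [0] * 5 + [1, 1, 0, 0, 0]
    cpt_sft = [1] * 10 + [0] * 5 + [0, 0, 1, 1, 1]
    a_only, b_only, p = mcnemar_exact(sft, cpt_sft)
    print(f"SFT only: {a_only}, CPT+SFT only: {b_only}, p = {p:.3f}")
```

In this toy case CPT+SFT answers one more question correctly than SFT, yet the test finds no reliable difference, which is the kind of outcome that would support choosing the cheaper SFT-only recipe.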