Researchers propose a modular pipeline for building a travel-domain reasoning large language model grounded in an expert-designed knowledge graph to address accuracy and reliability issues in specialized domains. The approach integrates a travel knowledge graph, a bottom-up construction procedure for multi-hop question-answer pairs, and supervised fine-tuning to embed domain knowledge as auditable reasoning traces.
- The pipeline uses a travel knowledge graph encoding domain entities and relationships to generate multi-hop QA pairs via a bottom-up construction procedure.
- Supervised fine-tuning is applied using the generated QA pairs as auditable reasoning traces to enhance the model's reasoning capabilities.
- Qwen3-4B fine-tuned with LoRA reached 82.4% exact match on a travel-domain benchmark, compared with 22.4% for the pretrained baseline.
- Calibration analysis identified two failure modes: over-confident multi-label decoding, and reasoning failures in which the model cannot reconstruct the correct multi-hop path even though it has the supporting facts.
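To make the QA construction step more concrete, here is a minimal Python sketch of one way multi-hop question-answer pairs with reasoning traces could be assembled from a knowledge graph. It is not the authors' procedure: "bottom-up" is interpreted here as starting from single-hop triples and chaining them into longer paths, and the toy graph, the question template, and the names `TOY_KG`, `extend_paths`, `build_paths`, and `to_qa` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical toy graph; the real travel knowledge graph is not shown in the source.
TOY_KG: List[Triple] = [
    ("Fushimi Inari", "located_in", "Kyoto"),
    ("Kyoto", "located_in", "Japan"),
    ("Japan", "currency", "yen"),
]


def extend_paths(kg: List[Triple], paths: List[List[Triple]]) -> List[List[Triple]]:
    """Append one more hop to every path, skipping entities already visited."""
    by_head: Dict[str, List[Triple]] = {}
    for triple in kg:
        by_head.setdefault(triple[0], []).append(triple)

    extended = []
    for path in paths:
        tail = path[-1][2]
        visited = {path[0][0]} | {t for _, _, t in path}
        for triple in by_head.get(tail, []):
            if triple[2] not in visited:  # avoid cycles
                extended.append(path + [triple])
    return extended


def build_paths(kg: List[Triple], hops: int) -> List[List[Triple]]:
    """Start from single triples (1 hop) and grow them to the requested length."""
    paths = [[triple] for triple in kg]
    for _ in range(hops - 1):
        paths = extend_paths(kg, paths)
    return paths


def to_qa(path: List[Triple]) -> Dict[str, object]:
    """Turn a path into a question, a step-by-step trace, and a final answer."""
    relations = " -> ".join(r for _, r, _ in path)
    return {
        "question": f"Starting from {path[0][0]}, follow: {relations}. What do you reach?",
        "trace": [f"{h} --{r}--> {t}" for h, r, t in path],
        "answer": path[-1][2],
    }


if __name__ == "__main__":
    for p in build_paths(TOY_KG, hops=3):
        print(to_qa(p))
```

Because each trace lists the exact triples it traversed, every generated answer can be checked against the graph, which is one plausible reading of what makes such traces auditable.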
The results indicate that explicit knowledge graph-grounded reasoning substantially improves accuracy and makes model uncertainty easier to interpret in specialized domains. The authors single out per-option calibration and trace-length-aware decoding as key areas for future improvement.
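As a rough illustration of what measuring per-option calibration might involve, the sketch below computes a standard binned expected calibration error (ECE) over per-option probabilities in a multi-label setting. It is an assumption, not the paper's analysis: the source does not say which calibration metric was used, and the function name, the equal-width binning, and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE over per-option probabilities.

    confidences[i] is the model's probability that option i is correct;
    labels[i] is 1 if option i is in the gold answer set, else 0.
    """
    if len(confidences) != len(labels):
        raise ValueError("confidences and labels must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(confidences):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append(i)

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        frac_correct = sum(labels[i] for i in members) / len(members)
        ece += (len(members) / total) * abs(avg_conf - frac_correct)
    return ece


if __name__ == "__main__":
    # Four options all scored 0.9, but only three are actually correct:
    # the gap between 0.9 and 0.75 signals over-confidence.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A large gap between average confidence and the observed fraction of correct options in the high-confidence bins would be consistent with the over-confident multi-label decoding described above.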