Researchers present KG2Cypher, a data-centric pipeline for building enterprise text-to-Cypher systems from existing knowledge graphs. It turns facts already stored in the graph into validated training pairs, which are then used to fine-tune a model that translates natural-language questions into Cypher.

  • KG2Cypher constructs executable Cypher queries from observed graph facts and uses LLMs to generate associated natural-language questions.
  • Text-Cypher pairs are checked by an LLM judge and by human reviewers, then converted into candidate-aware supervised fine-tuning (SFT) data.
  • The trained generator combines class-conditioned schema prompting, entity retrieval, and LoRA-based inference.
  • In Korean enterprise settings, LoRA SFT improved execution-result F1 from 0.806 to 0.950 on broadcast-program queries and from 0.70 to 0.92 on company queries.
  • The system achieved 95.2% exact match, 99.9% execution rate, and 0.964 execution-result F1 in an 11-class setting.
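
To make two of the steps above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a fact is a simple (head, relation, tail-label) pattern and that execution-result F1 is a set-based F1 over returned rows; the summary does not give the exact definition. The function names, the `name` property, and the `Program`/`AIRED_ON`/`Channel` labels are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def cypher_from_fact(
    head_label: str, head_name: str, rel_type: str, tail_label: str
) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher query that asks for the tail of one observed fact.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are interpolated; the entity name goes in as $name.
    Assumption: nodes expose a `name` property (hypothetical schema).
    """
    query = (
        f"MATCH (h:{head_label} {{name: $name}})-[:{rel_type}]->(t:{tail_label}) "
        "RETURN t.name AS answer"
    )
    return query, {"name": head_name}


def execution_result_f1(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """Set-based F1 between the rows returned by a predicted and a gold query.

    Assumption: duplicates and row order are ignored.
    """
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set and not gold_set:
        return 1.0  # both queries returned nothing: treat as agreement
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    query, params = cypher_from_fact("Program", "News 9", "AIRED_ON", "Channel")
    print(query, params)
    print(execution_result_f1(["a", "b"], ["a", "c"]))
```

In a real pipeline the labels would come from the graph schema and should be validated before interpolation, since they are not protected by parameter binding.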

This approach addresses the high cost of building natural-language interfaces for private enterprise graphs by deriving training data from facts the graph already contains, which in turn improves query-generation accuracy.