Text2DSL: LLM-Based Code Generation for Domain-Specific Languages

This paper introduces Text2DSL, a distinct task of generating domain-specific language code from natural language. Using the PolkitBench dataset of 4,204 validated pairs, it shows that structured context—such as BNF grammar and API specs—boosts syntactic and structural validity and CodeBLEU scores by 60% to 95% across different LLM models, without fine-tuning.