Recent large language models struggle with domain-specific data generation when the output must meet strict formatting and structural requirements. To improve the interoperability of utility power outage reports in the United States, researchers propose POTracker, a model optimized for generating machine-readable compliance documents. The team fine-tuned Qwen2.5-7B-Instruct with a novel objective, POTrackerLoss, which accounts for both textual similarity and structural tag similarity between generated outputs and ground-truth reports. In an evaluation on a dataset of 1,000 reports, POTracker outperforms five fine-tuning methods and one rule-based XML conversion approach, improving overall accuracy by up to 51% and reaching 86.47% structural accuracy on the generated reports. In a separate human study, domain experts gave the generated labels an average quality score of 4.03 on a 0-5 scale.
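To make the idea of combining textual similarity with structural tag similarity concrete, here is a minimal sketch in Python. It is not POTrackerLoss: the summary above does not give the formula, so the function names, the use of `difflib.SequenceMatcher`, the regex-based tag extraction, the weight `alpha`, and the sample XML are all assumptions chosen for illustration. It also computes a non-differentiable similarity score of the kind one might use for evaluation, not a training loss.

```python
import re
from difflib import SequenceMatcher

# Hypothetical helper names; none of these come from the POTracker work.
TAG_RE = re.compile(r"<(/?)\s*([A-Za-z_][\w.\-:]*)[^<>]*>")


def extract_tags(report: str) -> list[str]:
    """Return tag names in document order; closing tags keep a leading '/'."""
    return [m.group(1) + m.group(2) for m in TAG_RE.finditer(report)]


def strip_tags(report: str) -> str:
    """Replace every markup element with a space, leaving only the text."""
    return re.sub(r"<[^<>]*>", " ", report)


def text_similarity(generated: str, reference: str) -> float:
    """Word-level similarity of the text content, in [0, 1]."""
    gen_words = strip_tags(generated).split()
    ref_words = strip_tags(reference).split()
    return SequenceMatcher(None, gen_words, ref_words, autojunk=False).ratio()


def tag_similarity(generated: str, reference: str) -> float:
    """Similarity of the two tag sequences, in [0, 1]."""
    gen_tags = extract_tags(generated)
    ref_tags = extract_tags(reference)
    return SequenceMatcher(None, gen_tags, ref_tags, autojunk=False).ratio()


def combined_score(generated: str, reference: str, alpha: float = 0.5) -> float:
    """Weighted mix of text and tag similarity; alpha is an assumed weight."""
    return (alpha * text_similarity(generated, reference)
            + (1.0 - alpha) * tag_similarity(generated, reference))


if __name__ == "__main__":
    # Made-up example reports, not taken from the dataset.
    reference = "<outage><cause>storm</cause><customers>120</customers></outage>"
    generated = "<outage><cause>storm</cause>120</outage>"
    print("text:", text_similarity(generated, reference))
    print("tags:", tag_similarity(generated, reference))
    print("combined:", combined_score(generated, reference))
```

In this sketch the generated report keeps all of the text but drops one element, so the text score stays high while the tag score falls. Scoring the two separately is what lets a metric distinguish a fluent but structurally invalid report from a well-formed one.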
POTracker Optimizes LLMs for Standard-Compliant Power Outage Report Generation