Credence introduces Semantic-F1, a metric based on BGE-large cosine similarity that improves claim decomposition accuracy over Jaccard by 15-32 percentage points. It establishes convergence theorems for rule-based and LLM-based repair, showing that rule-based repair is finitely terminating and monotone, whereas LLM-based repair requires early-exit guards. Evaluations across social-media, encyclopaedic, and news domains yield EPR values between 0.94 and 1.00, with rule-based repair reducing atomicity violations by 47-100% without loss of fidelity.
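To make the contrast between token overlap and embedding similarity concrete, the following is a minimal sketch, not the paper's definition of Semantic-F1, which is not given here. It assumes a greedy soft-matching F1: precision averages each predicted claim's best cosine match against the reference claims, and recall does the same in the other direction. The function names `jaccard`, `cosine`, and `soft_f1` are hypothetical, and the two-dimensional vectors stand in for real BGE-large embeddings.

```python
import math


def jaccard(a, b):
    """Token-level Jaccard overlap between two strings (lowercased, whitespace split)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def soft_f1(pred_vecs, ref_vecs):
    """Assumed soft F1: harmonic mean of best-match precision and recall."""
    if not pred_vecs or not ref_vecs:
        return 0.0
    precision = sum(max(cosine(p, r) for r in ref_vecs) for p in pred_vecs) / len(pred_vecs)
    recall = sum(max(cosine(r, p) for p in pred_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


# Two paraphrased claims share few tokens, so Jaccard is low.
lexical = jaccard("the car is fast", "the automobile is quick")

# Toy vectors (an assumption, not real embeddings) placed close together
# to mimic how an embedding model would treat the paraphrase.
semantic = soft_f1([[1.0, 0.0]], [[0.9, 0.1]])

print(f"Jaccard: {lexical:.3f}  soft F1 on toy vectors: {semantic:.3f}")
```

In practice the toy vectors would be replaced by embeddings from the chosen encoder. The greedy best-match scheme is one of several reasonable ways to aggregate pairwise similarities into an F1 score.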
Credence: Semantic Metrics and Convergence Analysis for Claim Decomposition