We introduce Precision-Cache Hit Ratio (P-CHR) AUC and Calibration Retention Rate (CRR) to address the calibration gap in semantic caching. These metrics evaluate precision across cache utilization levels and measure how offline ranking quality persists in deployment. Our analysis shows the gap is driven by training objectives, not data scale, and post-hoc calibration only partially resolves it.