The authors introduce SrDetection, a unified framework that detects data leakage in code large language models (Code LLMs) in both gray-box and black-box settings. The method generates semantically equivalent variants of benchmark samples and flags cases where the original sample is disproportionately easier for the model than its variants, a sign of exposure during pre-training.
- SrDetection contrasts model behavior on original samples against generated variants to flag leakage without relying on proprietary training corpora or brittle heuristics.
- The framework achieves an average F1 improvement of 21.52 points in gray-box settings and 14.46 points in black-box settings over strong baselines.
- A study of 15 widely used Code LLMs on four benchmarks reveals benchmark-specific leakage patterns that extend beyond prior overlap-based analyses.
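The original-versus-variant contrast can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each sample is assumed to have a model "ease" score where higher means easier. In a gray-box setting this could be mean token log-likelihood. In a black-box setting it could be the pass rate of generated code. Variant generation is not shown. The names `leakage_gap`, `flag_leaked`, and `f1` are hypothetical. The rank-based rule in `flag_leaked` is an illustrative threshold-free stand-in and is not necessarily the criterion SrDetection uses.

```python
from statistics import mean
from typing import Sequence


def leakage_gap(original_score: float, variant_scores: Sequence[float]) -> float:
    """Hypothetical helper: how much easier the original is than its variants
    on average. Scores are assumed to be 'higher = easier for the model'."""
    if not variant_scores:
        raise ValueError("need at least one semantically equivalent variant")
    return original_score - mean(variant_scores)


def flag_leaked(original_score: float, variant_scores: Sequence[float]) -> bool:
    """Hypothetical threshold-free rule (an assumption, not the paper's):
    flag the sample when the original beats every one of its variants,
    i.e. it ranks first among all semantically equivalent versions."""
    if not variant_scores:
        raise ValueError("need at least one semantically equivalent variant")
    return all(original_score > v for v in variant_scores)


def f1(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 of leakage flags against ground-truth leakage labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy (original_score, variant_scores) pairs, e.g. mean token log-likelihoods.
    samples = [
        (-0.2, [-0.9, -1.1, -0.8]),   # original much easier than all variants
        (-1.0, [-0.95, -1.05, -1.2]), # original no easier than a variant
        (-0.5, [-0.7, -0.6]),         # original slightly easier than all variants
    ]
    flags = [flag_leaked(o, v) for o, v in samples]
    print(flags)  # [True, False, True]
    labels = [True, False, False]  # toy ground truth
    print(round(f1(flags, labels), 3))  # 0.667
```

Comparing each original against its own variants, rather than against a global cutoff, means no score threshold has to be tuned per model or benchmark. That design choice is what the sketch is meant to illustrate.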
This approach provides robust, threshold-independent leakage detection, addressing the limitations of existing methods, which require access to training data or rely on thresholds that do not generalize.