SrDetection: A Self-Referential Framework for Data Leakage Detection in Code LLMs

The authors introduce SrDetection, a unified framework for detecting data leakage in code large language models (Code LLMs) that operates in both gray-box and black-box settings. The method generates semantically equivalent variants of each benchmark sample and flags cases where the model finds the original disproportionately easier than its variants, which is a sign of pre-training exposure.

SrDetection contrasts model behavior on original samples against generated variants to flag leakage without relying on proprietary training corpora or brittle heuristics.
The framework achieves an average F1 improvement of 21.52 points in gray-box settings and 14.46 points in black-box settings over strong baselines.
A study of 15 widely used Code LLMs on four benchmarks reveals benchmark-specific leakage patterns that extend beyond prior overlap-based analyses.
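
The contrast between an original sample and its variants can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the score convention (higher means easier for the model, e.g. a log-likelihood in a gray-box setting or a pass rate in a black-box setting), and the "original beats every variant" rule are all assumptions made here for illustration only.

```python
from statistics import mean

# Hypothetical convention: a higher score means the sample is "easier" for the
# model (e.g. log-likelihood in gray-box, pass rate in black-box settings).

def original_advantage(original_score, variant_scores):
    """Gap between the original's score and the mean score of its variants."""
    if not variant_scores:
        raise ValueError("need at least one variant score")
    return original_score - mean(variant_scores)

def original_is_best(original_score, variant_scores):
    """Illustrative rank-based rule (an assumption, not the paper's criterion):
    the original scores strictly higher than every variant."""
    if not variant_scores:
        raise ValueError("need at least one variant score")
    return all(original_score > v for v in variant_scores)

def leakage_summary(samples):
    """samples: list of (original_score, [variant_scores]) pairs."""
    if not samples:
        raise ValueError("need at least one sample")
    flags = [original_is_best(o, vs) for o, vs in samples]
    gaps = [original_advantage(o, vs) for o, vs in samples]
    return {
        "flags": flags,                          # per-sample suspicion flags
        "flag_rate": sum(flags) / len(samples),  # share of flagged samples
        "mean_gap": mean(gaps),                  # average original-vs-variant gap
    }

if __name__ == "__main__":
    # Made-up scores for two samples, each with two variants.
    demo = [(-10.0, [-20.0, -22.0]), (-15.0, [-14.0, -16.0])]
    print(leakage_summary(demo))
```

In this toy input, the first sample is flagged because the original scores well above both variants, while the second is not because one variant scores higher than the original. A rank-based rule of this kind avoids a tuned numeric cutoff, though whether it matches SrDetection's actual decision procedure is not stated in this summary.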

SrDetection provides robust, threshold-independent leakage detection, addressing the limitations of existing methods that either require access to training data or depend on thresholds that do not generalize.