The authors introduce SrDetection, a unified framework for detecting data leakage in code large language models (Code LLMs) that operates in both gray-box and black-box settings. The method generates semantically equivalent variants of benchmark samples and looks for cases where the model finds the original sample disproportionately easier than its variants, which points to pre-training exposure.

  • SrDetection contrasts model behavior on original samples against generated variants to flag leakage without relying on proprietary training corpora or brittle heuristics.
  • The framework achieves an average F1 improvement of 21.52 points in gray-box settings and 14.46 points in black-box settings over strong baselines.
  • A study of 15 widely used Code LLMs on four benchmarks reveals benchmark-specific leakage patterns that extend beyond prior overlap-based analyses.

This approach provides robust, threshold-independent leakage detection. It addresses two limitations of existing methods: they either require access to the training data or rely on thresholds that do not generalize.
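
The contrast between an original sample and its variants can be illustrated with a minimal sketch. It is not the SrDetection algorithm, whose scoring and decision rule are not given in this summary. The function name `leakage_signal`, the idea of one numeric score per sample (for example a log-likelihood in a gray-box setting or a pass rate in a black-box setting), and the rank-based "beats every variant" rule are all assumptions made for illustration.

```python
from statistics import mean


def leakage_signal(original_score: float, variant_scores: list[float]) -> dict:
    """Compare a model's score on an original benchmark sample with its
    scores on semantically equivalent variants (higher score = easier).

    Hypothetical helper: the scores and the decision rule are illustrative
    assumptions, not the method described in the paper.
    """
    if not variant_scores:
        raise ValueError("at least one variant score is required")

    # How much easier the original is than the average variant.
    gap = original_score - mean(variant_scores)

    # Fraction of variants that the original strictly outperforms.
    # Using a rank avoids choosing a score threshold by hand.
    rank = sum(original_score > v for v in variant_scores) / len(variant_scores)

    # Assumed rule: suspicious only if the original beats every variant.
    return {"gap": gap, "rank": rank, "suspicious": rank == 1.0}


# The original scores higher than all three variants, so it is flagged.
flagged = leakage_signal(-0.2, [-1.0, -1.2, -0.8])

# The original sits in the middle of its variants, so it is not flagged.
clean = leakage_signal(-1.0, [-0.9, -1.1])
```

A rank-based rule is used here only because it needs no tuned cutoff, which matches the threshold-independence the summary emphasizes; a real detector would also need enough variants per sample for the rank to be meaningful.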