ReproRepo introduces a scalable framework that uses GitHub issues to evaluate the reproducibility of ML papers. It shows that LLM agents such as Codex with GPT-5.5 identify at least one blocker in 90% of paper-repository pairs without executing code, though exact localization remains challenging.
ReproRepo: Scaling Reproducibility Audits with GitHub Issues