ReproRepo: Scalable Reproducibility Audits with GitHub Issues
ReproRepo introduces a scalable framework that uses GitHub issues to evaluate the reproducibility of ML papers. It shows that LLM agents such as Codex with GPT-5.5 identify at least one human-reported blocker in 90% of 1,149 ML papers. The agents detect both visible failures and semantic issues, though their ability to localize a problem exactly remains limited.
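
To make the headline metric concrete, here is a minimal sketch of a paper-level "at least one blocker matched" rate. The `PaperAudit` schema, the blocker labels, and the exact-label matching rule are all assumptions made for illustration; the summary does not say how ReproRepo matches agent findings to human-reported issues, and the toy data below are not results from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class PaperAudit:
    """One paper's audit record (hypothetical schema, for illustration only)."""
    paper_id: str
    # Blocker labels derived from human-filed GitHub issues (assumed format).
    human_blockers: set[str] = field(default_factory=set)
    # Blocker labels reported by the LLM agent (assumed format).
    agent_blockers: set[str] = field(default_factory=set)


def hit_rate(audits: list[PaperAudit]) -> float:
    """Fraction of papers where the agent found at least one human-reported blocker.

    Matching here is exact label equality, which is a simplifying assumption.
    """
    if not audits:
        raise ValueError("no audits to score")
    hits = sum(1 for a in audits if a.human_blockers & a.agent_blockers)
    return hits / len(audits)


# Toy data, not results from the paper.
audits = [
    PaperAudit("paper-a", {"missing-checkpoint", "dep-conflict"}, {"dep-conflict"}),
    PaperAudit("paper-b", {"wrong-metric"}, {"missing-data"}),
    PaperAudit("paper-c", {"missing-data"}, {"missing-data", "dep-conflict"}),
    PaperAudit("paper-d", {"broken-script"}, {"broken-script"}),
]
print(f"{hit_rate(audits):.2f}")  # 3 of 4 papers have an overlap -> 0.75
```

A paper counts as a hit as soon as one blocker overlaps, so this rate says nothing about how many blockers the agent missed or how precisely it located them, which is consistent with the caveat about limited localization.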