GeneBench-Pro is a benchmark for evaluating models on complex genomic reasoning tasks. It features ten detailed case studies built around representative questions, each providing the original prompt, the datasets, and the context needed to assess model performance on a specific biological challenge. The case studies ask models to:

  • Estimate clinical utility of synthetic TXR1-directed inhibitors using long-read and pharmacogenomic evidence.
  • Distinguish transcript-specific lncRNA dependencies from nearby-locus effects by controlling for local DNA-perturbation and GC toxicity.
  • Perform cis multivariable Mendelian randomization to estimate direct disease effects while handling linkage disequilibrium and pleiotropy.
  • Calculate ancestry-specific carrier frequencies and residual risks using pseudogene-aware calls and founder-haplotype data.
  • Determine genotype effects on monocyte expression by correcting for ambient RNA and technical contamination in single-cell data.
  • Assess clinical associations of nested structural subhaplotypes within inversion-like loci, separating dosage calibration from expression support.
  • Quantify Hi-C loop-strength differences by masking low-mappability contacts and structural variant artifacts.
  • Map quantitative-trait loci in recombinant populations by reconstructing founder ancestry from biallelic marker data.
  • Infer parent-specific ancestry proportions and admixture timing from phased local-ancestry tracts after repairing reciprocal artifacts.
  • Identify haploid loci under positive selection using ancient allele-frequency time series while accounting for sequencing errors and drift.
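To illustrate the arithmetic behind the carrier-screening case study, the sketch below derives a carrier frequency from disease incidence under Hardy-Weinberg equilibrium. It then uses Bayes' rule to compute the residual carrier risk after a negative screen. This is a minimal sketch under stated assumptions, not GeneBench-Pro's reference solution. It assumes an autosomal recessive disorder, Hardy-Weinberg equilibrium within each ancestry group, and a screening assay with perfect specificity. The function names are hypothetical. Pseudogene-aware calling and founder-haplotype data would enter only through the ancestry-specific incidence and detection-rate inputs.

```python
import math


def carrier_frequency_from_incidence(incidence: float) -> float:
    """Carrier frequency 2pq for an autosomal recessive disorder.

    Assumes Hardy-Weinberg equilibrium, so the affected frequency equals q^2,
    where q is the combined pathogenic allele frequency.
    """
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must be in (0, 1)")
    q = math.sqrt(incidence)  # pathogenic allele frequency
    p = 1.0 - q               # non-pathogenic allele frequency
    return 2.0 * p * q


def residual_carrier_risk(carrier_freq: float, detection_rate: float) -> float:
    """Posterior probability of being a carrier given a negative screen.

    Bayes' rule with the assumption of perfect specificity: every
    non-carrier tests negative, and a carrier tests negative with
    probability (1 - detection_rate).
    """
    if not 0.0 <= carrier_freq <= 1.0:
        raise ValueError("carrier_freq must be in [0, 1]")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be in [0, 1]")
    missed = carrier_freq * (1.0 - detection_rate)  # carriers the assay misses
    return missed / (missed + (1.0 - carrier_freq))
```

For an illustrative incidence of 1 in 2,500 and a 90% detection rate, the carrier frequency is 2 × 0.98 × 0.02 = 0.0392, and the residual risk after a negative result is about 0.0041, or roughly 1 in 246. An ancestry-specific analysis would repeat this calculation with each group's own incidence and detection rate, because pathogenic allele spectra and assay sensitivity can differ between groups.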

Together, these case studies show that models must handle nuanced biological and technical confounders, such as linkage disequilibrium, ambient RNA, and structural-variant artifacts, to reach defensible clinical and research conclusions.