A new evaluation framework measures how far research ideas generated by large language models diverge from those produced by human researchers. Starting from high-quality papers, the study reconstructs the prior work each paper built on and uses it to prompt LLMs. It then profiles the generated ideas with a two-axis research-taste taxonomy that covers opportunity patterns and research paradigms.
- LLM-generated ideas are disproportionately concentrated around bridge-like opportunities and synthesis methods.
- The human-written reference papers spread more broadly across ways of framing research gaps and constructing contributions.
- The distributional gap is consistent across different LLMs, indicating a systematic shift relative to human taste.
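The contrast between "concentrated" and "spread more broadly" can be made concrete by comparing category distributions. The sketch below is not the study's method. It assumes each idea has already been labeled with a single category on one taxonomy axis, and it swaps in two standard measures: Shannon entropy for breadth and Jensen-Shannon divergence for the gap between two distributions. The category names in `OPPORTUNITY_PATTERNS` and the toy labels are hypothetical placeholders, not the study's taxonomy or data.

```python
import math
from collections import Counter

# Hypothetical category names for one axis (opportunity patterns);
# these are placeholders, not the study's actual taxonomy labels.
OPPORTUNITY_PATTERNS = ["bridge", "gap", "anomaly", "new_problem"]


def distribution(labels, categories):
    """Normalized frequency of each category among the labeled ideas."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall into the given categories")
    return [counts[c] / total for c in categories]


def entropy_bits(p):
    """Shannon entropy in bits; higher means ideas spread over more categories."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_bits(p, q):
    """KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    # The mixture m is positive wherever p or q is, so both KL terms are finite.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    llm_labels = ["bridge"] * 6 + ["gap"] * 2
    human_labels = ["bridge", "gap", "anomaly", "new_problem"] * 2

    p_llm = distribution(llm_labels, OPPORTUNITY_PATTERNS)
    p_human = distribution(human_labels, OPPORTUNITY_PATTERNS)
    print(f"LLM entropy:   {entropy_bits(p_llm):.3f} bits")
    print(f"Human entropy: {entropy_bits(p_human):.3f} bits")
    print(f"JS divergence: {js_divergence(p_llm, p_human):.3f} bits")
```

Jensen-Shannon divergence is symmetric and bounded, so several LLMs can be scored against the same human reference distribution on a common scale. The same functions could be applied separately to the research-paradigm axis.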
The results suggest that while strong LLMs can produce reasonable ideas, their range remains narrower than that of human researchers.