Study finds LLM research ideas are systematically narrower than human ones

A new evaluation framework measures how research ideas generated by large language models (LLMs) diverge from those produced by human researchers. The study reconstructs the prior work behind high-quality papers and uses it to prompt LLMs, then profiles the resulting ideas with a two-axis research-taste taxonomy covering opportunity patterns and research paradigms.

LLM-generated ideas are disproportionately concentrated around bridge-like opportunities and synthesis methods.
The human-written reference papers spread more broadly across ways of framing gaps and constructing contributions.
The distributional gap is consistent across different LLMs, indicating a systematic shift relative to human taste.
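
To make the idea of a distributional gap concrete, here is a minimal sketch of how category labels for two sets of ideas might be compared. It is an illustration under stated assumptions, not the study's method: the summary does not say which metric the authors use, so Jensen-Shannon divergence and Shannon entropy are stand-ins chosen here. The category names other than "bridge" and all of the label lists are hypothetical toy values, not data from the study.

```python
from collections import Counter
from math import log2

# Hypothetical opportunity-pattern categories; only "bridge" is named in the summary.
CATEGORIES = ["bridge", "gap", "limitation", "new_setting"]

def label_distribution(labels, categories=CATEGORIES):
    """Turn a list of category labels into a probability vector."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in categories]

def entropy(p):
    """Shannon entropy in bits; higher means labels are spread more evenly."""
    return -sum(a * log2(a) for a in p if a > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y (the mixture) is > 0 wherever x is.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy labels, invented purely for illustration.
human_labels = ["bridge", "gap", "limitation", "new_setting"] * 5
llm_labels = ["bridge"] * 14 + ["gap"] * 2 + ["limitation"] * 2 + ["new_setting"] * 2

human_dist = label_distribution(human_labels)
llm_dist = label_distribution(llm_labels)

print("human entropy:", entropy(human_dist))
print("LLM entropy:  ", entropy(llm_dist))
print("JS divergence:", js_divergence(human_dist, llm_dist))
```

In this toy setup, a lower entropy for the LLM labels would correspond to the concentration described above, and a nonzero divergence that recurs across several models would correspond to a consistent shift. The same comparison could be repeated on the second axis (research paradigms).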

The results suggest that while strong LLMs can produce reasonable ideas, their range remains narrower than that of human researchers.