A study finds that large reasoning models (LRMs) and humans both spend more effort on harder problems, but diverge in how they allocate deliberation within a given item. LRMs generate more tokens on trials they get wrong than on trials they get right, whereas humans do the opposite, spending less time on trials they get wrong.

  • The research separates "registration" (tracking difficulty across items) from "allocation" (whether an agent spends more effort on its own failures or on its own successes).
  • On a public matched human-LRM corpus, both groups reproduce cross-item alignment with difficulty but show opposite allocation patterns.
  • Every LRM tested showed a large wrong-vs-right effect (more tokens when wrong; Cohen's d = 1.47 to 3.13 on H-ARC), while the effect in humans had the opposite sign.
  • The dissociation holds under item fixed effects, replicates across datasets, and is absent in non-thinking baselines.
  • The human pattern is interpreted as engagement versus abandonment, while the LRM pattern is attributed to uncertainty producing longer reasoning chains.
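
To make the wrong-vs-right effect and the item fixed-effects check above concrete, here is a minimal Python sketch. It is not the study's analysis code: the pooled-standard-deviation form of Cohen's d, the use of per-item demeaning as a stand-in for item fixed effects, the function names, and the toy trials are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def demean_by_item(trials):
    """Subtract each item's mean effort (a simple stand-in for item fixed effects)."""
    by_item = defaultdict(list)
    for item, effort, _ in trials:
        by_item[item].append(effort)
    item_mean = {item: mean(values) for item, values in by_item.items()}
    return [(item, effort - item_mean[item], ok) for item, effort, ok in trials]


def allocation_effect(trials):
    """Wrong-minus-right effect size on effort, after removing item-level difficulty."""
    centered = demean_by_item(trials)
    wrong = [effort for _, effort, ok in centered if not ok]
    right = [effort for _, effort, ok in centered if ok]
    return cohens_d(wrong, right)


# Hypothetical toy data: (item id, effort, answered correctly).
# Effort is token count for the LRM rows and seconds for the human rows.
lrm_trials = [
    ("easy", 100, True), ("easy", 120, True), ("easy", 300, False),
    ("hard", 800, True), ("hard", 1500, False), ("hard", 1400, False),
]
human_trials = [
    ("easy", 20.0, True), ("easy", 25.0, True), ("easy", 8.0, False),
    ("hard", 90.0, True), ("hard", 40.0, False), ("hard", 35.0, False),
]

print("LRM-like allocation effect:", allocation_effect(lrm_trials))
print("Human-like allocation effect:", allocation_effect(human_trials))
```

In this toy data both groups spend more on the hard item (registration), yet the sign of the within-item effect differs: positive for the LRM-like trials and negative for the human-like ones. Demeaning by item is what keeps the comparison from simply re-measuring item difficulty.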

This divergence shows that trace length captures a difficulty signal but not the underlying control policy, suggesting that current length-based metrics may mask fundamental differences in how humans and LRMs handle failure.