Researchers present NeuroCogMap, a framework inspired by cognitive neuroscience that organizes the internal features of large language models (LLMs) into functional parcels, each linked to interpretable functions and capabilities.

  • The framework identifies a stable, semantically coherent organization of internal representations that is partly conserved across different models.
  • Major LLM failures, such as hallucination, bias, refusal failure, and sycophancy, correspond to distinct disruptions in representational and behavioral-control systems.
  • NeuroCogMap improves the prediction of human cortical responses during naturalistic language comprehension, with the strongest correspondence found in higher-order association cortex.
  • Internal signatures within the models expose latent strategies, which in turn guide refinements of classical models of human decision-making.

These findings establish NeuroCogMap as a system-level framework for mapping functional organization in artificial systems and relating this organization to human cortical function and cognitive behavior.