NeuroCogMap reveals cognitive organization of large language models

Researchers present NeuroCogMap, a framework inspired by cognitive neuroscience that organizes the internal features of large language models (LLMs) into functional parcels linked to interpretable functions and capabilities.

The framework identifies a stable, semantically coherent organization of internal representations that is partly conserved across different models.
Major LLM failures, such as hallucination, bias, refusal failure, and sycophancy, correspond to distinct disruptions in representational and behavioral-control systems.
NeuroCogMap improves the prediction of human cortical responses during naturalistic language comprehension, with the strongest correspondence found in higher-order association cortex.
The models' internal signatures also expose latent strategies that guide refinements of classical models of human decision-making.

These findings establish NeuroCogMap as a system-level framework for mapping functional organization in artificial systems and relating this organization to human cortical function and cognitive behavior.