Researchers present NeuroCogMap, a framework inspired by cognitive neuroscience that organizes the internal features of large language models (LLMs) into functional parcels linked to interpretable functions and capabilities.
- The framework identifies a stable, semantically coherent organization of internal representations that is partly conserved across different models.
- Major LLM failures such as hallucination, bias, refusal failure, and sycophancy correspond to distinct disruptions in representational and behavioral-control systems.
- NeuroCogMap improves prediction of human cortical responses during naturalistic language comprehension, with the strongest correspondence in higher-order association cortex.
- Internal signatures reveal latent strategies that inform refinements to classical models of human decision-making.
These findings establish NeuroCogMap as a system-level framework for mapping functional organization in artificial systems and relating this organization to human cortical function and cognitive behavior.