Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs
Researchers extended the Werewolf game with a Jester role, creating a triadic social-deduction environment in which players must reason across three opposing utility functions, a test of large language models' theory-of-mind capabilities. In evaluations of GPT-4.1, DeepSeek-V3.1, and Llama-3.3-70B, the Jester won 60-70% of games. GPT-4.1 wolves voted the Jester out on day 1 in 60-70% of cases, a self-defeating action driven by language priors.
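
To make the idea of three opposing utility functions concrete, the following is a minimal sketch and not the researchers' implementation. It assumes rules the summary does not state: the Jester wins immediately when removed by a day vote, villagers win once no wolves remain, and wolves win once they are at least as numerous as the other players. The names `Role`, `winner`, and `utility` are hypothetical.

```python
from enum import Enum


class Role(Enum):
    VILLAGER = "villager"
    WOLF = "wolf"
    JESTER = "jester"


def winner(alive, voted_out_today=None):
    """Return the winning Role, or None if the game continues.

    alive: roles of the players still in the game after today's vote.
    voted_out_today: role of the player removed by today's day vote, if any.
    """
    # Assumption: a Jester removed by the day vote wins on the spot.
    if voted_out_today is Role.JESTER:
        return Role.JESTER
    wolves = sum(1 for role in alive if role is Role.WOLF)
    others = len(alive) - wolves
    # Assumption: conventional Werewolf win conditions for the other two sides.
    if wolves == 0:
        return Role.VILLAGER
    if wolves >= others:
        return Role.WOLF
    return None


def utility(role, outcome):
    """Zero-sum payoff: 1 if this role's side won, otherwise 0."""
    return 1 if outcome is role else 0


# Day 1: the wolves join a vote that removes the Jester.
remaining = [Role.WOLF, Role.WOLF, Role.VILLAGER, Role.VILLAGER, Role.VILLAGER]
outcome = winner(remaining, voted_out_today=Role.JESTER)
wolf_payoff = utility(Role.WOLF, outcome)
```

Under these assumed rules, `outcome` is the Jester and `wolf_payoff` is 0, which is why a wolf vote against the Jester on day 1 counts as self-defeating: a wolf has to model not only what villagers believe but also what the Jester wants villagers to believe.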