The Riddle Riddle: Testing Flexible Reasoning in Large Language Models and Humans
A study introduces the "riddle riddle" paradigm to test whether large language models (LLMs) rely on flexible reasoning or on pattern matching, and finds that humans and LLMs fail in opposite directions. In experiments with nine state-of-the-art LLMs and 100 human participants, LLMs performed significantly worse on riddle riddles than on genuine riddles, while humans showed the reverse pattern, performing worse on genuine riddles than on riddle riddles.