The Complexity Ceiling Benchmark: A Multi-Domain Evaluation of Sequential Reasoning Under Depth Scaling
The Complexity Ceiling Benchmark (CCB) evaluates how language model reasoning decays as the required sequential steps increase, fixing semantic content while varying task depth from 5 to 50. The study reveals consistent geometric per-step decay across three distinct regimes: grounded spatial state-tracking, abstract symbolic pointer manipulation, and transitive relational inference.