When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

This article develops a theory for speculative decoding regimes that use greedy decoding, relaxed acceptance rules, or tree-based candidate sets, rather than the stochastic, distribution-preserving settings studied in the existing literature. It characterizes rejection regions as lower level sets of the target distribution, and from this derives exact KL divergence requirements and sharp margin-based bounds for several acceptance criteria.

Characterizes exact certificates and margin-based bounds for strict greedy decoding, additive and multiplicative relaxed acceptance, top-m relaxed criteria, and entropy-thresholded acceptance.
Extends the framework to greedy tree decoding, deriving exact and margin-only certificates for when the target greedy token remains covered by the drafter's top-m candidates.
Evaluates these certificates on Qwen3 models, showing that relaxed and tree-based criteria substantially enlarge the region of certified acceptance, particularly at decoding steps where the target distribution has a small margin.
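
To make the acceptance events above concrete, the following is a minimal Python sketch of how some of the named rules might be checked at a single decoding step. The exact definitions are assumptions rather than the article's own formulations: `p` is taken to be the target model's next-token distribution and `q` the drafter's, the margin is taken to be the gap between the two largest target probabilities, and the parameter names `eps`, `alpha`, and `m` are hypothetical. The entropy-thresholded rule is not sketched here.

```python
def argmax(p):
    """Index of the largest probability (first index wins ties)."""
    return max(range(len(p)), key=lambda i: p[i])

def top_m(p, m):
    """Indices of the m largest probabilities, in descending order."""
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:m]

def margin(p):
    """Assumed margin: gap between the top two target probabilities."""
    first, second = sorted(p, reverse=True)[:2]
    return first - second

def accept_strict(p, token):
    """Strict greedy: accept only the target's greedy token."""
    return token == argmax(p)

def accept_additive(p, token, eps):
    """Assumed additive relaxation: within eps of the target's top probability."""
    return p[token] >= max(p) - eps

def accept_multiplicative(p, token, alpha):
    """Assumed multiplicative relaxation: at least alpha times the top probability."""
    return p[token] >= alpha * max(p)

def accept_top_m(p, token, m):
    """Assumed top-m relaxation: token is among the target's m most likely tokens."""
    return token in top_m(p, m)

def tree_covers(p, q, m):
    """Greedy tree coverage: target greedy token is among the drafter's top-m."""
    return argmax(p) in top_m(q, m)

# Illustrative distributions (made up for this sketch, not from the article).
p = [0.40, 0.35, 0.15, 0.10]   # target
q = [0.30, 0.45, 0.15, 0.10]   # drafter
draft = argmax(q)              # the drafter proposes token 1

print(accept_strict(p, draft))                # False: target greedy token is 0
print(accept_additive(p, draft, eps=0.1))     # True
print(accept_multiplicative(p, draft, 0.8))   # True
print(accept_top_m(p, draft, m=2))            # True
print(tree_covers(p, q, m=2))                 # True
print(tree_covers(p, q, m=1))                 # False
```

In this made-up example the target margin is small (about 0.05), so strict greedy acceptance fails while each relaxed rule and the two-candidate tree succeed, which is the kind of low-margin step the summary refers to.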

These results complement existing distribution-preserving analyses by characterizing the deterministic local acceptance events common in practical inference systems.