When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding
This article develops a theory for speculative decoding regimes that use greedy decoding, relaxed acceptance rules, or tree-based candidate sets, rather than the stochastic distribution-preserving settings studied in existing literature. The authors characterize rejection regions as lower level sets of the target distribution to derive exact KL divergence requirements and sharp margin-based bounds for various acceptance criteria.