Quantization Impact on MTP Draft Acceptance Rates
An analysis of speculative decoding with Gemma 4-31B-it models shows that heavy quantization lowers the token acceptance rate. The quantized main model agrees less often with the MTP drafter's predictions, so fewer drafted tokens survive verification. The tests cover the Q5_K_S, IQ4_XS, IQ3_M, and IQ2_M quantizations and examine how draft depth, the number of tokens the drafter proposes per step, affects performance.
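To show why acceptance rate and draft depth interact, here is a minimal sketch. It assumes greedy verification, where a drafted token is kept only while it matches the main model's own choice. It also uses a simplified model in which each drafted token is accepted independently with probability alpha, so a verification step yields 1 + alpha + alpha^2 + ... + alpha^depth tokens on average. The function names and the alpha values in the usage loop are hypothetical. They are not measurements from the Gemma tests.

```python
def accepted_prefix_length(draft: list[int], target: list[int]) -> int:
    """Count leading draft tokens that match the main model's greedy tokens.

    Sketch of greedy verification (an assumption): acceptance stops at the
    first position where the drafter and the main model disagree.
    """
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def expected_tokens_per_step(alpha: float, depth: int) -> float:
    """Expected tokens produced per main-model verification step.

    Simplifying assumption: each drafted token is accepted independently
    with probability `alpha`, and every step also yields one token from the
    main model itself (a correction or bonus token). The result is the
    geometric sum 1 + alpha + ... + alpha**depth.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted: depth drafted tokens plus one extra.
        return float(depth + 1)
    return (1.0 - alpha ** (depth + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates, standing in for lighter vs heavier quants.
    for alpha in (0.8, 0.6, 0.4):
        row = [f"{expected_tokens_per_step(alpha, d):.2f}" for d in range(1, 5)]
        print(f"alpha={alpha}: depths 1-4 -> {row}")
```

Under this model, the k-th extra draft token contributes only alpha^k expected tokens. A lower acceptance rate, such as one caused by heavier quantization, therefore shrinks the payoff from deeper drafts, while each additional drafted token still costs drafter compute.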