Backtrack Sampler and Verifier Drastically Improve Tiny Model Coding Performance
A new backtrack sampler, paired with a verifier model, substantially improves the coding performance of tiny 0.5B-parameter models, potentially making them competitive with 2-4B-class models without any change to their weights. In principle, the same approach could also reduce hallucination in large models, because errors are corrected during generation by re-sampling rather than left in the output.

The gains come at a cost. Decoding is 5-30% slower because of the backward passes the sampler needs, and the method requires training a separate verifier roughly the same size as the original model. Running that second model doubles VRAM usage and takes 1.5 to 3 times the compute of standard inference. On the other hand, a verifier trained on a diverse data distribution generalizes to other models of the same or a smaller weight class, and training it is cheap: it needs only about 0.01% of the tokens used for full pre-training.
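The post does not describe the sampler's internals, so the following is only a minimal sketch of the general idea, built on several assumptions. The verifier scores every prefix, a score below a threshold (0.5 here) counts as a rejection, and the sampler rewinds one token and re-samples that position with the rejected token banned. It also caps the number of rewinds. The names `backtrack_sample`, `propose`, `verify`, `toy_propose` and `toy_verify` are hypothetical, and the toy functions stand in for the 0.5B generator and the verifier. The loop also shows where extra decode work comes from: every token costs a verifier call, and every rejection forces another sampling step.

```python
import random
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical interfaces, assumed for illustration only:
#   propose(prefix) -> {candidate_token: probability}  (stands in for the tiny generator)
#   verify(prefix)  -> score in [0, 1]                  (stands in for the verifier model)
Proposer = Callable[[Sequence[str]], Dict[str, float]]
Verifier = Callable[[Sequence[str]], float]


def backtrack_sample(
    propose: Proposer,
    verify: Verifier,
    max_len: int,
    threshold: float = 0.5,
    max_backtracks: int = 20,
    seed: int = 0,
) -> List[str]:
    """Generate up to max_len tokens, undoing and re-sampling any token the verifier rejects."""
    rng = random.Random(seed)
    out: List[str] = []
    banned: Dict[int, Set[str]] = {}  # position -> tokens already rejected at that position
    backtracks = 0

    while len(out) < max_len:
        pos = len(out)
        # Candidate distribution with zero-probability and previously rejected tokens removed.
        dist = {
            tok: p
            for tok, p in propose(out).items()
            if p > 0 and tok not in banned.get(pos, set())
        }
        if not dist:
            break  # every candidate here was rejected; stop rather than emit a known-bad token

        tok = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        out.append(tok)

        # One verifier call per generated token; rewind only while the budget lasts.
        if verify(out) < threshold and backtracks < max_backtracks:
            backtracks += 1
            banned.setdefault(pos, set()).add(tok)
            out.pop()  # rewind one token and re-sample this position

    return out


# Toy stand-ins: the generator prefers "x", but the verifier rejects any prefix containing "x".
def toy_propose(prefix: Sequence[str]) -> Dict[str, float]:
    return {"x": 0.9, "y": 0.1}


def toy_verify(prefix: Sequence[str]) -> float:
    return 0.0 if "x" in prefix else 1.0


if __name__ == "__main__":
    print(backtrack_sample(toy_propose, toy_verify, max_len=3))  # ['y', 'y', 'y']
```

Rewinding a single token keeps the ban bookkeeping simple, because the sampler never returns to a position it has already moved past. A real backtrack sampler may rewind several tokens at once or use a different rejection rule.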