Ternary Mamba compresses Mamba-2 by 3.61x, from 2,687 MB to 744 MB, using grouped quantization-aware training (QAT) with knowledge distillation. After training on 102M tokens, it reaches 48.1% zero-shot accuracy across 7 tasks, within 0.9 percentage points of Bi-Mamba, while avoiding costly from-scratch training.
Ternary Mamba: Efficient QAT of SSMs from Pretrained Checkpoints
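
To make "grouped" ternary quantization concrete, here is a minimal sketch of the forward quantization step only. The summary does not specify the scaling rule, so the per-group absmean scale, the `group_size` default, and the function names `ternarize_grouped`, `dequantize`, and `compression_ratio` are illustrative assumptions, not the paper's method. The straight-through estimator and the distillation loss that QAT would need are omitted.

```python
def ternarize_grouped(weights, group_size=4):
    """Map a flat list of floats to codes in {-1, 0, +1}, one scale per group.

    Assumption: each group is scaled by its mean absolute value (absmean).
    The source does not state which scaling rule Ternary Mamba uses.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = sum(abs(w) for w in group) / len(group)
        scales.append(scale)
        for w in group:
            if scale == 0.0:
                codes.append(0)  # an all-zero group stays zero
            else:
                # Round to the nearest integer, then clamp to the ternary set.
                codes.append(max(-1, min(1, round(w / scale))))
    return codes, scales


def dequantize(codes, scales, group_size=4):
    """Reconstruct approximate weights from ternary codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def compression_ratio(original_mb, compressed_mb):
    """Size ratio between the original and the compressed checkpoint."""
    return original_mb / compressed_mb


weights = [0.9, -1.1, 0.05, 1.0, 0.0, 0.0, 0.0, 0.0]
codes, scales = ternarize_grouped(weights)
print(codes)                                   # [1, -1, 0, 1, 0, 0, 0, 0]
print(round(compression_ratio(2687, 744), 2))  # 3.61
```

Using one scale per small group rather than one per tensor lets each group adapt to its local weight magnitude, at the cost of storing the extra scales. The last line simply checks the reported ratio against the two checkpoint sizes quoted in the summary.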