arXiv cs.LG · 8d ago · research

Ternary Mamba: Efficient QAT of SSMs from Pretrained Checkpoints


Ternary Mamba compresses Mamba-2 by 3.61x, from 2,687 MB to 744 MB, using grouped quantization-aware training (QAT) with knowledge distillation, starting from a pretrained checkpoint. After training on 102M tokens it reaches 48.1% zero-shot accuracy across 7 tasks, within 0.9 percentage points of Bi-Mamba, while avoiding costly from-scratch training.
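
To make "grouped" ternary quantization concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not say how scales are computed, so the per-group absmean scale, the group size of 4, and the names ternarize_grouped, dequantize, and compression_ratio are all assumptions chosen for illustration. The sketch shows only the forward quantization step; the QAT loop (typically a straight-through gradient estimator) and the distillation loss are left out.

```python
from typing import List, Tuple


def ternarize_grouped(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map weights to codes in {-1, 0, +1}, with one scale per group.

    Assumption: the scale is the mean absolute value of the group
    (absmean). The paper may use a different rule.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = sum(abs(w) for w in group) / len(group)
        scales.append(scale)
        for w in group:
            if scale == 0.0:
                # An all-zero group has nothing to scale; keep zeros.
                codes.append(0)
            else:
                # Round to the nearest integer, then clamp to the ternary set.
                codes.append(max(-1, min(1, round(w / scale))))
    return codes, scales


def dequantize(codes: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Reconstruct approximate weights from ternary codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Ratio of original to compressed checkpoint size."""
    return original_mb / compressed_mb


if __name__ == "__main__":
    w = [0.9, -1.1, 0.05, 1.0, 0.0, 0.0, 0.0, 0.0]
    codes, scales = ternarize_grouped(w, group_size=4)
    print(codes)
    print(dequantize(codes, scales, group_size=4))
    # Sizes taken from the summary above: 2,687 MB down to 744 MB.
    print(round(compression_ratio(2687, 744), 2))
```

Smaller groups track local weight magnitudes more closely but store more scales, so the group size trades accuracy against the final checkpoint size. The last line checks the reported figure: 2,687 / 744 rounds to 3.61.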

