Nemotron-3-Super-120B-A12B achieves perfect needle retrieval to 504K tokens on 4×3090
A user tested NVIDIA's Nemotron-3-Super-120B-A12B, a model that combines Mamba layers with a mixture-of-experts (MoE) architecture, and reported exact recall in needle-in-a-haystack tests at context lengths up to 504,482 tokens. The model ran fully on GPU across four RTX 3090s using the i1-Q4_K_S quantization. The result is consistent with the model's design, in which the Mamba layers maintain a constant-size recurrent state rather than a growing KV cache.
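
To illustrate why a constant-size recurrent state matters at this context length, here is a minimal Python sketch comparing how the two kinds of memory scale with the number of tokens. Every layer count and dimension below is a hypothetical placeholder chosen for illustration, not Nemotron's actual configuration, and the helper names are invented for this example.

```python
def kv_cache_bytes(n_tokens, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Memory for an attention KV cache: grows linearly with n_tokens.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def recurrent_state_bytes(n_recurrent_layers, state_elems_per_layer, bytes_per_elem=2):
    """Memory for a recurrent (Mamba-style) state: independent of n_tokens."""
    return n_recurrent_layers * state_elems_per_layer * bytes_per_elem


if __name__ == "__main__":
    # HYPOTHETICAL configuration, for illustration only.
    n_attn_layers = 8
    n_kv_heads = 2
    head_dim = 128
    n_recurrent_layers = 40
    state_elems_per_layer = 1_000_000

    state = recurrent_state_bytes(n_recurrent_layers, state_elems_per_layer)
    for n_tokens in (8_000, 128_000, 504_482):
        kv = kv_cache_bytes(n_tokens, n_attn_layers, n_kv_heads, head_dim)
        print(f"{n_tokens:>8} tokens: KV cache {kv / 2**20:9.1f} MiB, "
              f"recurrent state {state / 2**20:9.1f} MiB")
```

In this sketch the KV cache figure rises in proportion to the token count while the recurrent state figure stays the same on every row, which is the scaling behavior the summary describes. Attention layers in a hybrid model would still carry a KV cache, so the total memory at long context depends on how many such layers the model has.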