Step-3.7-Flash (198B-A11B vision MoE) on 4×3090: fully-resident IQ3_XXS beats the spilled IQ4 by 2.4×, and MTP speculative decode silently breaks vision
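A user runs StepFun's Step-3.7-Flash, a vision MoE model with 198B total and about 11B active parameters, on a consumer 4×RTX 3090 setup and reports two findings: an IQ3_XXS quant that stays fully resident in VRAM outperforms an IQ4 quant that spills out of VRAM by 2.4×, and enabling multi-token prediction (MTP) speculative decoding silently breaks vision input.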
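A rough back-of-the-envelope check suggests why one quant fits and the other spills. This is only a sketch under stated assumptions: it uses the nominal llama.cpp bits-per-weight figures for IQ3_XXS (3.0625) and IQ4_XS (4.25), assumes "IQ4" means IQ4_XS, treats every weight as uniformly quantized, counts 24 GB per 3090, and ignores KV cache, the vision projector, and compute buffers. Real GGUF files mix tensor types, so actual sizes will differ.

```python
# Rough estimate of quantized weight size vs. total VRAM.
# All figures are nominal and illustrative, not measurements from the post.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a uniformly quantized model."""
    return params_billion * bits_per_weight / 8

TOTAL_VRAM_GB = 4 * 24    # four RTX 3090 cards, 24 GB each (nominal)
PARAMS_BILLION = 198      # total parameters, from the post title

# Assumed nominal bits per weight; "IQ4" is taken to mean IQ4_XS here.
QUANTS = {"IQ3_XXS": 3.0625, "IQ4_XS": 4.25}

for name, bpw in QUANTS.items():
    size = weight_gb(PARAMS_BILLION, bpw)
    verdict = "fits in" if size < TOTAL_VRAM_GB else "exceeds"
    print(f"{name}: ~{size:.1f} GB of weights, {verdict} {TOTAL_VRAM_GB} GB VRAM")
```

Under these assumptions the 3-bit quant leaves roughly 20 GB of headroom for context and buffers, while the 4-bit quant cannot be held entirely on the GPUs, which is consistent with the "fully-resident" versus "spilled" framing in the title.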