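JetBrains has open-sourced the Mellum2 models, a series of 12B-2.5A LLMs (notation that usually denotes a mixture-of-experts design with 12B total parameters and roughly 2.5B active per token), trained from scratch for fast inference on H100/H200 hardware as well as local deployments.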
The models are available as GGUF files on Ollama and Hugging Face, with a full technical report published on arXiv.
Benchmarks indicate that Mellum2 performs comparably to other small language models while providing significantly higher throughput under concurrent load.
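To make "throughput under concurrent load" concrete, here is a minimal sketch of how aggregate tokens per second might be measured as the number of in-flight requests grows. It does not call Mellum2 or any real serving API: `fake_generate` is a hypothetical stand-in that simply sleeps and reports a fixed token count, and the request counts, latency, and token numbers are illustrative assumptions, not figures from the technical report.

```python
import asyncio
import time


async def fake_generate(prompt: str, tokens: int = 50, latency: float = 0.05) -> int:
    """Hypothetical stand-in for a model call: sleeps, then reports tokens produced."""
    await asyncio.sleep(latency)
    return tokens


async def measure_throughput(n_requests: int, concurrency: int) -> float:
    """Return aggregate tokens per second with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one_request(i: int) -> int:
        # The semaphore caps how many requests run at the same time.
        async with semaphore:
            return await fake_generate(f"prompt {i}")

    start = time.perf_counter()
    counts = await asyncio.gather(*(one_request(i) for i in range(n_requests)))
    elapsed = time.perf_counter() - start
    return sum(counts) / elapsed


if __name__ == "__main__":
    for c in (1, 4, 16):
        tps = asyncio.run(measure_throughput(n_requests=16, concurrency=c))
        print(f"concurrency={c:2d}  throughput={tps:8.0f} tokens/s")
```

With a real endpoint, `fake_generate` would be replaced by an actual request, and the interesting question is how far throughput keeps scaling with concurrency before the hardware saturates.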