CPU-only TTS benchmark: Kokoro-82M vs Supertonic-3 vs Inflect-Nano-v1

A CPU-only text-to-speech benchmark compares Kokoro-82M, Supertonic-3, and Inflect-Nano-v1 on a 4-core Intel Xeon with 15.6 GB of RAM. Kokoro sounds the most natural (MOS 4.44-4.45) but is the slower option; its ONNX version achieves a better real-time factor than the PyTorch version at essentially the same quality. Supertonic in its 5-step configuration strikes a balance at 3.2x real-time and MOS 4.37, making it the practical choice when both speed and quality matter.
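
The summary does not say how speed was measured, so the following is only a minimal sketch of one common convention: real-time factor (RTF) as synthesis time divided by audio duration, with "Nx real-time" as its inverse. The function names and the `synthesize` callable (assumed to return a sequence of samples and a sample rate) are hypothetical and do not come from the benchmark or from any of the three models' APIs.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time / audio duration. Below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(synthesis_seconds: float, audio_seconds: float) -> float:
    """Inverse of RTF: seconds of audio produced per second of compute."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def time_synthesis(
    synthesize: Callable[[str], Tuple[Sequence[float], int]],
    text: str,
) -> Tuple[float, float]:
    """Time one call to a hypothetical synthesize(text) -> (samples, sample_rate).

    Returns (elapsed_seconds, audio_seconds).
    """
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    return elapsed, len(samples) / sample_rate
```

Under this convention, a system that needs 1.0 s of compute to produce 3.2 s of audio runs at 3.2x real-time, which corresponds to an RTF of roughly 0.31. In practice one would average over several utterances and discard a warm-up run, since the first call often includes model loading.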