A developer has released an optimized C++ implementation of Qwen3-TTS that runs roughly 5x faster than realtime on an RTX 5080, together with a cross-platform desktop GUI built with Kotlin and Compose Multiplatform. The project provides GGML-based inference that supports both CPU and CUDA execution on Windows and Linux.
- Reported to run 15x faster than the Python reference implementation.
- Supports 0.6B and 1.7B model sizes, including base models for voice cloning.
- Features custom voice and voice design capabilities with instruction support.
- Allows saving, mixing, and merging speaker embeddings.
- Includes streaming output with semi-accurate text highlighting.
- Provides download options for pre-converted GGUF models from Hugging Face.
This release enables users to run Qwen3-TTS locally with significantly improved speed and a user-friendly interface, facilitating voice cloning and synthesis without relying on the original Python environment.