Qwen3-tts.cpp and Compose Desktop GUI for Local TTS

A developer has released an optimized C++ implementation of Qwen3-TTS that reportedly runs at roughly 5x real-time speed on an RTX 5080, along with a cross-platform desktop GUI built with Kotlin and Compose Multiplatform. Inference is GGML-based and runs on either CPU or CUDA, on Windows and Linux.
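"Nx real-time" is usually taken to mean that N seconds of audio are produced per second of compute. The snippet below is a minimal sketch of that arithmetic. The function names are hypothetical and the numbers are illustrative, not measurements from the project.

```python
def realtime_speed(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def compute_time_for(audio_seconds: float, speed: float) -> float:
    """Estimated wall-clock time to synthesize a clip at a given real-time speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return audio_seconds / speed


# Illustrative numbers only (not benchmark results):
# 10 s of audio synthesized in 2 s of compute is 5x real-time.
print(realtime_speed(10.0, 2.0))    # 5.0
# At 5x real-time, a 60 s clip would take about 12 s to generate.
print(compute_time_for(60.0, 5.0))  # 12.0
```

A speed above 1.0 means synthesis keeps ahead of playback, which is what makes streaming output practical.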

Performance is reported as 15x faster than the Python reference implementation.
Supports 0.6B and 1.7B model sizes, including base models for voice cloning.
Features custom voice and voice design capabilities with instruction support.
Allows saving, mixing, and merging speaker embeddings.
Includes streaming output with approximate (semi-accurate) text highlighting.
Offers pre-converted GGUF models for download from Hugging Face.
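To illustrate what mixing speaker embeddings can look like, here is a minimal sketch. It assumes an embedding is a fixed-length vector of floats and that mixing is a weighted average. Both are assumptions made for illustration: the summary does not describe the project's embedding format or its mixing method, and `mix_embeddings` is a hypothetical helper, not part of the project.

```python
from typing import List, Sequence


def mix_embeddings(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of equal-length speaker embedding vectors.

    Assumption: embeddings are plain float vectors and a normalized
    weighted sum is a reasonable way to blend voices.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight total so the result stays on the same scale.
    return [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


# Toy 3-dimensional vectors; real speaker embeddings are much longer.
voice_a = [1.0, 0.0, 2.0]
voice_b = [0.0, 1.0, 4.0]
mixed = mix_embeddings([voice_a, voice_b], [3.0, 1.0])  # 75% A, 25% B
print(mixed)  # [0.75, 0.25, 2.5]
```

Normalizing by the sum of the weights means callers can pass ratios such as 3:1 without first converting them to fractions.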

The release lets users run Qwen3-TTS locally, with faster inference and a desktop interface, for voice cloning and synthesis without depending on the original Python environment.