Bro77XP Releases Beginner-Friendly Local AI VTuber with Zero-Shot Voice Cloning

Bro77XP has released a free, fully local AI VTuber project aimed at beginners and non-programmers. The system uses Whisper for real-time English speech recognition, Ollama running the llama3.2 model for LLM inference, and Chatterbox TTS for text-to-speech generation. It also features instant zero-shot voice cloning and runs in a continuous listening loop that detects silence automatically, so audio is recorded only while speech is present. The software connects to VTube Studio through its API to control mouth expressions and trigger emotion animations based on the generated responses. Although it was initially developed on an AMD GPU, the code primarily targets CPU users, so it can run without specific NVIDIA or AMD hardware. Setup requires Python 3.10.11 and involves creating a virtual environment and installing core dependencies such as openai-whisper, pyaudio, and websocket-client.
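
The project's actual silence-detection method is not described, so the following is only a minimal sketch of how a silence-gated listening loop of this kind could work. It assumes 16-bit little-endian mono PCM frames and a simple RMS energy threshold. The names `rms` and `segment_speech`, the threshold of 500, and the three-frame silence window are all hypothetical choices for illustration, not values taken from the project.

```python
import math
import struct
from typing import Iterable, Iterator, List


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def segment_speech(
    frames: Iterable[bytes],
    threshold: float = 500.0,      # assumed energy threshold, tune per microphone
    max_silent_frames: int = 3,    # assumed silence window that ends an utterance
) -> Iterator[bytes]:
    """Yield one utterance each time speech is followed by enough silence."""
    buffer: List[bytes] = []
    silent_run = 0
    for frame in frames:
        if rms(frame) >= threshold:
            # Speech: keep the frame and reset the silence counter.
            buffer.append(frame)
            silent_run = 0
        elif buffer:
            # Silence inside an utterance: keep it in case speech resumes.
            buffer.append(frame)
            silent_run += 1
            if silent_run >= max_silent_frames:
                # Enough silence: emit the utterance without its trailing silence.
                yield b"".join(buffer[: len(buffer) - silent_run])
                buffer = []
                silent_run = 0
        # Silence before any speech is ignored, so nothing is recorded.
    if buffer:
        # Flush a final utterance if the stream ends mid-speech.
        yield b"".join(buffer[: len(buffer) - silent_run])
```

In a live setup, the frames would come from a microphone stream (for example, chunks read through pyaudio), and each yielded utterance would be passed on to speech recognition. Leading silence is skipped entirely, which matches the idea of recording only while speech is present.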