This project enables voice chat with the Gemma 4 31B model through a 3D avatar that listens, speaks, and displays dynamic facial expressions and hand gestures. The system exposes function tools like set_mood, make_hand_gesture, and make_facial_expression to the LLM, allowing it to autonomously decide the avatar's reactions.
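As a rough illustration of how such tools might be declared and handled, the sketch below defines the three tools in the OpenAI-style `tools` format and validates a tool call before turning it into an avatar command. Only the tool names come from the project description. The parameter names (`mood`, `gesture`, `expression`), the list of moods, the `make_tool` and `dispatch_tool_call` helpers, and the use of the OpenAI-style schema are all assumptions made for illustration.

```python
import json

# Hypothetical mood values; the project's real list is not given in the text.
MOODS = ["neutral", "happy", "sad", "angry"]


def make_tool(name, description, param, choices=None):
    """Build one function-tool declaration with a single string parameter."""
    prop = {"type": "string"}
    if choices:
        prop["enum"] = choices
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {param: prop},
                "required": [param],
            },
        },
    }


# Tool names are from the project description; parameters are assumed.
TOOLS = [
    make_tool("set_mood", "Set the avatar's overall mood.", "mood", MOODS),
    make_tool("make_hand_gesture", "Play a hand gesture.", "gesture"),
    make_tool("make_facial_expression", "Show a facial expression.", "expression"),
]


def dispatch_tool_call(name, arguments_json):
    """Validate one tool call from the model and return an avatar command."""
    known = {t["function"]["name"]: t["function"]["parameters"] for t in TOOLS}
    if name not in known:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    params = known[name]
    for key in params["required"]:
        if key not in args:
            raise ValueError(f"missing argument: {key}")
    for key, spec in params["properties"].items():
        if "enum" in spec and args[key] not in spec["enum"]:
            raise ValueError(f"invalid value for {key}: {args[key]}")
    return {"command": name, "args": args}
```

In a setup like this, `TOOLS` would be passed along with each chat request, and every tool call in the model's reply would go through `dispatch_tool_call` before being forwarded to the avatar renderer.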
- The stack uses open models: Silero VAD for voice activity detection, Parakeet for STT, Qwen3-TTS for speech synthesis, and Gemma 4 31B served by Cerebras.
- Communication occurs via raw PCM over a plain WebSocket connection.
- Lip-syncing and avatar rendering are handled by met4citizen's TalkingHead and HeadAudio projects.
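
To make the raw PCM transport more concrete, here is a minimal sketch of how audio might be prepared for a plain WebSocket: float samples are converted to 16-bit little-endian PCM and split into fixed-size frames, each of which could be sent as one binary message. The sample rate, the 16-bit mono format, the 20 ms frame size, and all function names are assumptions; the project's actual wire format is not specified in the text.

```python
import struct

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
FRAME_MS = 20          # assumed frame duration in milliseconds
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM (assumed)


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_floats(data):
    """Convert 16-bit little-endian PCM bytes back to floats."""
    count = len(data) // BYTES_PER_SAMPLE
    values = struct.unpack(f"<{count}h", data[: count * BYTES_PER_SAMPLE])
    return [v / 32767 for v in values]


def frames(pcm, samples_per_frame=SAMPLES_PER_FRAME):
    """Yield consecutive PCM chunks; the last one may be shorter."""
    step = samples_per_frame * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step):
        yield pcm[start : start + step]
```

With no container or codec involved, the receiver only needs to know the agreed sample rate and sample format to decode each binary message, for example with `pcm16_to_floats`.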
Together, these pieces show how open-source components can be combined into a real-time, interactive voice-and-avatar experience.