xAI has announced the beta release of Voice Agent Builder, a no-code platform for configuring production-grade voice agents on Grok Voice in under two minutes. The tool lets operators and developers deploy high-volume voice agents without building the underlying telephony or AI stack from scratch.
- The platform uses a speech-to-speech path tightly coupled with Grok Voice, avoiding the latency and cost of stitching separate speech-to-text, LLM, and text-to-speech APIs.
- Users can configure agents via plain-language prompts, attach knowledge bases in formats like Markdown or Excel, and connect tools such as Google Calendar, Linear, or custom APIs.
- Features include 80+ built-in voices, voice cloning from two minutes of audio, real-time notifications, call recording with transcription, and configurable guardrails.
- Pricing is a flat API rate of $0.05 per minute of audio, plus $0.01 per minute for telephony on provisioned numbers, with no separate per-component fees.
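To make the quoted rates concrete, here is a minimal cost-estimate sketch in Python. The two per-minute rates come from the announcement; everything else is an assumption for illustration: the function name `estimate_cost` is hypothetical, and the sketch assumes cost scales linearly with fractional minutes, since the announcement does not state the billing granularity or any rounding rule.

```python
from decimal import Decimal

# Rates quoted in the announcement (USD per minute).
API_RATE_PER_MIN = Decimal("0.05")        # audio processed by the voice API
TELEPHONY_RATE_PER_MIN = Decimal("0.01")  # telephony on provisioned numbers


def estimate_cost(minutes, uses_provisioned_number=True):
    """Estimate the USD cost of `minutes` of calls.

    Hypothetical helper: assumes linear billing on fractional minutes,
    which the announcement does not confirm.
    """
    total_minutes = Decimal(str(minutes))
    if total_minutes < 0:
        raise ValueError("minutes must be non-negative")

    rate = API_RATE_PER_MIN
    if uses_provisioned_number:
        # Telephony is charged on top of the API rate.
        rate += TELEPHONY_RATE_PER_MIN
    return total_minutes * rate


if __name__ == "__main__":
    # 1,000 minutes on a provisioned number at the combined $0.06 rate.
    print(estimate_cost(1000))
    # The same volume without a provisioned number, API rate only.
    print(estimate_cost(1000, uses_provisioned_number=False))
```

`Decimal` is used instead of floats so that currency arithmetic stays exact. Under the stated assumptions, 1,000 minutes on a provisioned number comes to $60, of which $50 is the API rate and $10 is telephony.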
The authors emphasize that the system is trained on real-world call conditions, including noise, accents, and interruptions. They also position the pricing as simpler and more transparent than that of traditional multi-component voice stacks.