xAI's audio models are now live on AI Gateway. Realtime voice, text to speech, and speech to text are all available through the AI SDK with the same routing, observability, and spend controls as your other models.
These capabilities are available in the AI SDK 7 release.
Available models
| Capability | Model ID |
|---|---|
| Realtime voice | `xai/grok-voice-think-fast-1.0` |
| Text to speech | `xai/grok-tts` |
| Speech to text | `xai/grok-stt` |
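The model IDs in the table use AI Gateway's `provider/model` form. A small typed constant keeps them in one place, so code can refer to `XAI_AUDIO_MODELS.textToSpeech` instead of retyping the string. This is a hypothetical helper, not part of the AI SDK: `XAI_AUDIO_MODELS`, its capability keys, and `parseModelId` are illustrative names.

```typescript
// Model IDs from the "Available models" table, keyed by capability.
// The capability keys are illustrative names, not AI SDK identifiers.
export const XAI_AUDIO_MODELS = {
  realtimeVoice: "xai/grok-voice-think-fast-1.0",
  textToSpeech: "xai/grok-tts",
  speechToText: "xai/grok-stt",
} as const;

export type XaiAudioCapability = keyof typeof XAI_AUDIO_MODELS;
export type XaiAudioModelId = (typeof XAI_AUDIO_MODELS)[XaiAudioCapability];

// Hypothetical helper: splits a "provider/model" ID into its two parts.
export function parseModelId(id: string): { provider: string; model: string } {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```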
Realtime
A voice agent has two pieces: a browser component that connects to the model, and a server route that mints the short-lived token it connects with. Because only the token reaches the browser, your API key never leaves the server.
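The token exchange described above can be sketched independently of the AI SDK. This is a minimal sketch, assuming a runtime with the Fetch API `Request` and `Response` globals (Node.js 18+ or an edge runtime). The actual AI Gateway call for minting a realtime token is not shown in this post, so it is injected as a hypothetical `MintToken` function; `RealtimeToken`, `createTokenRoute`, `fetchRealtimeToken`, the `expiresAt` field, and the `/api/realtime/token` path are likewise illustrative assumptions. In a real app, `useRealtime` fetches the route for you, so `fetchRealtimeToken` only shows what that exchange involves.

```typescript
// Hypothetical shape of a short-lived realtime credential. The field names
// are assumptions; the real AI Gateway token format may differ.
interface RealtimeToken {
  token: string;
  expiresAt: number; // Unix epoch milliseconds
}

// Hypothetical minting function. In a real route this would call the
// provider's token endpoint using the server-side API key.
type MintToken = (opts: {
  apiKey: string;
  model: string;
  ttlMs: number;
}) => Promise<RealtimeToken>;

const REALTIME_MODEL = "xai/grok-voice-think-fast-1.0";

// Server side: a framework-agnostic POST handler built on Request/Response.
// The API key stays in this closure; only the minted token is serialized.
export function createTokenRoute(mintToken: MintToken, apiKey: string, ttlMs = 60_000) {
  return async function POST(_req: Request): Promise<Response> {
    const minted = await mintToken({ apiKey, model: REALTIME_MODEL, ttlMs });
    const body: RealtimeToken = { token: minted.token, expiresAt: minted.expiresAt };
    return new Response(JSON.stringify(body), {
      headers: {
        "Content-Type": "application/json",
        // Keep browsers and proxies from caching a credential.
        "Cache-Control": "no-store",
      },
    });
  };
}

// Client side: call the token route and reject malformed or expired tokens.
export async function fetchRealtimeToken(
  fetchImpl: (url: string, init?: RequestInit) => Promise<Response>,
  url = "/api/realtime/token", // hypothetical route path
  now: () => number = Date.now,
): Promise<RealtimeToken> {
  const res = await fetchImpl(url, { method: "POST" });
  if (!res.ok) throw new Error(`Token route failed with HTTP ${res.status}`);
  const body = (await res.json()) as Partial<RealtimeToken>;
  if (typeof body.token !== "string" || typeof body.expiresAt !== "number" || body.expiresAt <= now()) {
    throw new Error("Token route returned an invalid or expired token");
  }
  return { token: body.token, expiresAt: body.expiresAt };
}
```

The handler re-serializes only `token` and `expiresAt`, so even if the minting call returns extra fields, nothing but the short-lived credential reaches the browser. `Cache-Control: no-store` is an additional precaution against a cached credential being reused.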
Add the token route, setting the model to `xai/grok-voice-think-fast-1.0`.

Then connect from the browser. The `useRealtime` hook from `@ai-sdk/react` fetches that route and manages the WebSocket connection, microphone capture, and audio playback.

Text to speech
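A minimal sketch of text to speech with `xai/grok-tts`, written so it runs without network access: the speech function is injected rather than imported. `speakToFile`, `SpeechOptions`, `SpeechResult`, and `GenerateSpeech` are hypothetical names. The option names (`model`, `text`, `voice`, `outputFormat`) and the `audio.uint8Array` result field are assumed to follow the AI SDK speech API (AI SDK 5 exports it as `experimental_generateSpeech`); check the AI SDK 7 reference before relying on them. No specific xAI voice name or output format is assumed.

```typescript
import { writeFile } from "node:fs/promises";

// Options passed to the speech function. These names are assumed to mirror
// the AI SDK speech options.
interface SpeechOptions {
  model: string;
  text: string;
  voice: string;
  outputFormat: string;
}

// Assumed result shape: only the raw bytes of the generated audio are modeled.
interface SpeechResult {
  audio: { uint8Array: Uint8Array };
}

type GenerateSpeech = (opts: SpeechOptions) => Promise<SpeechResult>;

// Synthesizes `text` with xai/grok-tts and writes the audio bytes to `path`.
// Returns the number of bytes written.
export async function speakToFile(
  generateSpeech: GenerateSpeech,
  opts: { text: string; voice: string; outputFormat: string; path: string },
): Promise<number> {
  const { audio } = await generateSpeech({
    model: "xai/grok-tts",
    text: opts.text,
    voice: opts.voice,
    outputFormat: opts.outputFormat,
  });
  await writeFile(opts.path, audio.uint8Array);
  return audio.uint8Array.byteLength;
}
```

Injecting the speech function keeps the file-writing logic testable offline; in an app, you would pass a small adapter that calls the AI SDK and returns its result.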
Generate spoken audio from text with `generateSpeech` and the `xai/grok-tts` model. Pass a voice and an output format, then write the resulting audio to a file.

Speech to text
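A minimal sketch of speech to text with `xai/grok-stt`, written to run offline: the transcription function is injected rather than imported. `transcribeFile`, `Transcribe`, and `TranscriptionResult` are hypothetical names, and the `model` and `audio` options and the `text` result field are assumed to follow the AI SDK transcription API (AI SDK 5 exports it as `experimental_transcribe`); check the AI SDK 7 reference before relying on them.

```typescript
import { readFile } from "node:fs/promises";

// Assumed result shape: the AI SDK transcription result exposes the transcript
// as `text`; segments, language, and duration are not modeled here.
interface TranscriptionResult {
  text: string;
}

type Transcribe = (opts: { model: string; audio: Uint8Array }) => Promise<TranscriptionResult>;

// Reads an audio file and transcribes it with xai/grok-stt.
export async function transcribeFile(transcribe: Transcribe, path: string): Promise<string> {
  // A Node.js Buffer is a Uint8Array, so it can be passed through directly.
  const audio = await readFile(path);
  const { text } = await transcribe({ model: "xai/grok-stt", audio });
  // Trim surrounding whitespace from the transcript.
  return text.trim();
}
```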
Transcribe recordings into text with `transcribe`, using the `xai/grok-stt` model.

Playground
You can also try the xAI audio models in the AI Gateway playground: open the models list and select any model to use it directly in the browser. The `xai/grok-voice-think-fast-1.0` playground lets you talk to the agent and see its responses instantly.