Controls audio delivery method:
• "data_only" - Receive audio-data messages via the data channel
• "audio_only" - Standard WebRTC audio track (no audio-data messages)
• "both" - Receive audio via both the audio track and the data channel
max_chunk_duration_ms (int)
10-1000, default: 100
Maximum duration of each audio chunk in milliseconds. Values are rounded up to the nearest 10ms (e.g., 95ms → 100ms).
add_wav_header (boolean)
false (default) / true
Whether to prepend a 44-byte WAV header to each audio chunk (only affects audio delivered over the data channel)
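A minimal TypeScript sketch of these options and of the documented rounding rule for max_chunk_duration_ms. The option names and values come from this page; the AudioDeliveryOptions type and the effectiveChunkDurationMs helper are hypothetical, and how the options are actually sent to the server is not covered here.

```typescript
// Option names and values are taken from this page; the type and helper
// names are hypothetical.
type AudioRouting = "data_only" | "audio_only" | "both";

interface AudioDeliveryOptions {
  audio_routing: AudioRouting;
  max_chunk_duration_ms?: number; // 10-1000, default 100
  add_wav_header?: boolean;       // default false; data channel audio only
}

// Local sanity check plus the documented rounding rule: the duration is
// rounded up to the next multiple of 10 ms (95 -> 100). How the server
// treats out-of-range values is not described, so this sketch rejects them.
function effectiveChunkDurationMs(requested: number = 100): number {
  if (!Number.isFinite(requested) || requested < 10 || requested > 1000) {
    throw new RangeError(`max_chunk_duration_ms must be 10-1000, got ${requested}`);
  }
  return Math.ceil(requested / 10) * 10;
}

const options: AudioDeliveryOptions = {
  audio_routing: "data_only",
  max_chunk_duration_ms: 95,
  add_wav_header: false,
};

console.log(effectiveChunkDurationMs(options.max_chunk_duration_ms)); // 100
```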
2. Receiving Audio Data on the Client
Audio data is sent as RTVI messages on the WebRTC data channel when audio_routing is set to "data_only" or "both".
Filter incoming messages on all three conditions:
• label === "rtvi-ai"
• type === "server-message"
• data.type === "audio-data"
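The filter can be written as a TypeScript type guard. This is a sketch: AudioDataMessage, isAudioDataMessage and parseAudioDataMessage are hypothetical names, and whether your client library hands you the raw JSON string from the data channel or an already-parsed object depends on the library, so the parser accepts both.

```typescript
// Shape of an audio-data message, per the field descriptions on this page.
interface AudioDataMessage {
  label: "rtvi-ai";
  type: "server-message";
  data: {
    type: "audio-data";
    sample_rate: number;
    channels: number;
    audio: string; // base64
    includes_wav_header: boolean;
  };
}

// Checks only the three filter conditions; other fields are not validated.
function isAudioDataMessage(msg: unknown): msg is AudioDataMessage {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as { label?: unknown; type?: unknown; data?: unknown };
  if (m.label !== "rtvi-ai" || m.type !== "server-message") return false;
  const data = m.data as { type?: unknown } | null | undefined;
  return typeof data === "object" && data !== null && data.type === "audio-data";
}

// Accepts a JSON string (raw data channel payload) or a parsed object and
// returns the audio-data message, or null for anything else.
function parseAudioDataMessage(raw: unknown): AudioDataMessage | null {
  let parsed: unknown = raw;
  if (typeof raw === "string") {
    try {
      parsed = JSON.parse(raw);
    } catch {
      return null; // not JSON, so not an RTVI message
    }
  }
  return isAudioDataMessage(parsed) ? parsed : null;
}

// Browser usage with a hypothetical RTCDataChannel `channel`:
// channel.addEventListener("message", (event) => {
//   const msg = parseAudioDataMessage(event.data);
//   if (msg) { /* decode msg.data.audio */ }
// });
```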
3. Audio Data Message Shape
Field descriptions:
• label (string) - Always "rtvi-ai" (RTVI protocol identifier)
• type (string) - Always "server-message" (RTVI message type)
• data.type (string) - Always "audio-data" (specific message subtype)
• data.sample_rate (integer) - Audio sample rate in Hz (e.g., 16000, 24000, 48000)
• data.channels (integer) - Number of audio channels (1 = mono, 2 = stereo)
• data.audio (string) - Base64-encoded audio data (raw PCM, or PCM preceded by a WAV header)
• data.includes_wav_header (boolean) - true if the audio includes a 44-byte WAV header, false for raw PCM
Notes:
• Each message contains one audio chunk.
• Chunk duration is controlled by max_chunk_duration_ms (default: 100ms).
• The actual duration may be slightly higher than requested, because the value is rounded up to the nearest 10ms.
• Audio samples are 16-bit signed PCM, base64-encoded for transport.
4. Audio Data Format and Decoding
• Format - 16-bit signed PCM (little-endian)
• Encoding - Base64, for JSON transport over the data channel
• Sample rate - Varies (typically 16000, 24000, or 48000 Hz); specified in the sample_rate field
• Channels - 1 (mono) or 2 (stereo); specified in the channels field
• WAV header (optional) - 44 bytes; included when add_wav_header: true is set in the config
• Chunk size - Controlled by max_chunk_duration_ms; default 100ms, rounded up to the nearest 10ms
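From this format, the size of a full-length chunk follows directly: samples per channel = sample_rate × duration, 2 bytes per 16-bit sample per channel, plus 44 bytes if a WAV header is included; base64 then turns every 3 bytes into 4 characters. The helper below (hypothetical name chunkSizes) makes the arithmetic concrete, assuming channels are interleaved in a single byte stream.

```typescript
// Size of one full-length chunk, using the documented format:
// 16-bit samples, optional 44-byte WAV header, base64 for transport.
interface ChunkSizes {
  samplesPerChannel: number;
  pcmBytes: number;
  totalBytes: number;
  base64Chars: number;
}

function chunkSizes(sampleRate: number, channels: number, durationMs: number, wavHeader: boolean): ChunkSizes {
  const samplesPerChannel = Math.round((sampleRate * durationMs) / 1000);
  const pcmBytes = samplesPerChannel * channels * 2;      // 2 bytes per 16-bit sample
  const totalBytes = pcmBytes + (wavHeader ? 44 : 0);     // optional WAV header
  const base64Chars = 4 * Math.ceil(totalBytes / 3);      // padded base64 length
  return { samplesPerChannel, pcmBytes, totalBytes, base64Chars };
}

// Default 100 ms chunk at 24 kHz mono, no header:
// samplesPerChannel 2400, pcmBytes 4800, totalBytes 4800, base64Chars 6400
console.log(chunkSizes(24000, 1, 100, false));
```

At 48000 Hz stereo with a WAV header, the same 100ms chunk grows to 19244 bytes (25660 base64 characters), which is worth keeping in mind when choosing max_chunk_duration_ms.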
Decoding steps:
• Base64-decode the audio field to get raw bytes.
• If includes_wav_header is true, the first 44 bytes are the WAV header (the chunk can be saved directly as a .wav file) and the remaining bytes are PCM audio data.
• If includes_wav_header is false, all bytes are raw 16-bit signed PCM.
• Use the sample_rate and channels fields to configure the audio context.
5. Example code to receive and decode audio data
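The sketch below follows the decoding steps in section 4 for a browser client: it decodes a message's audio field into Float32 samples and schedules the chunks back to back with the Web Audio API. It assumes the message has already been filtered as described in section 2; decodeAudioChunk and ChunkPlayer are hypothetical names, and it assumes stereo samples are interleaved (the usual PCM layout, not stated on this page).

```typescript
// Fields of an audio-data message used here (see section 3).
interface AudioDataPayload {
  sample_rate: number;
  channels: number;
  audio: string; // base64
  includes_wav_header: boolean;
}

const WAV_HEADER_BYTES = 44;

// Step 1: base64 -> bytes. atob exists in browsers and in Node 16+.
function base64ToBytes(b64: string): Uint8Array {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return bytes;
}

// Steps 2-3: skip the optional header, then read 16-bit signed little-endian
// samples and scale them to floats in [-1, 1).
function decodeAudioChunk(data: AudioDataPayload): Float32Array {
  const bytes = base64ToBytes(data.audio);
  const start = data.includes_wav_header ? WAV_HEADER_BYTES : 0;
  if (bytes.byteLength < start) throw new Error("chunk is shorter than its WAV header");
  const view = new DataView(bytes.buffer, bytes.byteOffset + start, bytes.byteLength - start);
  const count = Math.floor(view.byteLength / 2);
  const samples = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return samples;
}

// Step 4: play chunks back to back with the Web Audio API.
// Assumes multi-channel samples are interleaved (L, R, L, R, ...).
class ChunkPlayer {
  private readonly ctx: AudioContext;
  private nextStartTime = 0;

  constructor(ctx: AudioContext) {
    this.ctx = ctx;
  }

  play(data: AudioDataPayload): void {
    const samples = decodeAudioChunk(data);
    const channels = data.channels;
    const frames = Math.floor(samples.length / channels);
    if (frames === 0) return;
    // The buffer uses the message's sample rate; Web Audio resamples to the context rate.
    const buffer = this.ctx.createBuffer(channels, frames, data.sample_rate);
    for (let ch = 0; ch < channels; ch++) {
      const dest = buffer.getChannelData(ch);
      for (let f = 0; f < frames; f++) dest[f] = samples[f * channels + ch];
    }
    const source = this.ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(this.ctx.destination);
    // Start right after the previous chunk, or now if playback has fallen behind.
    const startAt = Math.max(this.nextStartTime, this.ctx.currentTime);
    source.start(startAt);
    this.nextStartTime = startAt + buffer.duration;
  }
}

// Usage (browser):
// const player = new ChunkPlayer(new AudioContext());
// for each filtered audio-data message `msg`: player.play(msg.data);
```

Scheduling each buffer at the end time of the previous one avoids audible gaps between chunks; if a chunk arrives late, playback simply resumes at the current time. Custom processing (filters, normalization) can be applied to the Float32Array returned by decodeAudioChunk before it is copied into the AudioBuffer. Browsers may create the AudioContext in a suspended state until a user gesture, in which case ctx.resume() must be called from a click or similar handler.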
6. Usage Notes
Audio data via the data channel is useful when you need:
• Custom audio processing before playback (e.g., filters, effects, normalization)
• Custom buffering strategies for unstable network conditions
• Audio recording or saving as WAV files (with add_wav_header: true)
• Audio in non-browser clients that don't support WebRTC audio tracks
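For recording, concatenating chunks that each carry their own 44-byte header would embed a header inside the audio stream at every chunk boundary. One approach, sketched here and not an API of this service, is to strip the per-chunk headers and write a single canonical 44-byte PCM WAV header for the whole recording. chunkToPcmBytes and buildWav are hypothetical helpers, and all chunks are assumed to share the same sample_rate and channels.

```typescript
const HEADER_BYTES = 44;

// Decodes one chunk's base64 audio and drops its per-chunk WAV header, if any.
function chunkToPcmBytes(audio: string, includesWavHeader: boolean): Uint8Array {
  const bin = atob(audio);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return includesWavHeader ? bytes.subarray(HEADER_BYTES) : bytes;
}

// Writes one canonical 44-byte PCM WAV header followed by all PCM data.
function buildWav(pcmChunks: Uint8Array[], sampleRate: number, channels: number): Uint8Array {
  const dataLen = pcmChunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(HEADER_BYTES + dataLen);
  const v = new DataView(out.buffer);
  const ascii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) v.setUint8(offset + i, s.charCodeAt(i));
  };
  const blockAlign = channels * 2; // bytes per frame at 16 bits per sample

  ascii(0, "RIFF");
  v.setUint32(4, 36 + dataLen, true);             // remaining file size
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  v.setUint32(16, 16, true);                      // fmt chunk size for PCM
  v.setUint16(20, 1, true);                       // format 1 = integer PCM
  v.setUint16(22, channels, true);
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * blockAlign, true); // byte rate
  v.setUint16(32, blockAlign, true);
  v.setUint16(34, 16, true);                      // bits per sample
  ascii(36, "data");
  v.setUint32(40, dataLen, true);

  let offset = HEADER_BYTES;
  for (const c of pcmChunks) {
    out.set(c, offset);
    offset += c.byteLength;
  }
  return out;
}

// Usage: collect chunkToPcmBytes(msg.data.audio, msg.data.includes_wav_header)
// for each message, then call buildWav(chunks, sampleRate, channels) once.
```

In a browser, the result can be wrapped in new Blob([wav], { type: "audio/wav" }) for download or upload.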
For standard browser-based audio playback, the default audio_routing: "audio_only" (standard WebRTC audio track) is recommended as it provides the lowest latency and is handled automatically by the browser.