# Audio Data (via Data Channel)

***

### 1. Enable audio data in the Connect API request

To receive audio over the WebRTC data channel instead of the standard audio track, configure `audio_config` in the Connect API request.

**Note:** Audio data over the data channel is supported only with the LiveKit transport (the default).

HTTP request:

```http
POST https://live.convai.com/connect
X-Api-Key:
Content-Type: application/json
```

Request body:

```json
{
  "character_id": "",
  "audio_config": {
    "output": {
      "audio_routing": "data_only",
      "max_chunk_duration_ms": 100,
      "add_wav_header": false
    }
  }
}
```

**Configuration fields:**

| Field | Range/Options | Description |
| --- | --- | --- |
| `audio_routing` (string) | `"data_only"`<br>`"audio_only"` (default)<br>`"both"` | Controls how audio is delivered:<br>• `"data_only"` - audio arrives as audio-data messages over the data channel<br>• `"audio_only"` - standard WebRTC audio track (no audio-data messages)<br>• `"both"` - audio arrives over both the audio track and the data channel |
| `max_chunk_duration_ms` (integer) | 10-1000 (default: 100) | Maximum duration of each audio chunk in milliseconds. Rounded up to the nearest 10 ms (e.g., 95 ms → 100 ms) |
| `add_wav_header` (boolean) | `false` (default) / `true` | Whether to include a 44-byte WAV header in each audio chunk (affects data channel audio only) |

***

### 2. Receive audio data on the client

Audio-data messages are delivered over the data channel when `audio_routing` is set to `"data_only"` or `"both"`. Filter incoming messages on:

* `label === "rtvi-ai"`
* `type === "server-message"`
* `data.type === "audio-data"`

***

### 3. Audio data message structure

```json
{
  "label": "rtvi-ai",
  "type": "server-message",
  "data": {
    "type": "audio-data",
    "sample_rate": 24000,
    "channels": 1,
    "audio": "AAEAAg==...",
    "includes_wav_header": false
  }
}
```

**Field descriptions:**

| Field | Type | Value/Description |
| --- | --- | --- |
| `label` | string | Always `"rtvi-ai"` (RTVI protocol identifier) |
| `type` | string | Always `"server-message"` (RTVI message type) |
| `data.type` | string | Always `"audio-data"` (specific message subtype) |
| `data.sample_rate` | integer | Audio sample rate in Hz (e.g., 16000, 24000, 48000) |
| `data.channels` | integer | Number of audio channels (1 = mono, 2 = stereo) |
| `data.audio` | string | Base64-encoded audio data (raw PCM, or WAV with header) |
| `data.includes_wav_header` | boolean | `true` if the audio includes a 44-byte WAV header, `false` for raw PCM |

**Notes:**

* Each message contains one audio chunk
* Chunk duration is controlled by `max_chunk_duration_ms` (default: 100 ms)
* Because the value is rounded up to the nearest 10 ms, the actual duration may be slightly longer than configured
* The audio data is base64-encoded 16-bit signed PCM

***

### 4. Audio data format and decoding

| Property | Value | Notes |
| --- | --- | --- |
| Format | 16-bit signed PCM (little-endian) | Standard PCM audio format |
| Encoding | Base64 | Used for JSON transport over the data channel |
| Sample rate | Varies (typically 16000, 24000, or 48000) | Specified in the `sample_rate` field |
| Channels | 1 (mono) or 2 (stereo) | Specified in the `channels` field |
| WAV header (optional) | 44 bytes | Included when `add_wav_header: true` is set in the config |
| Chunk size | Controlled by `max_chunk_duration_ms` | Default 100 ms, rounded up to the nearest 10 ms |

**Decoding steps:**

1. Base64-decode the `audio` field to get the raw bytes
2. If `includes_wav_header` is `true`:
   * The first 44 bytes are the WAV header (the chunk can be saved directly as `.wav`)
   * The remaining bytes are PCM audio data
3. If `includes_wav_header` is `false`:
   * All bytes are raw 16-bit signed PCM
   * Use the `sample_rate` and `channels` fields to configure the audio context
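The filter conditions and decoding steps can be sketched as two small helpers. This is a minimal illustration, not the service's reference implementation: the names `isAudioDataMessage` and `decodeAudioData` are hypothetical, and the code assumes that stereo PCM is interleaved (left, right, left, right), which this page does not state explicitly.

```javascript
// Hypothetical helper names; only the message fields come from this page.
const WAV_HEADER_BYTES = 44;

// Returns true when a parsed data channel message is an audio-data message.
function isAudioDataMessage(msg) {
  return (
    msg?.label === "rtvi-ai" &&
    msg?.type === "server-message" &&
    msg?.data?.type === "audio-data"
  );
}

// Decodes the `data` object of one audio-data message into one
// Float32Array per channel, normalized to [-1, 1).
// Assumption: multi-channel PCM is interleaved.
function decodeAudioData(data) {
  // Step 1: base64 -> raw bytes
  const binary = atob(data.audio);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  // Steps 2 and 3: skip the 44-byte WAV header when one is present
  const headerBytes = data.includes_wav_header ? WAV_HEADER_BYTES : 0;
  const offset = Math.min(headerBytes, bytes.length);
  const view = new DataView(bytes.buffer, offset);

  // Read 16-bit signed little-endian samples and split them by channel
  const frameCount = Math.floor(view.byteLength / 2 / data.channels);
  const channels = [];
  for (let ch = 0; ch < data.channels; ch++) {
    channels.push(new Float32Array(frameCount));
  }
  for (let i = 0; i < frameCount; i++) {
    for (let ch = 0; ch < data.channels; ch++) {
      const sample = view.getInt16((i * data.channels + ch) * 2, true);
      channels[ch][i] = sample / 32768;
    }
  }
  return { sampleRate: data.sample_rate, channels };
}
```

Reading samples through `DataView.getInt16(..., true)` makes the little-endian byte order explicit instead of relying on the platform's native byte order.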
### 5. Example code for receiving and decoding audio data

```javascript
import { Room, RoomEvent } from "livekit-client";

const CORE_SERVICE_BASE_URL = "https://live.convai.com";
const API_KEY = "";
const CHARACTER_ID = "";

async function startVoiceSessionWithAudioData() {
  // 1) Call /connect with audio_config enabled
  const connectResp = await fetch(`${CORE_SERVICE_BASE_URL}/connect`, {
    method: "POST",
    headers: {
      "x-api-key": API_KEY,
      "content-type": "application/json",
    },
    body: JSON.stringify({
      character_id: CHARACTER_ID,
      audio_config: {
        output: {
          audio_routing: "data_only",
          max_chunk_duration_ms: 100,
          add_wav_header: false,
        },
      },
    }),
  });

  if (!connectResp.ok) {
    throw new Error(`/connect failed: ${connectResp.status}`);
  }

  const connectData = await connectResp.json();
  const { room_url, token, session_id } = connectData;

  // 2) Join the WebRTC room using room_url + token
  const room = new Room();

  // 3) Set up an audio context for playback
  const audioContext = new AudioContext({ sampleRate: 48000 });
  const audioQueue = [];
  let nextStartTime = 0;

  // Minimal placeholder scheduler: plays queued chunks back to back.
  // Replace it with your own playback queue management.
  function playNextChunk() {
    const buffer = audioQueue.shift();
    if (!buffer) return;
    const source = audioContext.createBufferSource();
    source.buffer = buffer;
    source.connect(audioContext.destination);
    const startAt = Math.max(audioContext.currentTime, nextStartTime);
    source.start(startAt);
    nextStartTime = startAt + buffer.duration;
  }

  room.on(RoomEvent.Connected, () => {
    console.log("Connected to room");
  });

  room.on(RoomEvent.DataReceived, (payload) => {
    // LiveKit data payloads are bytes -> decode -> parse JSON
    let msg;
    try {
      const text = new TextDecoder().decode(payload);
      msg = JSON.parse(text);
    } catch {
      return;
    }

    // 4) Keep only audio-data messages
    if (msg?.label !== "rtvi-ai") return;
    if (msg?.type !== "server-message") return;
    if (msg?.data?.type !== "audio-data") return;

    const audioData = msg.data;
    console.log("Received audio chunk:", {
      sample_rate: audioData.sample_rate,
      channels: audioData.channels,
      includes_wav_header: audioData.includes_wav_header,
      audio_length: audioData.audio.length,
    });

    // 5) Decode the base64 audio data
    const base64Audio = audioData.audio;
    const binaryString = atob(base64Audio);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }

    // 6) Convert to an AudioBuffer and add it to the playback queue
    let pcmData = bytes;
    if (audioData.includes_wav_header) {
      // Skip the WAV header (first 44 bytes)
      pcmData = bytes.slice(44);
    }

    // Interpret the bytes as 16-bit PCM samples
    // (Int16Array uses the platform byte order; assumes a little-endian platform)
    const samples = new Int16Array(
      pcmData.buffer,
      pcmData.byteOffset,
      Math.floor(pcmData.byteLength / 2),
    );

    const channels = audioData.channels;
    const frameCount = Math.floor(samples.length / channels);
    if (frameCount === 0) return; // createBuffer rejects a zero-length buffer

    // Create the AudioBuffer
    const audioBuffer = audioContext.createBuffer(
      channels,
      frameCount,
      audioData.sample_rate,
    );

    // Convert to Float32Array for the Web Audio API, one channel at a time.
    // Assumes multi-channel PCM is interleaved.
    for (let ch = 0; ch < channels; ch++) {
      const channelData = new Float32Array(frameCount);
      for (let i = 0; i < frameCount; i++) {
        channelData[i] = samples[i * channels + ch] / 32768.0; // normalize to [-1, 1]
      }
      audioBuffer.copyToChannel(channelData, ch);
    }

    // Add to the playback queue
    audioQueue.push(audioBuffer);
    playNextChunk();
  });

  room.on(RoomEvent.Disconnected, () => {
    console.log("Disconnected from room");
  });

  await room.connect(room_url, token);

  // Return the objects so the caller can disconnect/clean up later
  return { room, session_id, audioContext };
}

// Example usage:
startVoiceSessionWithAudioData().catch(console.error);
```

***

### 6. Usage notes

Receiving audio data over the data channel is useful when you need:

* **Custom audio processing** before playback (e.g., filters, effects, normalization)
* **Custom buffering strategies** for unstable network conditions
* **Audio recording/saving** as WAV files (with `add_wav_header: true`)
* **Non-browser clients** that do not support WebRTC audio tracks

For standard browser-based audio playback, use the default `audio_routing: "audio_only"` (standard WebRTC audio track), because it has the lowest latency and is handled automatically by the browser.

***
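For the recording use case, note that with `add_wav_header: true` every chunk carries its own 44-byte header, so a multi-chunk recording still needs a single header for the whole file. The sketch below takes the other route: it assumes `add_wav_header: false`, collects the raw PCM bytes of each chunk, and writes one standard 44-byte PCM WAV header in front of them. The function name `buildWavFile` is hypothetical, and the code assumes every chunk shares the same `sample_rate` and `channels`.

```javascript
// Hypothetical helper: wraps raw 16-bit PCM chunks (Uint8Array each)
// in a single WAV file with one canonical 44-byte PCM header.
function buildWavFile(pcmChunks, sampleRate, channels) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataLength = pcmChunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(44 + dataLength);
  const view = new DataView(out.buffer);

  // Writes a four-character ASCII tag at the given byte offset.
  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) {
      out[offset + i] = tag.charCodeAt(i);
    }
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // file size minus 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeTag(36, "data");
  view.setUint32(40, dataLength, true);

  // Append the PCM chunks in arrival order.
  let offset = 44;
  for (const chunk of pcmChunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

In a browser, the returned bytes can be wrapped as `new Blob([wav], { type: "audio/wav" })` for download or playback.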