
# Convai Web SDK

## Introduction

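The **Convai Web SDK (`@convai/web-sdk`)** brings the new Convai backend to the browser, enabling fast, natural, hands-free AI interactions in modern web experiences.\
Built for production applications, immersive websites, and interactive worlds, the SDK handles real-time audio, text, optional video, character actions, and emotion signals, giving developers the tools they need to create responsive, intelligent AI characters directly on the web.

With built-in voice capture, voice activity detection, a ready-to-use chat widget, and full support for custom UIs, Convai makes it easy to integrate lifelike assistants, companions, and NPCs into any web environment.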

{% embed url="<https://youtu.be/fK8R0SzuvNI>" %}
Building a browser-based conversational AI avatar with the Convai Web SDK, Three.js, and React
{% endembed %}

## What's New

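The Web SDK introduces a streamlined, high-performance interaction pipeline powered by Convai's latest backend:

* **Hands-Free Voice Conversation**\
  Natural, continuous conversation without push-to-talk.
* **Low-Latency Responses**\
  Faster streamed replies for smooth, real-time interaction.
* **Emotion and Action Signals**\
  Characters can express emotions and trigger context-aware behaviors.
* **Optional Video and Screen Sharing**\
  Add richer visual context when your experience needs it.
* **Prebuilt ConvaiWidget**\
  A polished, complete chat interface for audio, text, and video.
* **Custom UI and Full Control API**\
  Build your own interface and behavior logic using the exposed hooks and state.
* **Modern Web Integration**\
  Designed for web-based frameworks and toolchains.
* **Long-Term Memory**\
  Per-user memories that persist across sessions via the `MemoryManager` API.
* **Binary File Transfer**\
  Send images and files directly to the character with `uploadFile()`.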

### Core Concepts

At a high level, the SDK consists of a few core pieces:

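1. **ConvaiClient**\
   The brain. Manages the connection, state, messages, audio/video/screen-share controls, and the blendshape queue.
2. **ConvaiWidget**\
   A complete, prebuilt text + voice + optional video/screen-share interface.
3. **AudioRenderer** **(critical for audio playback)**\
   Connects the bot's audio track to the user's speakers.
   * Required for custom UIs
   * Already built into `ConvaiWidget`
4. **BlendshapeQueue** **(core of facial animation)**\
   Manages buffering and time-based retrieval of facial blendshape data.
   * Provides a 60fps blendshape stream synchronized with speech
   * Supports ARKit (61 elements) and MetaHuman (251 elements) formats
   * Optional custom mapping for any character rig
5. **Connection Type**\
   Determines which features are available:
   * `"audio"` (default): audio-only conversation
   * `"video"`: audio + video + screen sharing
6. **MemoryManager**\
   Per-user long-term memory. List, add, and delete memories associated with an `endUserId`.
   * Returned by `client.memoryManager`, which is `null` if no `endUserId` is set.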

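The connection type decides which features a session can use. As a minimal illustration that assumes nothing about the SDK's real types, the sketch below encodes the documented rule (`"audio"` is the default and audio-only; `"video"` adds video and screen sharing) as a lookup table a custom UI could consult before showing controls. `ConnectionType`, `Capabilities`, `capabilitiesFor`, and `canShowScreenShareButton` are hypothetical names and are not exported by `@convai/web-sdk`.

```typescript
// Hypothetical names throughout; these types are NOT exported by @convai/web-sdk.
type ConnectionType = "audio" | "video";

interface Capabilities {
  audio: boolean;
  video: boolean;
  screenShare: boolean;
}

// Encodes the documented rule: "audio" (the default) is audio-only,
// while "video" adds video and screen sharing.
const CAPABILITIES: Record<ConnectionType, Capabilities> = {
  audio: { audio: true, video: false, screenShare: false },
  video: { audio: true, video: true, screenShare: true },
};

// Falls back to "audio" when no connection type is given, matching the default.
function capabilitiesFor(type: ConnectionType = "audio"): Capabilities {
  return CAPABILITIES[type];
}

// A custom UI might only render a screen-share button when it can work.
function canShowScreenShareButton(type?: ConnectionType): boolean {
  return capabilitiesFor(type).screenShare;
}

console.log(canShowScreenShareButton());        // false (defaults to "audio")
console.log(canShowScreenShareButton("video")); // true
```

A lookup table keyed by the union type keeps the rule in one place, and the compiler flags a missing entry if another connection type is ever added to the union.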
***

### Architecture

```
┌─────────────────────────────────────────────────┐
│  ConvaiWidget (UI layer)                        │
│  ├─ Chat interface                              │
│  ├─ Voice mode                                  │
│  └─ Video/screen-share UI                       │
└─────────────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────┐
│  ConvaiClient (core logic)                      │
│  ├─ Connection management                       │
│  ├─ Message handling                            │
│  ├─ State management                            │
│  ├─ Audio/video control                         │
│  └─ Blendshape queue management                 │
└─────────────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────┐
│  WebRTC room (communication layer)              │
│  ├─ Real-time audio/video streaming             │
│  ├─ Blendshape data streaming (60fps)           │
│  ├─ Track management                            │
│  └─ Network communication                       │
└─────────────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────┐
│  AudioRenderer (critical for playback)          │
│  ├─ Attaches audio tracks to the DOM            │
│  ├─ Manages audio elements                      │
│  └─ Enables bot voice playback                  │
└─────────────────────────────────────────────────┘
```

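The diagram separates 60fps blendshape data streaming from blendshape queue management. To illustrate the general idea of time-based retrieval, the sketch below buffers frames as they arrive and picks the frame to render from the elapsed playback time. It is an assumption-laden illustration, not the SDK's `BlendshapeQueue`; `FrameBuffer`, `push`, `start`, and `frameAt` are hypothetical names.

```typescript
// Hypothetical sketch of time-based frame retrieval; not the SDK's BlendshapeQueue.
// Each frame is an array of blendshape weights (e.g. 61 values for ARKit).
type BlendshapeFrame = number[];

class FrameBuffer {
  private readonly fps: number;
  private readonly frames: BlendshapeFrame[] = [];
  private startTimeMs: number | null = null;

  constructor(fps = 60) {
    this.fps = fps;
  }

  // Append frames as they stream in from the network.
  push(...frames: BlendshapeFrame[]): void {
    this.frames.push(...frames);
  }

  // Record when the bot's speech starts playing so frames line up with audio.
  start(nowMs: number): void {
    this.startTimeMs = nowMs;
  }

  // Return the frame for the current playback time, or null if playback
  // has not started or the buffered frames have run out.
  frameAt(nowMs: number): BlendshapeFrame | null {
    if (this.startTimeMs === null) return null;
    const elapsedSec = (nowMs - this.startTimeMs) / 1000;
    const index = Math.floor(elapsedSec * this.fps);
    if (index < 0 || index >= this.frames.length) return null;
    return this.frames[index] ?? null;
  }
}

// Usage: three one-weight frames, sampled 20 ms after playback starts.
const buffer = new FrameBuffer(60);
buffer.push([0], [0.5], [1]);
buffer.start(1000);
console.log(buffer.frameAt(1020)); // 20 ms at 60 fps falls on frame index 1
```

Indexing by elapsed time rather than popping one frame per render call keeps the face in sync with the audio even when the render loop runs faster or slower than 60fps.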
***

#### What's Included

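* **React SDK**
  * `useConvaiClient` hook for easily managing the client lifecycle
  * `<ConvaiWidget />` for the full UI
  * `<AudioRenderer />` + `AudioContext` for custom UIs
  * Access to `blendshapeQueue` for facial animation
* **Vanilla SDK**
  * `ConvaiClient` class for direct control
  * `AudioRenderer` class for playback
  * Optional `createConvaiWidget()` helper
  * `BlendshapeQueue` API for facial animation
* **Lip Sync and Facial Animation**
  * Real-time 60fps blendshape streaming
  * Supports ARKit (61) and MetaHuman (251) formats
  * Declarative, name-based mapping system
  * Helper functions and preset configurations
  * Works with Three.js, Babylon.js, Unity WebGL, and custom engines
* **Video and Screen Sharing**
  * Camera and screen sharing supported when `enableVideo: true` is set
  * Fine-grained video and screen-share controls
* **TypeScript-First**
  * Full type definitions for configuration, state, messages, and control APIs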

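The declarative, name-based mapping can be pictured as a plain dictionary from source blendshape names to a rig's own morph target names. The following is a minimal sketch under that assumption; `arkitToMyRig`, its target names (`Mouth_Open`, `Smile_L`, `Smile_R`), and `remapFrame` are hypothetical and do not reflect the SDK's actual mapping format.

```typescript
// Hypothetical sketch of a declarative, name-based blendshape mapping.
// Keys are source (ARKit-style) names; values are a custom rig's target names.
type NameMap = ReadonlyMap<string, string>;

// Illustrative subset only; a real rig defines its own morph target names.
const arkitToMyRig: NameMap = new Map([
  ["jawOpen", "Mouth_Open"],
  ["mouthSmileLeft", "Smile_L"],
  ["mouthSmileRight", "Smile_R"],
]);

// Translate one frame of named weights into rig-specific weights,
// dropping source names the rig does not map.
function remapFrame(
  frame: Record<string, number>,
  map: NameMap,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [sourceName, weight] of Object.entries(frame)) {
    const target = map.get(sourceName);
    if (target !== undefined) {
      out[target] = weight;
    }
  }
  return out;
}

// Usage: browInnerUp has no entry in the map, so it is skipped.
console.log(remapFrame({ jawOpen: 0.8, browInnerUp: 0.3 }, arkitToMyRig));
```

In Three.js, for example, the remapped names could then be looked up in a mesh's `morphTargetDictionary` to find the index to write into `morphTargetInfluences`.
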
{% hint style="success" %}

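#### Performance Optimization

For the lowest possible latency, we recommend configuring your Core AI settings to use the `gemini-flash-2.5-beta` model. This model is optimized for speed, which makes it well suited to real-time applications where response time is critical.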
{% endhint %}

<figure><img src="/files/51e129d1bb93034d1655e06393bab046255c87ec" alt=""><figcaption></figcaption></figure>

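## Conclusion

The **Convai Web SDK (`@convai/web-sdk`)** marks a major step toward bringing real-time AI interaction to the browser. With voice, actions, emotions, and optional video all running on the latest Convai backend, you can build fast, responsive, highly interactive AI characters in any web experience.

Start building today and bring the next generation of AI-driven interaction to the open web.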

