# Convai Web SDK

## Introduction

The **Convai Web SDK (`@convai/web-sdk`)** brings the new Convai backend to the browser, enabling fast, natural, hands-free AI interactions across modern web experiences.\
Built for production apps, immersive sites, and interactive worlds, the SDK handles real-time audio, text, optional video, character actions, and emotion signals — giving developers the tools to create responsive, intelligent AI characters directly on the web.

With built-in voice capture, speech detection, a ready-to-use chat widget, and full custom UI support, Convai makes it simple to integrate lifelike assistants, companions, and NPCs into any web environment.

{% embed url="<https://youtu.be/fK8R0SzuvNI>" %}
Build Browser-Based Conversational AI Avatars with the Convai Web SDK, Three.JS, and React
{% endembed %}

## Convai WebSDK Tutorial

{% embed url="<https://youtu.be/fK8R0SzuvNI>" %}
Integrate Conversational AI, Real-time LipSync and Facial Animations For Browser Based 3D Avatars Using Convai WebSDK&#x20;
{% endembed %}

## What’s New

The Web SDK introduces a streamlined, high-performance interaction pipeline powered by Convai’s newest backend:

* **Hands-free voice conversations**\
  Natural, continuous dialogue without push-to-talk.
* **Low-latency responses**\
  Faster streaming replies for smooth, real-time interaction.
* **Emotion and action signalling**\
  Characters can express mood and trigger contextual behaviours.
* **Optional video and screen sharing**\
  Add richer visual context when your experience requires it.
* **Pre-built ConvaiWidget**\
  A polished, complete UI for audio, text, and video chat.
* **Custom UI and full control APIs**\
  Build your own interface and behaviour logic with exposed hooks and state.
* **Modern web integration**\
  Designed for Web-based frameworks and tooling.

### Core Concepts

At a high level, the SDK is organised into a few core pieces:

1. **ConvaiClient**\
   The brain. Manages connection, state, messages, audio/video/screen-share control, and blendshape queue.
2. **ConvaiWidget**\
   A complete, prebuilt interface for text + voice + optional video/screen share.
3. **AudioRenderer** **(Critical for audio playback)**\
   Attaches the bot's audio tracks to the user's speakers.
   * Required for custom UIs
   * Already built in to `ConvaiWidget`
4. **BlendshapeQueue** **(Essential for facial animation)**\
   Manages buffering and time-based retrieval of facial blendshape data.
   * Provides 60fps blendshape streams synchronized with speech
   * Supports ARKit (61 elements) and MetaHuman (251 elements) formats
   * Optional custom mapping for any character rig
5. **Connection Type**\
   Determines what's possible:
   * `"audio"` (default) – audio-only conversations
   * `"video"` – audio + video + screen share

***

### Architecture

```
┌─────────────────────────────────────────────────┐
│  ConvaiWidget (UI Layer)                        │
│  ├─ Chat Interface                              │
│  ├─ Voice Mode                                  │
│  └─ Video/Screen Share UI                       │
└─────────────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────┐
│  ConvaiClient (Core Logic)                      │
│  ├─ Connection Management                       │
│  ├─ Message Handling                            │
│  ├─ State Management                            │
│  └─ Audio/Video Controls                        │
│  └─ Blendshape Queue Management                 │
└─────────────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────┐
│  WebRTC Room (Communication Layer)              │
│  ├─ Real-time Audio/Video Streaming             │
│  ├─ Blendshape Data Streaming (60fps)           │
│  ├─ Track Management                            │
│  └─ Network Communication                       │
└─────────────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────┐
│  AudioRenderer (Critical for Playback)         │
│  ├─ Attaches audio tracks to DOM               │
│  ├─ Manages audio elements                     │
│  └─ Enables bot voice playback                 │
└─────────────────────────────────────────────────┘
```

***

#### What's Included

* **React SDK**
  * `useConvaiClient` hook for easy client lifecycle
  * `<ConvaiWidget />` for full UI
  * `<AudioRenderer />` + `AudioContext` for custom UIs
  * Access to `blendshapeQueue` for facial animation
* **Vanilla SDK**
  * `ConvaiClient` class for direct control
  * `AudioRenderer` class for playback
  * Optional `createConvaiWidget()` helper
  * `BlendshapeQueue` API for facial animation
* **Lipsync & Facial Animation**
  * Real-time blendshape streaming at 60fps
  * Support for ARKit (61) and MetaHuman (251) formats
  * Declarative name-based mapping system
  * Helper functions and preset configurations
  * Works with Three.js, Babylon.js, Unity WebGL, and custom engines
* **Video & Screen Share**
  * Camera and screen share support when `enableVideo: true`
  * Fine-grained video and screen share controls
* **TypeScript-first**
  * Full type definitions for configs, state, messages, and control APIs<br>

{% hint style="success" %}

## Performance Optimization&#x20;

To achieve the lowest possible latency, we recommend configuring your Core AI settings to use the `gemini-flash-2.5-beta` model. This model is optimized for speed and is ideal for real-time applications where response time is critical.
{% endhint %}

<figure><img src="https://413558230-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEtUJA212Zc1S9ACc8T4l%2Fuploads%2F6vfKd1jEv3lr58zyqPvc%2FCore%20AI.png?alt=media&#x26;token=b9e4914e-e811-49b2-9856-02a3921a2967" alt=""><figcaption></figcaption></figure>

## Conclusion

The **Convai Web SDK (`@convai/web-sdk`)** marks a major step forward in bringing real-time AI interaction to the browser. With speech, actions, emotions, and optional video all running on the latest Convai backend, you can build fast, responsive, and deeply interactive AI characters across any web experience.

Start building today and bring the next generation of AI-powered interaction to the open web.
