WebGL
Configure and validate the Convai Unity SDK for WebGL — covers HTTPS requirements, browser audio gesture handling, Vision canvas capture, and the known lip-sync drift defect.
The Convai Unity SDK supports voice conversation, lip sync, actions, dynamic context, emotion, Vision, and long-term memory on WebGL. The browser introduces three constraints that do not exist on native platforms: a mandatory HTTPS origin for microphone access, a user-gesture requirement before audio playback or microphone capture can begin, and a canvas-based Vision capture path instead of Unity RenderTexture. All three are covered on this page.
Feature support
| Feature | WebGL support |
|---|---|
| Voice conversation | ✅ Full |
| Lip sync | ✅ Full (see the known timing-drift issue in Troubleshooting) |
| Actions | ✅ Full |
| Dynamic Context | ✅ Full |
| Emotion | ✅ Full |
| Vision | ✅ Canvas capture (browser game view) |
| Long-Term Memory | ✅ Full |
| Spatial audio | ❌ Not supported |
| Screen share | ❌ Not supported |
| Microphone device selection | ❌ Not available; the browser controls device selection |
| Unity AudioSource playback | ❌ Not supported; browser audio path only |
| Microphone test / pre-check | ❌ Not supported |
Browser requirements
HTTPS is required for microphone access. Browsers block microphone capture on non-secure origins. Serve your WebGL build over HTTPS. The only exception is localhost, which browsers treat as a secure origin. Deploying to http:// causes the browser to silently deny microphone permission — no error is shown to the user and voice conversation will not start.
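As a rough pre-deployment check, the rule above fits in a few lines. This is a simplified sketch. `isMicrophoneCapableOrigin` is a hypothetical helper and is not part of the Convai SDK. It recognizes only HTTPS plus a few loopback hosts (localhost, 127.0.0.1, [::1]), whereas browsers apply a fuller secure-context definition. Inside a running page, the browser's own verdict is available as `window.isSecureContext`.

```typescript
// Hypothetical helper, not part of the Convai SDK: a simplified version of the
// browser's secure-context rule that gates microphone access.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function isMicrophoneCapableOrigin(pageUrl: string): boolean {
  const url = new URL(pageUrl);
  if (url.protocol === "https:") {
    return true; // HTTPS origins are secure contexts
  }
  if (url.protocol === "http:") {
    // Plain HTTP only counts as secure for loopback hosts (a subset is listed here).
    return LOOPBACK_HOSTS.has(url.hostname);
  }
  return false; // other schemes are out of scope for this sketch
}
```

For example, `isMicrophoneCapableOrigin("http://localhost:8080/build/")` returns `true`. The same build at `http://your-host.com/build/` returns `false`, which corresponds to the silent permission denial described above.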
iframe embedding: When embedding your WebGL build in an iframe, the parent page must include allow="microphone" on the <iframe> element. Without it, the browser blocks microphone access regardless of HTTPS status.
<iframe src="https://your-host.com/build/" allow="microphone" width="960" height="600"></iframe>

Microphone device selection: The browser controls all microphone device selection. When conversation starts, the browser displays its own permission prompt and lets the user choose a microphone device. The SDK returns an empty device list on WebGL, so the Settings Panel microphone dropdown shows no entries. This is expected behavior, not an error. The microphone test functionality available on native platforms is also not supported on WebGL.
Example: LMS iframe embed
A manufacturing company embeds a safety compliance drill in their Learning Management System. The LMS iframe loads the WebGL build from https://sim.company.com/safety-drill. The Convai character plays a site safety officer who tests operator responses to in-scene hazard scenarios.
Setup:
The LMS page includes the allow="microphone" attribute on the <iframe> element.
The WebGL build is served over HTTPS.
An explicit Begin Drill button is placed on the scene load screen, wired to ConvaiManager.EnableAudioAndStartListening().
Outcome: Operators click Begin Drill, grant microphone permission in the browser prompt, and begin the verbal compliance assessment.
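For an embed like the LMS drill above, the embedding page can check its own iframe markup before launch. The sketch below parses the `allow` attribute as a semicolon-separated list of Permissions Policy directives, where each directive is a feature name optionally followed by an allowlist. It then reports whether `microphone` is delegated. `allowAttributeGrantsMicrophone` is a hypothetical helper. It treats `microphone 'none'` as a denial but does not otherwise evaluate allowlist origins. In the browser, the input would come from `iframe.getAttribute("allow")`, which returns `null` when the attribute is absent.

```typescript
// Hypothetical helper: does an <iframe> allow attribute delegate microphone access?
// Directives are separated by ";" and each starts with a feature name,
// e.g. "camera; microphone" or "microphone 'none'".
function allowAttributeGrantsMicrophone(allow: string | null): boolean {
  if (allow === null) {
    return false; // no allow attribute: the frame cannot use the microphone
  }
  for (const directive of allow.split(";")) {
    const tokens = directive.trim().split(/\s+/);
    if (tokens[0] === "microphone") {
      // "microphone 'none'" explicitly disables the feature; other allowlist
      // values are not evaluated in this simplified sketch.
      return !tokens.includes("'none'");
    }
  }
  return false; // microphone is not listed
}
```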
Audio gesture handling
Browsers require a user interaction before allowing audio playback or microphone capture. The SDK handles this in two ways:
Automatic gesture detection: After connecting, the SDK listens for the first click or touch that lands outside a UI element and calls EnableAudioAndStartListening() on ConvaiManager automatically. This works for scenes where users interact directly with the 3D view.
Explicit Start button (recommended for UI-heavy scenes): In scenes with full-screen overlays, loading screens, or any UI that covers the view on load, automatic detection may not fire reliably. The first click usually lands on a UI element, and automatic detection only responds to clicks or touches outside the UI. Add an explicit Start button and wire it to ConvaiManager.EnableAudioAndStartListening().
The automatic gesture detection and the explicit Start button are not mutually exclusive — both can be active at the same time. The Start button approach is simply more reliable when UI covers the scene on load.
To wire a Start button in the Inspector:
1. Add a Button component to a UI GameObject.
2. In the On Click () list, click +.
3. Drag your ConvaiManager GameObject into the object field.
4. In the function dropdown, select ConvaiManager → EnableAudioAndStartListening.
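The two start paths described above can be modeled as a single idempotent gate:

- The automatic path ignores clicks or touches that land on UI.
- The Start button always counts.
- Whichever fires first starts audio exactly once.

This is a hypothetical sketch of the behavior this page describes, not the SDK's internal implementation. `AudioStartGate` and `PointerLikeEvent` are illustrative names, and the `start` callback stands in for `ConvaiManager.EnableAudioAndStartListening()`.

```typescript
// Illustrative model only: one gate shared by automatic gesture detection
// and an explicit Start button.
interface PointerLikeEvent {
  overUi: boolean; // true when the click or touch landed on a UI element
}

class AudioStartGate {
  private started = false;
  private readonly start: () => void;

  // `start` stands in for ConvaiManager.EnableAudioAndStartListening().
  constructor(start: () => void) {
    this.start = start;
  }

  get hasStarted(): boolean {
    return this.started;
  }

  // Automatic path: only a gesture outside the UI counts.
  onPointer(event: PointerLikeEvent): boolean {
    if (event.overUi) {
      return false;
    }
    return this.trigger();
  }

  // Explicit path: a Start button click always counts.
  onStartButton(): boolean {
    return this.trigger();
  }

  // Runs `start` on the first trigger only; later triggers are no-ops.
  private trigger(): boolean {
    if (this.started) {
      return false;
    }
    this.started = true;
    this.start();
    return true;
  }
}
```

In this model, keeping both paths active is safe because the second trigger does nothing. Whether the SDK guards repeated calls in the same way is an assumption.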
Example: Corporate onboarding training
An enterprise L&D team hosts a company policy training simulation on their corporate intranet at https://training.company.internal/onboarding. A Convai character plays an HR representative who guides new hires through policy scenarios.
Setup:
Build is served over HTTPS from the corporate intranet server.
A Start Conversation button is placed on a welcome screen using the Inspector approach above.
ConvaiManager.EnableAudioAndStartListening() is wired to the button's On Click () event.
Standard SDK configuration applies; no additional WebGL-specific steps are required.
Outcome: Employees click Start Conversation on the welcome screen. The browser displays a microphone permission prompt. After granting permission, the Convai character begins the onboarding dialogue. The welcome screen hides automatically after the button is clicked.
Vision on WebGL
On WebGL, Vision captures the Unity game view as rendered in the browser canvas. The SDK uses an internal WebGLCanvasVideoSource to publish the browser canvas as the vision frame source — standard CameraVisionFrameSource components are not used on this platform.
Key differences from native Vision:
| Property | Native | WebGL |
|---|---|---|
| Frame source | CameraVisionFrameSource or WebcamVisionFrameSource | Browser canvas |
| Max frame rate | Configurable | 15 fps (fixed) |
| Webcam access | Supported | Not available via SDK |
| RenderTexture publishing | Supported | Not used |
WebGL Vision captures what the player sees in the browser — the game view. For scenarios where the character needs to see the learner's physical environment via webcam, use a desktop or mobile build with WebcamVisionFrameSource instead.
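To show what the fixed 15 fps limit means in practice, the sketch below throttles a faster render loop to a maximum capture rate. It is a generic illustration using assumed names (`FrameRateLimiter`, `shouldCapture`) and does not describe how `WebGLCanvasVideoSource` works internally. Browsers can also cap a canvas stream natively with `HTMLCanvasElement.captureStream(frameRate)`.

```typescript
// Hypothetical fixed-rate throttle: accepts at most `maxFps` frames per second
// from a render loop that may tick faster (e.g. a 60 Hz animation loop).
class FrameRateLimiter {
  private readonly minIntervalMs: number;
  private lastCaptureMs: number | null = null;

  constructor(maxFps: number) {
    this.minIntervalMs = 1000 / maxFps;
  }

  // Returns true if the frame rendered at `nowMs` should be captured.
  shouldCapture(nowMs: number): boolean {
    if (this.lastCaptureMs !== null && nowMs - this.lastCaptureMs < this.minIntervalMs) {
      return false; // too soon after the previous capture
    }
    this.lastCaptureMs = nowMs;
    return true;
  }
}

// Example: one second of 60 Hz rendering passed through a 15 fps limit.
const limiter = new FrameRateLimiter(15);
let captured = 0;
for (let i = 0; i < 60; i++) {
  if (limiter.shouldCapture((i * 1000) / 60)) {
    captured++;
  }
}
console.log(`captured ${captured} of 60 rendered frames`);
```

At a 60 Hz render rate, captures land roughly every fourth rendered frame, about 67 ms apart. On-screen events shorter than that interval may therefore never appear in a vision frame.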
Build validation checklist
Before shipping a WebGL build, verify each item:
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Microphone never activates; character does not hear input | Build is served over HTTP, not HTTPS | Serve the build over HTTPS. localhost is exempt. |
| Microphone blocked in iframe; permission prompt never appears | Missing allow="microphone" on the <iframe> element | Add allow="microphone" to the iframe tag on the embedding page. |
| Character audio is silent; no playback | No user gesture received before audio playback was attempted | Add an explicit Start button wired to ConvaiManager.EnableAudioAndStartListening(). |
| Microphone dropdown is empty in Settings Panel | Expected: the browser controls device selection on WebGL | No fix needed. The browser permission prompt handles device selection. |
| Microphone test fails or is unavailable | Not supported on WebGL | Expected behavior. Inform users that microphone testing is unavailable in browser builds. |
| No spatial audio; voices lack 3D positioning | Spatial audio is not supported on WebGL | Expected. Consider communicating this in the UI (for example, a headphone prompt). |
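The first three rows can also be checked programmatically from the host page. In this sketch, `WebGlEnvironment` and `diagnoseVoiceProblems` are hypothetical names. The comments list the real browser sources each field could come from. The iframe check has to run on the embedding page, because a cross-origin frame cannot read its parent's markup.

```typescript
// Hypothetical snapshot of what a host page can observe about its environment.
interface WebGlEnvironment {
  isSecureContext: boolean; // window.isSecureContext
  embeddedInIframe: boolean; // window.self !== window.top
  iframeAllowsMicrophone: boolean; // iframe.getAttribute("allow"), checked on the embedding page
  userGestureReceived: boolean; // set by your own click or touch handler
}

// Maps the snapshot to likely causes from the troubleshooting table, most fundamental first.
function diagnoseVoiceProblems(env: WebGlEnvironment): string[] {
  const causes: string[] = [];
  if (!env.isSecureContext) {
    causes.push("Build is served over HTTP: serve it over HTTPS (localhost is exempt).");
  }
  if (env.embeddedInIframe && !env.iframeAllowsMicrophone) {
    causes.push('Missing allow="microphone" on the <iframe> element.');
  }
  if (!env.userGestureReceived) {
    causes.push("No user gesture yet: add a Start button wired to ConvaiManager.EnableAudioAndStartListening().");
  }
  return causes;
}
```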
Lip-sync timing drift
Lip-sync timing drift is a known defect on WebGL. No workaround currently exists. Validate your WebGL build visually in a browser before shipping and account for this limitation in your production timeline.
Symptom: Visible desynchronization between speech audio and mouth animation, particularly on longer utterances.
Cause: On WebGL, the SDK uses RealtimePlaybackClock (based on Time.realtimeSinceStartupAsDouble) instead of the hardware DSP clock used on native platforms. The DSP clock is tied to the audio hardware and provides sample-accurate timing. Time.realtimeSinceStartupAsDouble runs independently of the audio pipeline, which causes drift to accumulate over time.
Fix: No workaround currently exists.
Verify: Evaluate lip-sync timing visually in a browser across a full conversation turn before shipping.
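To illustrate why the drift grows with utterance length, the sketch below models two independent clocks. It assumes two hypothetical sources of mismatch. The first is a constant rate skew between the audio device and the wall clock, in parts per million. The second is occasional output stalls, during which the wall clock keeps running but audio does not. Neither value is measured from the SDK. The model shows only the shape of the problem and is not a workaround.

```typescript
// Illustrative model only: offset between a wall-clock-driven animation and the
// audio it should follow. Positive values mean the mouth animation runs ahead.
function lipSyncOffsetMs(
  wallSeconds: number, // wall-clock time since the utterance started
  audioSlowerByPpm: number, // hypothetical: how much slower the audio device runs
  stallsMs: number[], // hypothetical: output stalls (e.g. buffer underruns)
): number {
  // Rate skew: the gap grows linearly with elapsed time.
  const skewMs = (wallSeconds * 1000 * audioSlowerByPpm) / 1e6;
  // Stalls: the wall clock advances while audio output pauses, and nothing in
  // this model pulls the animation back, so each stall adds to the offset.
  const stallMs = stallsMs.reduce((sum, stall) => sum + stall, 0);
  return skewMs + stallMs;
}

// A 20-second utterance drifts twice as far as a 10-second one at the same skew.
console.log(lipSyncOffsetMs(10, 500, []), lipSyncOffsetMs(20, 500, []));
```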
Next steps
Your WebGL build is ready once HTTPS is confirmed, the gesture requirement is handled, and the validation checklist passes. If you are also deploying to iOS, Android, or XR headsets, those platforms have their own permission requirements.
iOS and Android
XR headsets