Vision
Find guides for adding real-time scene vision to Convai characters in Unity, including frame sources, publish policies, scripting, and troubleshooting.
Vision gives Convai characters the ability to see what is happening inside your Unity scene. A frame source captures images from a Unity camera, physical webcam, or Meta Quest passthrough feed, and the SDK streams them to Convai over WebRTC alongside the audio conversation.
Platform support
PC / Mac / Console
CameraVisionFrameSource, WebcamVisionFrameSource
Full RenderTexture pipeline; max 30 fps
Android / iOS
CameraVisionFrameSource, WebcamVisionFrameSource
Webcam source requests camera permission at startup
WebGL
(Canvas, automatic)
canvas.captureStream() path — no frame source component needed; frame rate capped at 15 fps; HTTPS required
Meta Quest 3 / 3S
QuestVisionFrameSource
Requires Meta XR SDK and horizonos.permission.HEADSET_CAMERA
Vision requires ConvaiRoomManager.Connection Type set to Video. If it is set to Audio, ConvaiVisionPublisher remains idle regardless of how other components are configured.
WebGL: HTTPS required. The canvas.captureStream() API is blocked by browsers on non-HTTPS origins. http://localhost is the only exception. Deploy your WebGL build to an HTTPS host before testing Vision in production.
How vision works Understand the pipeline architecture, key concepts, component placement, and the startup sequence.
Vision quick start Get a character receiving a live camera feed with a step-by-step Inspector walkthrough — no code required.
Vision frame sources Configure CameraVisionFrameSource, WebcamVisionFrameSource, and QuestVisionFrameSource for every platform and use case.
Publish policies Choose a publish policy, tune frame rate and bitrate, and understand platform-specific behavior including WebGL.
Vision debug preview Visualize the active frame source as an on-screen overlay and monitor capture health in the Editor.
Custom frame sources Implement IVisionFrameSource to publish any custom video pipeline — interface contract, Y-flip requirement, and minimal implementation.
Vision scripting API ConvaiVisionPublisher properties and methods, runtime state monitoring, and domain events for analytics integration.
Next steps
Start with Vision quick start to get a working stream from a scene camera, then use Vision frame sources to select the right capture method for your platform. For pipeline details, see How vision works.
Last updated
Was this helpful?