Vision

Find guides for adding real-time scene vision to Convai characters in Unity, including frame sources, publish policies, scripting, and troubleshooting.

Vision gives Convai characters the ability to see what is happening inside your Unity scene. A frame source captures images from a Unity camera, physical webcam, or Meta Quest passthrough feed, and the SDK streams them to Convai over WebRTC alongside the audio conversation.

Platform support

| Platform | Supported frame sources | Notes |
|---|---|---|
| PC / Mac / Console | CameraVisionFrameSource, WebcamVisionFrameSource | Full RenderTexture pipeline; max 30 fps |
| Android / iOS | CameraVisionFrameSource, WebcamVisionFrameSource | Webcam source requests camera permission at startup |
| WebGL | None needed (the canvas is captured automatically) | Uses the canvas.captureStream() path, so no frame source component is needed; frame rate capped at 15 fps; HTTPS required |
| Meta Quest 3 / 3S | QuestVisionFrameSource | Requires the Meta XR SDK and the horizonos.permission.HEADSET_CAMERA permission |
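The platform table can be read as a lookup from target platform to frame source and frame-rate ceiling. The Python sketch below models that lookup and shows what a fps ceiling means in practice (a minimum gap between published frames). It is an illustration only, not Convai SDK code: `Platform`, `VisionPlan`, `PLANS`, `min_frame_interval_ms`, and `should_publish` are hypothetical names, and the simple time-based throttle is an assumption, not the SDK's actual publish policy. The 30 fps and 15 fps values come from the table; platforms with no stated cap are modeled as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Platform(Enum):
    DESKTOP = "PC / Mac / Console"
    MOBILE = "Android / iOS"
    WEBGL = "WebGL"
    QUEST = "Meta Quest 3 / 3S"


@dataclass(frozen=True)
class VisionPlan:
    frame_sources: tuple[str, ...]  # empty tuple: no frame source component needed
    max_fps: Optional[int]          # None: the table states no cap


# Hypothetical lookup; values transcribed from the platform support table.
PLANS = {
    Platform.DESKTOP: VisionPlan(("CameraVisionFrameSource", "WebcamVisionFrameSource"), 30),
    Platform.MOBILE: VisionPlan(("CameraVisionFrameSource", "WebcamVisionFrameSource"), None),
    Platform.WEBGL: VisionPlan((), 15),
    Platform.QUEST: VisionPlan(("QuestVisionFrameSource",), None),
}


def min_frame_interval_ms(max_fps: Optional[int]) -> float:
    """Smallest gap between frames that stays within a fps ceiling."""
    return 0.0 if max_fps is None else 1000.0 / max_fps


def should_publish(last_sent_ms: float, now_ms: float, max_fps: Optional[int]) -> bool:
    """Assumed throttle: send a frame only if enough time has passed since the last one."""
    return now_ms - last_sent_ms >= min_frame_interval_ms(max_fps)
```

For example, a 30 fps ceiling allows one frame roughly every 33.3 ms, while the 15 fps WebGL cap doubles that gap to about 66.7 ms.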

Vision requires the Connection Type field on ConvaiRoomManager to be set to Video. If it is set to Audio, ConvaiVisionPublisher stays idle no matter how the other components are configured.
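The documented blockers can be collected into a pre-flight checklist. The Python sketch below is a hypothetical helper, not part of the Convai SDK: `ConnectionType`, `vision_preflight`, and its parameters are illustrative names. It encodes only the requirements stated on this page (Video connection type, HTTPS on WebGL, the Quest headset camera permission), so an empty result means none of those blockers apply, not that frames are guaranteed to stream.

```python
from enum import Enum


class ConnectionType(Enum):
    AUDIO = "Audio"
    VIDEO = "Video"


def vision_preflight(
    connection_type: ConnectionType,
    platform: str,
    page_is_https: bool = True,
    has_headset_camera_permission: bool = True,
) -> list[str]:
    """Hypothetical check: list documented problems that would block vision."""
    problems = []
    # With Audio, ConvaiVisionPublisher stays idle regardless of other settings.
    if connection_type is not ConnectionType.VIDEO:
        problems.append("ConvaiRoomManager Connection Type must be Video")
    # The WebGL canvas.captureStream() path requires HTTPS.
    if platform == "WebGL" and not page_is_https:
        problems.append("WebGL vision requires HTTPS")
    # Quest passthrough requires the headset camera permission.
    if platform == "Meta Quest 3 / 3S" and not has_headset_camera_permission:
        problems.append("Grant horizonos.permission.HEADSET_CAMERA")
    return problems
```

Note that the Connection Type check runs first and unconditionally, mirroring the rule that an Audio connection disables vision whatever else is configured.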

Next steps

Start with Vision quick start to get a working stream from a scene camera, then use Vision frame sources to select the right capture method for your platform. For pipeline details, see How vision works.
