Vision
Enable Convai characters to perceive and respond to your Unity scene in real time. Overview of the Vision feature, its components, and platform support.
Real-Time Scene Vision for Convai Characters
Vision gives Convai characters the ability to see what is happening inside your Unity scene. When Vision is active, a continuous video stream is captured from a configurable source (a scene camera, a physical webcam, or the passthrough feed of a Meta Quest headset) and published to the Convai backend, where it is processed alongside the character's conversation context. Characters can then respond to what they observe: describe objects, flag hazards, or guide users based on live visual input.
Vision is a module-level feature that depends on ConvaiRoomManager operating in Video connection mode. On native platforms the video stream is sourced from a RenderTexture; on WebGL it is sourced from the visible browser canvas via canvas.captureStream().
How Vision Fits into the SDK
On WebGL, ConvaiVisionPublisher bypasses the frame source entirely and publishes the browser canvas directly. The rest of the pipeline is identical to the native path.
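For readers unfamiliar with the browser side of this path, the sketch below shows how a canvas can be turned into a video track with the standard HTMLCanvasElement.captureStream() API. It is an illustration only, not the SDK's implementation. The helper name captureCanvasVideoTrack and the CanvasLike and StreamLike interfaces are hypothetical, and the 15 fps ceiling is the WebGL cap stated under Platform Behaviour on this page.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
// A real HTMLCanvasElement and MediaStream satisfy these shapes.
interface StreamLike<T> {
  getVideoTracks(): T[];
}
interface CanvasLike<T> {
  captureStream(frameRequestRate?: number): StreamLike<T>;
}

// Frame-rate ceiling for the WebGL path, as stated on this page.
const WEBGL_MAX_FPS = 15;

// Hypothetical helper: capture the canvas and return its first video track.
function captureCanvasVideoTrack<T>(canvas: CanvasLike<T>, requestedFps: number): T {
  // Clamp the requested rate into the range [1, WEBGL_MAX_FPS].
  const fps = Math.max(1, Math.min(requestedFps, WEBGL_MAX_FPS));
  const [firstTrack] = canvas.captureStream(fps).getVideoTracks();
  if (firstTrack === undefined) {
    throw new Error("Canvas stream contains no video track");
  }
  return firstTrack;
}
```

In a browser, passing the Unity canvas element and a requested rate of 30 would result in a capture request at 15 fps. The returned track is what a WebRTC client would then publish.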
Key Concepts
Frame Source
A MonoBehaviour that captures frames and exposes them as a Y-flipped RenderTexture. Three built-in implementations cover cameras, webcams, and Meta Quest passthrough.
Publish Policy
Controls the client-side frame rate and bitrate used when streaming to the backend. It does not control which AI model or provider is used on the backend.
Video Track
A WebRTC video track published to the active Convai room. Identified by the Video Track Name field (default "unity-scene").
Room Connection
Vision only publishes when ConvaiRoomManager is connected with Connection Type set to Video. Audio-only connections do not carry video.
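To make the idea of a client-side frame rate limit concrete, here is a minimal, language-neutral sketch of time-based frame pacing, written in TypeScript. It is not how the SDK's publish policies are implemented. FramePacer and shouldSend are hypothetical names, and bitrate control is not modelled.

```typescript
// Hypothetical frame pacer: given a target frame rate, decides whether a
// frame captured at time `nowMs` should be sent or skipped.
class FramePacer {
  private readonly intervalMs: number;
  // Start at negative infinity so the very first frame is always sent.
  private nextSendMs = Number.NEGATIVE_INFINITY;

  constructor(targetFps: number) {
    if (!(targetFps > 0)) {
      throw new RangeError("targetFps must be a positive number");
    }
    this.intervalMs = 1000 / targetFps;
  }

  // Returns true if enough time has passed since the last sent frame.
  shouldSend(nowMs: number): boolean {
    if (nowMs < this.nextSendMs) {
      return false; // too soon, skip this frame
    }
    this.nextSendMs = nowMs + this.intervalMs;
    return true;
  }
}
```

With a target of 10 fps and frames arriving every 10 ms, this pacer lets through one frame per 100 ms and skips the rest, so the outgoing rate is bounded regardless of how fast the scene renders.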
What Goes Where
Understanding which component belongs where prevents the most common setup mistakes.
| Component | Where it lives | Notes |
| --- | --- | --- |
| ConvaiRoomManager | Any persistent scene GameObject | Connection Type must be set to Video |
| ConvaiVisionPublisher | Any persistent scene GameObject | Typically placed on or near the NPC's root |
| CameraVisionFrameSource | Same or child GameObject as the publisher | One per capture source |
| WebcamVisionFrameSource | Same or child GameObject as the publisher | One per capture source |
| QuestVisionFrameSource | Same or child GameObject as the publisher | Meta Quest only; requires Meta XR SDK |
| VisionDebugPreview | Any scene GameObject | Editor-only; auto-disabled in player builds |
Prerequisites
Vision requires ConvaiRoomManager.Connection Type to be set to Video. If it is set to Audio, the publisher will remain idle even if all other components are correctly configured.
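The setup rules on this page can be summarised as a small checklist function. The sketch below is only a way of reading those rules, not SDK code. VisionSetup, its fields, and findIdleReasons are hypothetical names, and the assumption that native platforms need a frame source assigned is inferred from the platform notes rather than stated as a rule.

```typescript
// Hypothetical snapshot of the setup described on this page.
// None of these names come from the Convai SDK.
interface VisionSetup {
  connectionType: "Audio" | "Video"; // ConvaiRoomManager Connection Type
  roomConnected: boolean;            // whether the room connection is established
  platform: "Native" | "WebGL";
  hasFrameSource: boolean;           // whether a frame source component is assigned
}

// Returns the reasons the publisher would stay idle; empty if none were found.
function findIdleReasons(setup: VisionSetup): string[] {
  const reasons: string[] = [];
  if (setup.connectionType !== "Video") {
    reasons.push("Connection Type is not Video");
  }
  if (!setup.roomConnected) {
    reasons.push("Room is not connected");
  }
  // WebGL publishes the browser canvas, so no frame source is needed there.
  if (setup.platform === "Native" && !setup.hasFrameSource) {
    reasons.push("No frame source assigned");
  }
  return reasons;
}
```

A setup with Connection Type set to Audio always yields at least one reason, which mirrors the prerequisite above: every other component can be correct and the publisher still stays idle.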
Platform Behaviour
| Platform | Frame sources | Notes |
| --- | --- | --- |
| PC / Mac / Console | CameraVisionFrameSource, WebcamVisionFrameSource | Full RenderTexture pipeline |
| Android / iOS | CameraVisionFrameSource, WebcamVisionFrameSource | Webcam source requests camera permission at startup |
| WebGL | (Canvas, automatic) | canvas.captureStream() path; no frame source component needed; frame rate capped at 15 fps |
| Meta Quest | QuestVisionFrameSource | Requires Meta XR SDK; bound to PassthroughCameraAccess via reflection |
In This Section
Quick Start
Get a character receiving a live camera feed with a step-by-step Inspector walkthrough — no code required.
Frame Sources
Configure CameraVisionFrameSource, WebcamVisionFrameSource, and QuestVisionFrameSource for every platform and use case.
Publishing & Policies
Choose a publish policy, tune frame rate and bitrate, and control the video track lifecycle.
Debug Preview
Visualise the active frame source as an on-screen overlay and monitor capture health in the Editor.
Usage Examples
End-to-end examples for safety training, equipment onboarding, VR walkthroughs, and manual-trigger sessions.
Advanced Topics
Scripting API, custom IVisionFrameSource, domain events, WebGL deep dive, and platform compatibility matrix.
Troubleshooting & Diagnostics
Diagnose publishing failures, blank feeds, permission errors, and platform-specific issues with a structured checklist and decision tree.
Conclusion
Vision connects your Unity scene directly to the character's perception, enabling responses grounded in what the character can see. Start with the Quick Start to get a working stream from a scene camera, then use Frame Sources to select the right capture method for your platform and use case.