Streaming Transcription API

Stream audio to Convai’s ASR engine using WebSockets and receive real-time transcriptions. Designed for low-latency, intelligent voice experiences with your AI characters.

This page covers the API details needed to stream real-time audio input and receive text transcriptions from Convai’s ASR (Automatic Speech Recognition) engine.

Use this endpoint to enable live speech recognition in Convai-based applications and voice-enabled AI characters. Once connected, the WebSocket channel streams audio in and transcriptions out, enabling responsive, conversational experiences.


Overview

Base URL:

https://transcribe.convai.com

WebSocket Endpoint:

wss://transcribe.convai.com/stream

Protocol: Bidirectional — stream 16-bit PCM audio in, receive transcript events out.

Supported Languages: English


Authentication

Provide your Convai API key during the initial WebSocket handshake.

Headers

| Name | Type | Description |
| --- | --- | --- |
| CONVAI-API-KEY | String | Your unique API key, found in your Convai account. |

Alternative (Query Parameter)

If header authentication is not possible:

wss://transcribe.convai.com/stream?convai-api-key=<your-api-key>

If your API key is missing or invalid, the connection will close immediately with an error event.
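As a minimal sketch of the two authentication options above, the helpers below build the header mapping and the query-parameter URL. The function names are illustrative and not part of any Convai SDK; how the headers are passed to the handshake depends on the WebSocket client you use.

```python
from urllib.parse import urlencode

STREAM_URL = "wss://transcribe.convai.com/stream"  # endpoint from this page


def auth_headers(api_key: str) -> dict:
    """Header-based authentication: supply this mapping during the handshake."""
    return {"CONVAI-API-KEY": api_key}


def auth_url(api_key: str) -> str:
    """Query-parameter authentication for clients that cannot set headers."""
    # urlencode percent-escapes characters that are unsafe in a query string.
    return f"{STREAM_URL}?{urlencode({'convai-api-key': api_key})}"


print(auth_url("your-api-key"))
```

Prefer the header form where possible, since URLs with embedded keys are more likely to end up in logs.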


Connect Session

wss://transcribe.convai.com/stream

Description

Establishes a live WebSocket connection with Convai’s transcription service. Once the session is active, you can send binary audio frames and receive incremental (transcript.partial) and finalized (transcript.final) transcripts.


Session Start Example
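
A minimal sketch of detecting a successful session start, assuming the server announces it with the session.started event listed in the WebSocket Event Reference. The helper name is hypothetical, and the session_id and expires_at values are placeholders.

```python
import json
from typing import Optional


def session_id_if_started(raw_message: str) -> Optional[str]:
    """Return the session ID if raw_message is a session.started event, else None."""
    event = json.loads(raw_message)
    if event.get("type") != "session.started":
        return None
    return event.get("data", {}).get("session_id")


# Placeholder payload shaped like the session.started example on this page.
first_message = json.dumps(
    {"type": "session.started", "data": {"session_id": "abc", "expires_at": "..."}}
)
print(session_id_if_started(first_message))
```

A client could reasonably wait for this event before sending audio frames, although this page does not state that doing so is required.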

Session Close Example
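
A minimal sketch of a graceful shutdown, assuming the server answers the close control message with a session.closed event (the page lists both but does not spell out the ordering). The ws argument is any object with async send() and recv() methods, for example a connection from the websockets package; the function name is hypothetical.

```python
import json


async def close_session(ws) -> bool:
    """Send {"type": "close"} and wait for session.closed.

    Returns True if session.closed arrived, False if the socket
    went away before the event was seen.
    """
    await ws.send(json.dumps({"type": "close"}))
    while True:
        try:
            message = await ws.recv()
        except Exception:
            # Most clients raise once the connection is closed.
            return False
        if isinstance(message, str) and json.loads(message).get("type") == "session.closed":
            return True
        # Any other event (for example a last transcript.final) is skipped here.
```

In a real client you would likely process the skipped events rather than discard them, since a final transcript may arrive between the close request and session.closed.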


Response (Server Event) Body

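Although the API uses WebSockets rather than traditional JSON POST requests, message payloads follow this structure:

| Name | Type | Description |
| --- | --- | --- |
| type | String | Message type, such as transcript.partial or error for server events, or finalize, stop, or close for client control messages. |
| data | Object | Optional data fields depending on the message type. |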
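
The type/data structure above can be parsed along these lines. This is a sketch: the Message class and parse_message function are hypothetical names, and treating a missing data field as an empty object is an assumption based on data being described as optional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Message:
    type: str
    data: Dict[str, Any] = field(default_factory=dict)


def parse_message(raw: str) -> Message:
    """Parse one text frame into a Message, validating the 'type' field."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or not isinstance(obj.get("type"), str):
        raise ValueError("message must be a JSON object with a string 'type'")
    # 'data' is optional, so fall back to an empty dict when it is absent or null.
    return Message(type=obj["type"], data=obj.get("data") or {})


print(parse_message('{"type": "session.closed", "data": {}}'))
```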


WebSocket Event Reference

| Event Type | Description | Example Payload |
| --- | --- | --- |
| session.started | Sent by the server once a session is successfully initialized. | {"type": "session.started", "data": {"session_id": "...", "expires_at": "..."}} |
| session.closed | Indicates that the session has ended gracefully. | {"type": "session.closed", "data": {}} |
| transcript.partial | Partial transcription updates (non-final). | {"type": "transcript.partial", "data": {"sequence_id": 1, "text": "hel", "is_final": false}} |
| transcript.final | Finalized transcription with or without formatting. | {"type": "transcript.final", "data": {"sequence_id": 1, "text": "Hello world.", "is_final": true, "is_formatted": true}} |
| error | Returned when invalid data, API key, or connection issues occur. | {"type": "error", "data": {"message": "Invalid API key."}} |
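
One way to consume these events is a small dispatcher keyed on the event type. The sketch below uses hypothetical handler names and only the event types and payload shapes listed above.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]


def dispatch(raw: str, handlers: Dict[str, Handler]) -> bool:
    """Route one server event to the handler registered for its type.

    Returns True if a handler ran, False for event types with no handler.
    """
    event = json.loads(raw)
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False
    handler(event.get("data", {}))
    return True


finals = []
handlers = {
    "transcript.partial": lambda data: print("partial:", data["text"]),
    "transcript.final": lambda data: finals.append(data["text"]),
    "error": lambda data: print("error:", data["message"]),
}

raw_final = json.dumps({
    "type": "transcript.final",
    "data": {"sequence_id": 1, "text": "Hello world.", "is_final": True, "is_formatted": True},
})
dispatch(raw_final, handlers)
```

Returning False for unregistered types keeps the client tolerant of event types it does not know about.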


Common Data Fields

| Field | Type | Description |
| --- | --- | --- |
| sequence_id | Integer | Counter for ordering transcript messages. |
| text | String | Transcript text received so far. |
| is_final | Boolean | True when the transcription for the current utterance is finalized. |
| is_formatted | Boolean | True when punctuation and casing have been applied. |
| message_type | String | Metadata tag, e.g., Turn or FinalTranscript. |
| language_code | String / null | Detected language. Experimental feature; may be null. |
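
The fields above map naturally onto a typed record. In this sketch the TranscriptData class is a hypothetical name, and treating is_formatted, message_type, and language_code as optional is an assumption drawn from the example payloads, where the partial event omits them.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TranscriptData:
    sequence_id: int
    text: str
    is_final: bool
    is_formatted: bool = False           # assumed absent on partial events
    message_type: Optional[str] = None   # e.g. "Turn" or "FinalTranscript"
    language_code: Optional[str] = None  # experimental; may be null

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TranscriptData":
        """Build a record from the 'data' object of a transcript event."""
        return cls(
            sequence_id=int(data["sequence_id"]),
            text=data["text"],
            is_final=bool(data["is_final"]),
            is_formatted=bool(data.get("is_formatted", False)),
            message_type=data.get("message_type"),
            language_code=data.get("language_code"),
        )


partial = TranscriptData.from_dict({"sequence_id": 1, "text": "hel", "is_final": False})
print(partial)
```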


Streaming Audio Requirements

| Parameter | Specification |
| --- | --- |
| Encoding | PCM 16-bit, little-endian |
| Channels | Mono |
| Sample Rate | 16 kHz |
| Recommended Frame Size | 50–150 ms |
| Max Frame Size | ~8 MiB |
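
To make the frame sizes concrete: at 16 kHz, mono, 2 bytes per sample, a 100 ms frame is 1,600 samples or 3,200 bytes, far below the ~8 MiB limit. The sketch below works through that arithmetic and shows one way to produce little-endian 16-bit PCM; the function names are illustrative.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono


def frame_bytes(frame_ms: int) -> int:
    """Size in bytes of one audio frame of the given duration."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def split_frames(pcm: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive frames; the last one may be shorter."""
    size = frame_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def to_pcm16le(samples: Sequence[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)  # "<" forces little-endian


print(frame_bytes(50), frame_bytes(100), frame_bytes(150))
```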

Control Messages

Send text-based JSON messages to manage the stream:

| Command | Example | Description |
| --- | --- | --- |
| Finalize | {"type": "finalize"} | Triggers the server to emit a final transcript. |
| Close | {"type": "close"} | Gracefully closes the WebSocket session. |
| Stop | {"type": "stop"} | Equivalent to close, but may retain session context. |
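
A small helper can build these control messages and reject unknown types before anything is sent. This is a sketch; control_message is a hypothetical name.

```python
import json

CONTROL_TYPES = ("finalize", "close", "stop")  # the three commands listed above


def control_message(kind: str) -> str:
    """Return the JSON text for a control message of the given type."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message type: {kind!r}")
    return json.dumps({"type": kind})


print(control_message("finalize"))
```

Control messages go out as text frames, while audio goes out as binary frames. With the websockets package, for instance, passing a str to send() produces a text frame and passing bytes produces a binary frame.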

Error Handling

Errors are reported as structured JSON objects.

Example Error Message
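
The error event from the WebSocket Event Reference can be turned into a Python exception along these lines. The payload is the one shown on this page; the TranscriptionError class and raise_on_error function are hypothetical names.

```python
import json


class TranscriptionError(Exception):
    """Raised when the server sends an error event."""


def raise_on_error(raw: str) -> dict:
    """Return the parsed event, or raise if it is an error event."""
    event = json.loads(raw)
    if event.get("type") == "error":
        message = event.get("data", {}).get("message", "unknown error")
        raise TranscriptionError(message)
    return event


example = '{"type": "error", "data": {"message": "Invalid API key."}}'
try:
    raise_on_error(example)
except TranscriptionError as exc:
    print(f"server error: {exc}")
```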


Status Codes

| Code | Description |
| --- | --- |
| 200 | OK: Request/connection succeeded. |
| 400 | Bad Request: Malformed or invalid payload. |
| 401 | Unauthorized: Invalid or missing API key. |
| 403 | Forbidden: Plan not authorized for API access. |
| 500 | Internal Server Error. |


Troubleshooting

| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| 401 Unauthorized | Invalid or missing API key. | Verify your CONVAI-API-KEY header or query parameter. |
| No transcription received | Incorrect audio format. | Ensure PCM 16-bit, mono, 16 kHz encoding. |
| Frequent disconnects | Idle socket or malformed data. | Keep streaming frames; implement reconnection logic. |
| Punctuation missing | Unformatted transcript. | Wait for the second transcript.final event, which carries "is_formatted": true. |
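
For the reconnection logic mentioned under frequent disconnects, a common approach is exponential backoff between attempts. The sketch below only computes the delays; the base, cap, and attempt count are illustrative assumptions, since this page does not specify a retry policy.

```python
import random
from typing import Iterator


def backoff_delays(base: float = 0.5, cap: float = 30.0,
                   attempts: int = 6, jitter: float = 0.0) -> Iterator[float]:
    """Yield delays in seconds that double each attempt, limited to `cap`.

    `jitter` adds up to that fraction of each delay at random, which helps
    many clients avoid reconnecting at the same instant.
    """
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        yield delay + random.uniform(0, jitter * delay)


print(list(backoff_delays()))
```

A client would sleep for each delay in turn before retrying the connection, and reset the sequence once a session starts successfully.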


Example Progression (Single Utterance)
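
As an illustration of how a single utterance might unfold, the sketch below replays a hypothetical event sequence: two partials, an unformatted final, and then a formatted final, consistent with the troubleshooting note about waiting for the second transcript.final. The texts and sequence_id values are invented for the example and are not captured server output.

```python
import json


def render(raw: str) -> str:
    """Format one transcript event as a single display line."""
    event = json.loads(raw)
    data = event.get("data", {})
    if event["type"] == "transcript.partial":
        return f"[partial] {data['text']}"
    if event["type"] == "transcript.final":
        tag = "final, formatted" if data.get("is_formatted") else "final, raw"
        return f"[{tag}] {data['text']}"
    return f"[{event['type']}]"


# Hypothetical progression for the utterance "Hello world."
events = [json.dumps(e) for e in (
    {"type": "transcript.partial", "data": {"sequence_id": 1, "text": "hel", "is_final": False}},
    {"type": "transcript.partial", "data": {"sequence_id": 2, "text": "hello wor", "is_final": False}},
    {"type": "transcript.final", "data": {"sequence_id": 3, "text": "hello world", "is_final": True, "is_formatted": False}},
    {"type": "transcript.final", "data": {"sequence_id": 4, "text": "Hello world.", "is_final": True, "is_formatted": True}},
)]

for raw in events:
    print(render(raw))
```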


Currently, only English is supported.

Example (End-to-End Streaming Client)

Below are sample implementations and commands that demonstrate how to connect to the Streaming Transcription API and perform real-time transcription.

End-to-End Streaming Client - Python

This example creates a real-time transcription client using Python, WebSockets, and sounddevice for live microphone input.

Requirements:

  • Python 3.8+

  • pip install websockets sounddevice

File: convai_stt_stream.py
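
The following is one possible shape for such a client, written as a sketch rather than a reference implementation. It assumes query-parameter authentication, reads the key from a CONVAI_API_KEY environment variable (a name chosen here for illustration), and uses 100 ms frames. It does not send a close message on exit.

```python
import asyncio
import json
import os
from urllib.parse import urlencode

STREAM_URL = "wss://transcribe.convai.com/stream"
SAMPLE_RATE = 16_000                            # Hz, per the audio requirements
FRAME_MS = 100                                  # inside the recommended 50-150 ms
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # samples per frame


def build_url(api_key: str) -> str:
    return f"{STREAM_URL}?{urlencode({'convai-api-key': api_key})}"


def describe(raw) -> str:
    """Turn one server event into a printable line."""
    event = json.loads(raw)
    data = event.get("data", {})
    kind = event.get("type", "unknown")
    if kind in ("transcript.partial", "transcript.final"):
        return f"{kind}: {data.get('text', '')}"
    if kind == "error":
        return f"error: {data.get('message', '')}"
    return kind


async def send_audio(ws, frames: asyncio.Queue) -> None:
    """Forward microphone frames to the server as binary messages."""
    while True:
        await ws.send(await frames.get())


async def run_client(api_key: str) -> None:
    # Imported lazily so the helpers above work without the audio stack.
    import sounddevice as sd
    import websockets

    loop = asyncio.get_running_loop()
    frames: asyncio.Queue = asyncio.Queue()

    def on_audio(indata, frame_count, time_info, status):
        # Runs on the audio thread: hand a copy of the bytes to the event loop.
        loop.call_soon_threadsafe(frames.put_nowait, bytes(indata))

    mic = sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16",
                            blocksize=FRAME_SAMPLES, callback=on_audio)
    async with websockets.connect(build_url(api_key)) as ws:
        with mic:  # starts capture on entry, stops it on exit
            sender = asyncio.create_task(send_audio(ws, frames))
            try:
                async for message in ws:
                    print(describe(message))
            finally:
                sender.cancel()


# Only runs when the (assumed) CONVAI_API_KEY environment variable is set.
if __name__ == "__main__" and os.environ.get("CONVAI_API_KEY"):
    try:
        asyncio.run(run_client(os.environ["CONVAI_API_KEY"]))
    except KeyboardInterrupt:
        pass
```

The microphone callback runs on a separate audio thread, so frames are passed to the event loop with call_soon_threadsafe instead of being sent from the callback directly.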

Running Steps

Output Example

cURL - Quick Connectivity Check

curl is not intended for WebSocket streaming. Use it to verify that the HTTPS endpoint is reachable and your key is accepted.

Expected response:


Conclusion

The Streaming Transcription API delivers real-time, low-latency speech recognition through WebSockets, enabling fluid and natural AI interactions. By integrating this API, you can power responsive voice-based experiences within games, assistants, or immersive Convai-enabled environments.
