Streaming Audio from Meeting Agent in Real-Time
To stream audio from the meeting agent in real-time, follow these steps:
When calling the /join API, include the WebSocket URL in the mediaStreaming object of the payload. For detailed payload structure and field descriptions, see API Payload Details.
Tip: Use a WebSocket URL with a ws:// or wss:// prefix based on your server’s configuration. We recommend wss:// for secure, encrypted connections.
- Supported formats: Raw PCM
- Sample rates: 16000 (default), 24000, 48000 Hz
- The default format, pcm_16000, ensures broad compatibility with transcription services and LLMs.
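As a rough illustration of the step above, the following sketch validates a WebSocket URL and sample rate before building a mediaStreaming object. The helper name buildMediaStreaming and the field names url and audioFormat are assumptions made for this example only; the real field names are defined in API Payload Details.

```javascript
// Sample rates listed in this guide.
const SUPPORTED_RATES = [16000, 24000, 48000];

// Hypothetical helper: the field names `url` and `audioFormat` are
// placeholders, not confirmed API fields. Check API Payload Details.
function buildMediaStreaming(wsUrl, sampleRate = 16000) {
  // new URL() throws on malformed input, which rejects obvious typos early.
  const { protocol } = new URL(wsUrl);
  if (protocol !== 'ws:' && protocol !== 'wss:') {
    throw new Error(`Expected a ws:// or wss:// URL, got ${protocol}//`);
  }
  if (!SUPPORTED_RATES.includes(sampleRate)) {
    throw new Error(`Unsupported sample rate: ${sampleRate}`);
  }
  return { url: wsUrl, audioFormat: `pcm_${sampleRate}` };
}
```

Validating on the client side keeps a mistyped scheme or an unsupported rate from reaching the /join call at all.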
2. Understanding the Streaming Protocol
Once the streaming starts, you will receive three types of events:
Streaming Initialization
This event contains metadata about the stream. Example message:
{
"event": "agent.streaming_initiation_metadata",
"data": {
"agentId": "0051c444-dd69-42da-87e7-ac89fd4d0c93",
"format": "S16LE",
"sampleRate": "pcm_16000",
"channels": 1
}
}
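The metadata can be turned into numbers that the rest of a pipeline needs. The sketch below assumes the message has the shape shown in the example above, with sampleRate arriving as a string such as "pcm_16000" and format as "S16LE" (signed 16-bit little-endian, so 2 bytes per sample). The function name describeStream is illustrative.

```javascript
// Derive the numeric sample rate and byte rate from the initialization event.
function describeStream(initMessage) {
  const { format, sampleRate, channels } = initMessage.data;

  // Assumes the "pcm_<rate>" string form shown in the example message.
  const match = /^pcm_(\d+)$/.exec(sampleRate);
  if (!match) {
    throw new Error(`Unexpected sampleRate value: ${sampleRate}`);
  }
  if (format !== 'S16LE') {
    throw new Error(`Unexpected format: ${format}`);
  }

  const rateHz = Number(match[1]);
  const bytesPerSample = 2; // S16LE: 16 bits per sample
  return {
    rateHz,
    channels,
    bytesPerSecond: rateHz * channels * bytesPerSample,
  };
}
```

For the example message, this gives 16000 samples per second on one channel, or 32000 bytes of audio per second.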
Audio Data
This event carries the actual audio data. Example message:
{
"event": "agent.audio_data",
"data": {
"agentId": "0051c444-dd69-42da-87e7-ac89fd4d0c93",
"audioChunk": "<Buffer>"
}
}
- audioChunk: Contains the audio data as a buffer.
- During periods of silence, audioChunk contains silent audio data.
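Because silent periods still produce audio chunks, a receiver may want to measure each chunk itself. The sketch below assumes the chunk has already been converted to a Node.js Buffer of S16LE samples; chunkStats is an illustrative name, and any silence threshold applied to the RMS value is left to the caller.

```javascript
// Compute the duration and RMS level of one chunk of S16LE PCM audio.
function chunkStats(chunk, sampleRate = 16000, channels = 1) {
  const sampleCount = Math.floor(chunk.length / 2); // 2 bytes per sample
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = chunk.readInt16LE(i * 2);
    sumSquares += sample * sample;
  }
  return {
    // Samples are interleaved across channels, so divide by the channel count.
    durationSeconds: sampleCount / channels / sampleRate,
    // RMS is 0 for digital silence and grows with loudness.
    rms: sampleCount > 0 ? Math.sqrt(sumSquares / sampleCount) : 0,
  };
}
```

An RMS of 0 means the chunk is pure digital silence; real-world silence usually has a small non-zero value, so compare against a threshold tuned to your audio.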
Speaker Timeline Updates
This event contains the speaker timeline updates. Example message:
{
"event": "agent.speaker_timeline_update",
"data": {
"agentId": "0051c444-dd69-42da-87e7-ac89fd4d0c93",
"speakerTimeline": [
{ "speaker": "adam", "start_timestamp": 12.345, "end_timestamp": 44.421 },
{ "speaker": "jason", "start_timestamp": 44.421, "end_timestamp": 46.421 },
....
]
}
}
speakerTimeline: Provides speaker attribution as an array containing the complete timeline, with one entry per segment recording who spoke and when.
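Two common uses of the timeline are looking up the speaker at a given moment and totalling talk time per speaker. The sketch below relies only on the speaker, start_timestamp, and end_timestamp fields shown in the example; the function names are illustrative, and the results are in whatever unit the timestamps use.

```javascript
// Return the speaker whose segment covers time t, or null if nobody does.
// Segments are treated as half-open: start <= t < end.
function speakerAt(timeline, t) {
  const segment = timeline.find(
    (s) => t >= s.start_timestamp && t < s.end_timestamp
  );
  return segment ? segment.speaker : null;
}

// Sum the length of every segment, grouped by speaker.
function talkTime(timeline) {
  const totals = {};
  for (const { speaker, start_timestamp, end_timestamp } of timeline) {
    totals[speaker] = (totals[speaker] || 0) + (end_timestamp - start_timestamp);
  }
  return totals;
}
```

Treating segments as half-open means a boundary timestamp shared by two adjacent segments is attributed to the later speaker rather than to both.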
3. Example Implementation
Here’s a complete Node.js WebSocket server example that handles these events:
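const WebSocket = require('ws');
const fs = require('fs');

const WebSocketServer = WebSocket.Server;
const wss = new WebSocketServer({ port: 8080 });
console.log('WebSocket server is running on ws://localhost:8080');

// Raw PCM audio is appended to this file as chunks arrive.
const file = fs.createWriteStream(__dirname + '/output.raw');

wss.on('connection', (socket) => {
  console.log('Client connected');

  socket.on('message', (message) => {
    // Guard against malformed input so one bad message cannot crash the server.
    let json;
    try {
      json = JSON.parse(message);
    } catch (error) {
      console.error('Ignoring non-JSON message:', error.message);
      return;
    }

    if (json.event === 'agent.streaming_initiation_metadata') {
      console.log('Stream metadata:', json.data);
    } else if (json.event === 'agent.audio_data') {
      // audioChunk is documented as a buffer; adjust this conversion
      // if your payload encodes it differently.
      file.write(Buffer.from(json.data.audioChunk));
    } else if (json.event === 'agent.speaker_timeline_update') {
      console.log(json.data.speakerTimeline);
    }
  });

  socket.on('close', () => {
    console.log('Client disconnected');
  });

  socket.on('error', (error) => {
    console.error('WebSocket error:', error);
  });
});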
4. Playback Verification
To verify the received audio quality, use FFmpeg’s ffplay:
ffplay -f s16le -ar 16000 -ac 1 output.raw
(Replace 16000 with your actual sample rate if different)
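If a player other than ffplay is preferred, the raw file can be wrapped in a standard 44-byte WAV header so that most audio tools open it directly. This is a minimal sketch for 16-bit PCM; the function names wavHeader and rawToWav and the output.wav file name are illustrative.

```javascript
const fs = require('fs');

// Build a canonical 44-byte WAV (RIFF) header for 16-bit PCM audio.
function wavHeader(dataLength, sampleRate = 16000, channels = 1) {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataLength, 4);          // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                      // fmt chunk size
  header.writeUInt16LE(1, 20);                       // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(16, 34);                      // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataLength, 40);
  return header;
}

// Read a raw PCM file and write it back out with a WAV header prepended.
function rawToWav(rawPath, wavPath, sampleRate = 16000, channels = 1) {
  const pcm = fs.readFileSync(rawPath);
  const header = wavHeader(pcm.length, sampleRate, channels);
  fs.writeFileSync(wavPath, Buffer.concat([header, pcm]));
}
```

For example, rawToWav('output.raw', 'output.wav') converts a recording made at the default rate; pass the actual sample rate as the third argument if it differs.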
5. Additional Notes
- The speakerTimeline can be used for speaker attribution.
- Ensure the WebSocket connection is properly established to receive the audio stream.
By following these steps, you can successfully stream audio from the meeting agent in real-time.