Streaming Audio from Meeting Agent in Real-Time

To stream audio from the meeting agent in real-time, follow these steps:

1. Use the /join API with mediaStreaming Payload

When calling the /join API, include the WebSocket URL in the mediaStreaming object of the payload. For detailed payload structure and field descriptions, see API Payload Details.
Tip: Use a WebSocket URL with a ws:// or wss:// prefix based on your server’s configuration. We recommend wss:// for secure, encrypted connections.
  • Supported format: raw PCM (signed 16-bit little-endian)
  • Sample rates: 16000 (default), 24000, 48000
    • The default, pcm_16000, ensures broad compatibility with transcription services and LLMs.
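Refer to API Payload Details for the authoritative schema. As an illustrative sketch only (the endpoint URL and field names outside `mediaStreaming` are assumptions), a /join payload might be assembled like this:

```javascript
// Illustrative sketch: field names such as `meetingUrl`, `websocketUrl`,
// and `audioFormat` are assumptions -- consult API Payload Details for
// the real schema.
const payload = {
  meetingUrl: 'https://example.com/meeting/abc123', // hypothetical field
  mediaStreaming: {
    websocketUrl: 'wss://your-server.example.com:8080', // your WebSocket server
    audioFormat: 'pcm_16000',                           // default sample rate
  },
};

// Sending the request (Node 18+ ships a global fetch):
// await fetch('https://api.example.com/join', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(payload),
// });

console.log(JSON.stringify(payload, null, 2));
```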

2. Understanding the Streaming Protocol

Once the streaming starts, you will receive three types of events:

Streaming Initialization

This event contains metadata about the stream. Example message:
{
    "event": "agent.streaming_initiation_metadata",
    "data": {
        "agentId": "0051c444-dd69-42da-87e7-ac89fd4d0c93",
        "format": "S16LE",
        "sampleRate": "pcm_16000",
        "channels": 1
    }
}
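Note that in this example the sampleRate field arrives as a pcm_-prefixed string rather than a bare number. A small helper (an assumption based on the example above, not a documented utility) can extract the numeric rate for downstream decoders:

```javascript
// Sketch: extract the numeric rate from a "pcm_16000"-style string,
// as observed in the example metadata above.
function parseSampleRate(sampleRate) {
  const match = /^pcm_(\d+)$/.exec(sampleRate);
  return match ? Number(match[1]) : null; // null if the format is unexpected
}

console.log(parseSampleRate('pcm_16000')); // 16000
```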

Audio Data

This event carries the actual audio data. Example message:
{
    "event": "agent.audio_data",
    "data": {
        "agentId": "0051c444-dd69-42da-87e7-ac89fd4d0c93",
        "audioChunk": "<Buffer>",
    }
}
  • audioChunk: Contains the audio data as a buffer.
  • During periods of silence, the audioChunk will contain silent audio data.
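Because silent periods still produce audio chunks, you may want to skip or flag them before forwarding audio to a transcription service. A minimal sketch, assuming raw S16LE PCM and an arbitrary RMS threshold:

```javascript
// Sketch: detect near-silent chunks in raw S16LE PCM.
// `chunk` is a Buffer of interleaved signed 16-bit little-endian samples;
// the threshold value is an arbitrary tuning parameter, not API-defined.
function isSilent(chunk, threshold = 100) {
  const sampleCount = chunk.length / 2;
  if (sampleCount === 0) return true;
  let sumSquares = 0;
  for (let i = 0; i < chunk.length; i += 2) {
    const sample = chunk.readInt16LE(i);
    sumSquares += sample * sample;
  }
  const rms = Math.sqrt(sumSquares / sampleCount);
  return rms < threshold;
}

const silent = Buffer.alloc(320); // 160 zero-valued samples
console.log(isSilent(silent));    // true
```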

Speaker Timeline Updates

This event contains the speaker timeline updates. Example message:
{
    "event": "agent.speaker_timeline_update",
    "data": {
        "agentId": "0051c444-dd69-42da-87e7-ac89fd4d0c93",
        "speakerTimeline": [
            { "speaker": "adam", "start_timestamp": 12.345, "end_timestamp": 44.421 },
            { "speaker": "jason", "start_timestamp": 44.421, "end_timestamp": 46.421 },
            ....
        ]
    }
}
  • speakerTimeline: Provides speaker attribution as an array of entries detailing who is speaking, and when, across the complete timeline.
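One practical use of this event is aggregating talk time per participant. A minimal sketch over the array shape shown above:

```javascript
// Sketch: total speaking time (in seconds) per speaker, computed from
// the speakerTimeline array shape shown in the example above.
function talkTimeBySpeaker(timeline) {
  const totals = {};
  for (const { speaker, start_timestamp, end_timestamp } of timeline) {
    totals[speaker] = (totals[speaker] || 0) + (end_timestamp - start_timestamp);
  }
  return totals;
}

const timeline = [
  { speaker: 'adam', start_timestamp: 12.345, end_timestamp: 44.421 },
  { speaker: 'jason', start_timestamp: 44.421, end_timestamp: 46.421 },
];
console.log(talkTimeBySpeaker(timeline));
```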

3. Example Implementation

Here’s a complete Node.js WebSocket server example to handle these events:
const WebSocket = require('ws');
const fs = require('fs');

const WebSocketServer = WebSocket.Server;
const wss = new WebSocketServer({ port: 8080 });

console.log('WebSocket server is running on ws://localhost:8080');

const file = fs.createWriteStream(__dirname + '/output.raw');
wss.on('connection', (socket) => {
  console.log('Client connected');

  socket.on('message', (message) => {
    const json = JSON.parse(message);

    if (json.event === 'agent.audio_data') {
      // Append the raw PCM chunk to the output file.
      file.write(Buffer.from(json.data.audioChunk));
    } else if (json.event === 'agent.speaker_timeline_update') {
      // Log the latest speaker-attribution timeline.
      console.log(json.data.speakerTimeline);
    }
  });

  socket.on('close', () => {
    console.log('Client disconnected');
    file.end(); // Flush and close the output file.
  });

  socket.on('error', (error) => {
    console.error('WebSocket error:', error);
  });
});

4. Playback Verification

To verify the received audio quality, use FFmpeg’s ffplay:
ffplay -f s16le -ar 16000 -ac 1 output.raw
(Replace 16000 with your actual sample rate, if different.)
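If you would rather produce a standard, directly playable WAV file instead of raw PCM, you can prepend a RIFF header. A minimal sketch for mono S16LE audio (the 44-byte header layout is standard WAV/PCM, but the file names are placeholders):

```javascript
// Sketch: build a minimal 44-byte WAV (RIFF) header for mono S16LE PCM.
function wavHeader(dataLength, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const byteRate = sampleRate * channels * (bitsPerSample / 8);
  const blockAlign = channels * (bitsPerSample / 8);
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + dataLength, 4); // total file size minus 8
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);
  header.writeUInt32LE(dataLength, 40);
  return header;
}

// Usage (file names are placeholders):
// const fs = require('fs');
// const pcm = fs.readFileSync('output.raw');
// fs.writeFileSync('output.wav', Buffer.concat([wavHeader(pcm.length), pcm]));
```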

5. Additional Notes

  • The speakerTimeline can be used for speaker attribution.
  • Ensure the WebSocket connection is properly established to receive the audio stream.
By following these steps, you can successfully stream audio from the meeting agent in real-time.