Add Auto-Generated Live Captions

In this guide you will learn how to add auto-generated live captions to your Mux live stream.

In this guide:

Overview
1. Is my content suitable for auto-generated live closed captions?
2. Increase accuracy of captions with transcription vocabulary
3. Create a new transcription vocabulary
4. Enable auto-generated live closed captions
5. Update stream to not auto-generate closed captions for future connections
6. Manage and update your transcription vocabulary
FAQs

Overview

Mux is excited to offer auto-generated live closed captions in English. Closed captions make video more accessible to people who are deaf or hard of hearing, but the benefits go beyond accessibility. Captions empower your viewers to consume video content in whichever way is best for them, whether it be audio, text, or a combination.

For auto-generated live closed captions, we use artificial intelligence-based speech-to-text technology to generate the closed captions. Closed captions are the visual display of the audio in a program.

1. Is my content suitable for auto-generated live closed captions?

Non-technical content with clear audio and minimal background noise is most suitable for auto-generated live captions. Content with music, or with multiple speakers talking over each other, is not a good use case for auto-generated live captions.

Accuracy for auto-generated live captions ranges from 70% to 95%.

2. Increase accuracy of captions with transcription vocabulary

For all content, we recommend you provide a transcription vocabulary of technical terms (e.g. CODEC) and proper nouns. By providing the transcription vocabulary beforehand, you can increase the accuracy of the closed captions.

The transcription vocabulary helps the speech-to-text engine transcribe terms that may otherwise not be part of a general library. Your use case may involve brand names or proper names that are not normally part of a language model's library (e.g. "Mux"). Or perhaps you have a term, say "Orchid", which is the brand name of a toy. The engine will recognize "orchid" as a flower, but you would want the word transcribed with proper capitalization when it is used as a brand.

Please note that it can take up to 20 seconds for the transcription vocabulary to be applied to your live stream.

3. Create a new transcription vocabulary

You can create a new transcription vocabulary by making a POST request to the /transcription-vocabularies endpoint and defining the input parameters. Each transcription vocabulary can have up to 1,000 phrases.

Request Body Parameters

Input parameters	Type	Description
`name`	`string`	The human-readable description of the transcription vocabulary.
`phrases`	`array`	An array of phrases to populate the transcription vocabulary. A phrase can be one word or multiple words, usually describing a single object or concept.

API Request

POST /video/v1/transcription-vocabularies
{
  "name": "TMI vocabulary",
  "phrases": ["Mux", "Demuxed", "The Mux Informational", "video.js", "codec", "rickroll"]
}

API Response

{
  "data": {
    "updated_at": "1656630612",
    "phrases": [
      "Mux",
      "Demuxed",
      "The Mux Informational",
      "video.js",
      "codec",
      "rickroll"
    ],
    "name": "TMI vocabulary",
    "id": "4uCfJqluoYxl8KjXxNF00TgB56OyM152B5ZR00cLKXFlc",
    "created_at": "1656630612"
  }
}
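
As a minimal sketch, the request above could be assembled in Python using only the standard library. The base URL `https://api.mux.com`, the `/video/v1` path prefix (borrowed from the live stream endpoints in this guide), HTTP Basic authentication with an access token ID and secret, and the helper name `build_create_vocabulary_request` are assumptions that this guide does not state; check them against the API reference before relying on them.

```python
import base64
import json
import urllib.request

MAX_PHRASES = 1000  # per-vocabulary limit stated in this guide


def build_create_vocabulary_request(name, phrases, token_id, token_secret,
                                    base_url="https://api.mux.com"):
    """Build (but do not send) a POST request that creates a vocabulary."""
    if len(phrases) > MAX_PHRASES:
        raise ValueError(f"a vocabulary can hold at most {MAX_PHRASES} phrases")
    body = json.dumps({"name": name, "phrases": list(phrases)}).encode("utf-8")
    # Assumption: the API accepts HTTP Basic auth built from token ID and secret.
    credentials = base64.b64encode(f"{token_id}:{token_secret}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/video/v1/transcription-vocabularies",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


request = build_create_vocabulary_request(
    "TMI vocabulary", ["Mux", "Demuxed", "codec"], "TOKEN_ID", "TOKEN_SECRET")
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Checking the phrase count locally avoids a round trip for a request that would exceed the documented limit.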

4. Enable auto-generated live closed captions

Add the generated_subtitles array when you create the live stream, or add it to an existing live stream.

Request Body Parameters

Input parameters	Type	Description
`name`	`string`	The human readable description for the generated subtitle track. This value must be unique across all the text type and subtitles text type tracks. If not provided, the name is generated from the chosen `language_code`.
`passthrough`	`string`	Arbitrary metadata set for the generated subtitle track.
`language_code`	`string`	BCP 47 language code for captions. Defaults to `"en"`. For auto-generated captions, only English is supported at this time (`"en"`, `"en-US"`, etc.).
`transcription_vocabulary_ids`	`array`	The IDs of existing Transcription Vocabularies that you want to be applied to the live stream. If the vocabularies together contain more than 1,000 unique phrases, only the first 1,000 will be used.
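
To reason about the 1,000-phrase cap before attaching several vocabularies, the rule can be approximated locally. This is only a sketch: the guide does not define what "first" means, so the code assumes phrases are counted in the order the vocabularies are listed and then in the order they appear within each vocabulary, and that uniqueness is an exact, case-sensitive match. `effective_phrases` is a hypothetical helper name.

```python
def effective_phrases(vocabularies, limit=1000):
    """Return the unique phrases in first-seen order, capped at `limit`.

    `vocabularies` is a list of phrase lists, one per vocabulary ID,
    in the same order as `transcription_vocabulary_ids`.
    """
    seen = set()
    result = []
    for phrases in vocabularies:
        for phrase in phrases:
            if phrase not in seen:  # assumption: exact-match uniqueness
                seen.add(phrase)
                result.append(phrase)
    return result[:limit]


# Small limit used here only to make the truncation visible.
kept = effective_phrases([["Mux", "codec"], ["codec", "HLS.js"]], limit=2)
```

If the combined list is longer than the limit, putting the most important vocabulary first is the safer choice under these assumptions.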

Step 1A: Create a live stream in Mux

Create a live stream using the Live Stream Creation API. Let Mux know that you want auto-generated live closed captions.

API Request

POST /video/v1/live-streams

Request Body
{ 
  "playback_policy" : ["public"],
  "generated_subtitles": [
    {
      "name": "English CC (auto)",
      "passthrough": "English closed captions (auto-generated)",
      "language_code": "en-US",
      "transcription_vocabulary_ids": ["4uCfJqluoYxl8KjXxNF00TgB56OyM152B5ZR"]    
    }
  ],
  "new_asset_settings" : {
    "playback_policy" : ["public"]
  }
}

API Response

Response
{
  "data": {
    "stream_key": "5bd28537-7491-7ffa-050b-bbb506401234",
    "playback_ids": [
      {
        "policy": "public",
        "id": "U00gVu02hfLPdaGnlG1dFZ00ZkBUm2m0"
      }
    ],
    "new_asset_settings": {
      "playback_policies": [
        "public"
      ]
    },
    "generated_subtitles": [
      {
        "name": "English CC (auto)",
        "passthrough": "English closed captions (auto-generated)",
        "language_code": "en-US",
        "transcription_vocabulary_ids": ["4uCfJqluoYxl8KjXxNF00TgB56OyM152B5ZR"]
      }
    ],
    "id": "e00Ed01C9ws015d5SLU00ZsaUZzh5nYt02u",
    "created_at": "1624489336"
  }
}
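
The request body from Step 1A can be built programmatically, which makes it easier to leave out optional fields and to reject unsupported languages early. The sketch below uses only field names that appear in this guide; the helper names `generated_subtitle` and `live_stream_body` are hypothetical, and the English-only check simply mirrors the restriction described in the parameter table.

```python
def generated_subtitle(name, vocabulary_ids=(), language_code="en", passthrough=None):
    """Build one entry for the generated_subtitles array."""
    # This guide states that only English is supported for auto-generated captions.
    if language_code.split("-")[0].lower() != "en":
        raise ValueError("auto-generated captions currently support English only")
    entry = {"name": name, "language_code": language_code}
    if passthrough is not None:
        entry["passthrough"] = passthrough
    if vocabulary_ids:
        entry["transcription_vocabulary_ids"] = list(vocabulary_ids)
    return entry


def live_stream_body(subtitles, playback_policy="public"):
    """Build the JSON-serializable body for POST /video/v1/live-streams."""
    return {
        "playback_policy": [playback_policy],
        "generated_subtitles": list(subtitles),
        "new_asset_settings": {"playback_policy": [playback_policy]},
    }


entry = generated_subtitle(
    "English CC (auto)",
    ["4uCfJqluoYxl8KjXxNF00TgB56OyM152B5ZR"],
    language_code="en-US",
)
body = live_stream_body([entry])
```

Serialize `body` with `json.dumps` and send it with whichever HTTP client you already use.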

Step 1B: Configure live captions for an existing live stream

Use the Generated Subtitles API to configure auto-generated closed captions on an existing live stream. Live closed captions cannot be configured while the live stream is active.

API Request

PUT /video/v1/live-streams/{live_stream_id}/generated-subtitles

Request Body
{
  "generated_subtitles": [
    {
      "name": "English CC (auto)",
      "passthrough": "{\"description\": \"English closed captions (auto-generated)\"}",
      "language_code": "en-US",
      "transcription_vocabulary_ids": ["4uCfJqluoYxl8KjXxNF00TgB56OyM152B5ZR"]  
    }
  ]
}

API Response

Response
{
  "data": {
    "stream_key": "5bd28537-7491-7ffa-050b-bbb506401234",
    "playback_ids": [
      {
        "policy": "public",
        "id": "U00gVu02hfLPdaGnlG1dFZ00ZkBUm2m0"
      }
    ],
    "new_asset_settings": {
      "playback_policies": [
        "public"
      ]
    },
    "generated_subtitles": [
      {
        "name": "English CC (auto)",
        "passthrough": "{\"description\": \"English closed captions (auto-generated)\"}",
        "language_code": "en-US",
        "transcription_vocabulary_ids": ["4uCfJqluoYxl8KjXxNF00TgB56OyM152B5ZR"]  
      }
    ]
  }
}
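
Because captions cannot be configured on an active stream, it can help to check the stream state before building the PUT request. The following sketch assumes the live stream object carries a `status` field that reads `"active"` during a broadcast; that field is not shown in the responses in this guide, and `generated_subtitles_update` is a hypothetical helper name.

```python
import json


def generated_subtitles_update(live_stream, subtitles):
    """Return (method, path, body) for the update, refusing active streams."""
    # Assumption: the live stream object exposes a "status" field that is
    # "active" while a broadcast is in progress.
    if live_stream.get("status") == "active":
        raise RuntimeError("captions cannot be reconfigured while the stream is active")
    path = f"/video/v1/live-streams/{live_stream['id']}/generated-subtitles"
    body = json.dumps({"generated_subtitles": list(subtitles)})
    return "PUT", path, body


method, path, body = generated_subtitles_update(
    {"id": "e00Ed01C9ws015d5SLU00ZsaUZzh5nYt02u", "status": "idle"},
    [{"name": "English CC (auto)", "language_code": "en-US"}],
)
```

The tuple can be handed to any HTTP client; authentication is deliberately left out of this sketch.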

Step 2: Start your live stream

At the start of the live stream, two text tracks are created for the active asset, with text_source values of generated_live and generated_live_final, respectively.
While the stream is live, the generated_live track will be available and include predicted text for the audio.
At the end of the stream, the generated_live_final track will transition from the preparing to ready state; this track will include finalized predictions of text and result in higher-accuracy, better-timed text.
After the live event has concluded, the playback experience of the asset created will only include the more accurate generated_live_final track, but the sidecar VTT files for both tracks will continue to exist.
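
The track lifecycle described above suggests a simple selection rule when reading an asset's tracks: prefer generated_live_final once it is ready, otherwise fall back to generated_live. The sketch below assumes each track is a dictionary with `text_source`, `status`, and `id` keys; the exact shape of the asset response is not shown in this guide, and `pick_caption_track` is a hypothetical helper name.

```python
def pick_caption_track(tracks):
    """Prefer a ready generated_live_final track, else use generated_live."""
    for track in tracks:
        if (track.get("text_source") == "generated_live_final"
                and track.get("status") == "ready"):
            return track
    for track in tracks:
        if track.get("text_source") == "generated_live":
            return track
    return None  # no auto-generated caption track found


during_stream = [
    {"id": "live-track", "text_source": "generated_live", "status": "ready"},
    {"id": "final-track", "text_source": "generated_live_final", "status": "preparing"},
]
after_stream = [
    {"id": "live-track", "text_source": "generated_live", "status": "ready"},
    {"id": "final-track", "text_source": "generated_live_final", "status": "ready"},
]
```

With these sample lists, the helper returns the live track while the final track is still preparing and the final track once it is ready.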

5. Update stream to not auto-generate closed captions for future connections

To prevent future connections to your live stream from receiving auto-generated closed captions, update the generated_subtitles configuration to null or an empty array.

API Request

PUT /video/v1/live-streams/{live_stream_id}/generated-subtitles

Request Body 
{
  "generated_subtitles" : []
}

6. Manage and update your transcription vocabulary

Update phrases in a transcription vocabulary

Phrases can be updated at any time. However, if the transcription vocabulary is applied to a live stream that is currently active with auto-generated live closed captions enabled, the updated phrases do not take effect until the next time the stream becomes active.

API Request

PUT /video/v1/transcription-vocabularies/$ID
{
  "phrases": ["Demuxed", "HLS.js"]
}
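
If the PUT request replaces the whole phrase list rather than appending to it (an assumption; this guide does not say which), new terms need to be merged with the existing ones before sending. A minimal sketch, with `merged_phrases` as a hypothetical helper name:

```python
def merged_phrases(existing, additions, limit=1000):
    """Combine phrase lists, dropping exact duplicates and keeping order."""
    merged = list(dict.fromkeys([*existing, *additions]))
    if len(merged) > limit:
        # Each vocabulary can hold up to 1,000 phrases according to this guide.
        raise ValueError(f"merged vocabulary exceeds {limit} phrases")
    return merged


phrases = merged_phrases(["Mux", "Demuxed"], ["Demuxed", "HLS.js"])
update_body = {"phrases": phrases}
```

`update_body` can then be sent as the body of the PUT request shown above.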

FAQs

What happens if my live stream has participants speaking languages other than English?

If your stream contains speech in a language other than English, we will still attempt to auto-generate captions for all of the content using the English model. For example, if both French and English are spoken, the French portions will be transcribed with the English model and the resulting captions will be incomprehensible.

When can I edit my live caption configuration?

Only when the live stream is idle. You cannot make any changes while the live stream is active.

How do I download my auto-generated closed caption track?

Request the track's VTT file from the following URL:

https://stream.mux.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt

More details can be found in the Advanced Playback features guide.
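
As a small illustration, the URL pattern above can be filled in with a helper. The function name `caption_vtt_url` and the sample track ID are hypothetical; percent-encoding the IDs is a defensive choice and has no effect on plain alphanumeric IDs.

```python
from urllib.parse import quote


def caption_vtt_url(playback_id, track_id):
    """Fill in the VTT URL pattern for a playback ID and text track ID."""
    return (
        "https://stream.mux.com/"
        f"{quote(playback_id, safe='')}/text/{quote(track_id, safe='')}.vtt"
    )


url = caption_vtt_url("U00gVu02hfLPdaGnlG1dFZ00ZkBUm2m0", "TRACK_ID")
```

The resulting URL can be fetched with any HTTP client to download the caption file.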

Do live captions work with low latency live streams?

Not at this time.