Custom Language Models

The TelVue Connect Closed Captioning Service uses speech-to-text Artificial Intelligence technology to automatically create captioning from the audio in uploaded videos for both Broadcast and Streaming/OTT workflows. The core speech-to-text engine uses a general model of common, everyday words and phrases. For some applications, it can be helpful to additionally train the base model with Custom Language Models (custom words, phrases, and names) for improved accuracy.

Corpora

A Corpus (the singular of Corpora) is a collection of words, names, and phrases used to train the captioning engine on uncommon, unique, and domain-specific terms. For example, you could define a Corpus for generic government meetings with words and phrases like “recuse”, “call to order”, “make a motion”, and “is there a second”. You could define another Corpus for a specific meeting type, such as a Board of Education Corpus containing the names of the board members.

To create a Corpus:

  1. Hover over the Media Tab and select Captions Generation.
  2. Select the Corpora Tab.
  3. Click the New Custom Corpus button.
  4. Name the Corpus.
  5. Add the words, phrases, and names to be associated with this Corpus, one entry per line, pressing Return/Enter after each entry.
  6. Click the Save button to save your Corpus.
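
If your terms already live in a spreadsheet or script, the one-entry-per-line text for step 5 can be prepared ahead of time. The following is a minimal Python sketch and is not part of TelVue Connect: the function name `build_corpus_text` and its clean-up rules (trimming whitespace, skipping blank entries, dropping case-insensitive duplicates) are assumptions made for illustration. The resulting text is meant to be pasted into the Corpus text field.

```python
def build_corpus_text(terms):
    """Return terms as one entry per line, ready to paste into a Corpus.

    Hypothetical helper: trims whitespace, skips blank entries, and drops
    case-insensitive duplicates while keeping the first spelling seen.
    """
    seen = set()
    lines = []
    for term in terms:
        cleaned = " ".join(term.split())  # collapse leading, trailing, and repeated spaces
        key = cleaned.casefold()
        if not cleaned or key in seen:
            continue
        seen.add(key)
        lines.append(cleaned)
    return "\n".join(lines)


# Example terms taken from the generic government meeting Corpus described above.
meeting_terms = ["recuse", "call to order", " make a motion ", "Recuse", "", "is there a second"]
corpus_text = build_corpus_text(meeting_terms)
print(corpus_text)
```

Whether the service itself ignores duplicate or blank lines is not stated here, so cleaning the list before pasting is simply a precaution.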

To edit or delete a Corpus:

  1. Hover over the Media Tab and select Captions Generation.
  2. Select the Corpora Tab.
  3. Hover over the Gear actions menu for the Corpus you would like to edit or delete.
  4. Select the desired action.

Models

A Model is a grouping of one or more Corpora. A given Corpus can be included in multiple Models. For example, you might have separate Models for Planning Board, Board of Education, and City Council meetings that all share a common Generic Meeting Corpus, while each also includes its own specific Corpus.

To create a Custom Language Model:

  1. Hover over the Media Tab and select Captions Generation.
  2. Select the Models Tab.
  3. Click the New Language Model button.
  4. Name the Model.
  5. Click the Save button.
  6. Hover over the Gear actions menu for the model you just created, and select Edit.
  7. Select one or more Custom Corpora to be included within this Model for training.
  8. Set the Customization Weight via the slider (on a scale from 0.1 to 1). The Customization Weight determines how much weight to give words from the Custom Language Model compared to those from the base model. The higher the value, the higher the weight. The default of 0.3 is recommended, though you can test other settings to see what works best. If the weight is too high, spoken language that is not actually a match for your custom model may be mistaken for a match. If the weight is too low, spoken language that matches your custom model may not be detected as such.
  9. Click the Save button to save your Language Model.
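
The relationship between Corpora, Models, and the Customization Weight can be pictured with a small data sketch. This is only an illustration of the concepts described above, not TelVue Connect code or an API: the class names `Corpus` and `LanguageModel`, the `vocabulary` method, and the placeholder terms are all hypothetical. The only values taken from this article are the weight range (0.1 to 1) and the recommended default (0.3).

```python
from dataclasses import dataclass, field

MIN_WEIGHT = 0.1      # lower end of the slider, per step 8
MAX_WEIGHT = 1.0      # upper end of the slider, per step 8
DEFAULT_WEIGHT = 0.3  # recommended default, per step 8


@dataclass
class Corpus:
    """Hypothetical stand-in for a Corpus: a name plus its list of terms."""
    name: str
    terms: list = field(default_factory=list)


@dataclass
class LanguageModel:
    """Hypothetical stand-in for a Model: a named group of Corpora and a weight."""
    name: str
    corpora: list = field(default_factory=list)
    customization_weight: float = DEFAULT_WEIGHT

    def __post_init__(self):
        # Reject weights outside the slider range described in step 8.
        if not MIN_WEIGHT <= self.customization_weight <= MAX_WEIGHT:
            raise ValueError(
                f"customization_weight must be between {MIN_WEIGHT} and {MAX_WEIGHT}"
            )

    def vocabulary(self):
        """Return every term from this Model's Corpora, in order, without repeats."""
        terms = [term for corpus in self.corpora for term in corpus.terms]
        return list(dict.fromkeys(terms))


# One shared Corpus reused by two Models, each with its own specific Corpus.
generic_meeting = Corpus("Generic Meeting", ["recuse", "call to order", "make a motion"])
board_of_education = Corpus("Board of Education", ["<board member name>"])
city_council = Corpus("City Council", ["<council member name>"])

boe_model = LanguageModel("Board of Education", [generic_meeting, board_of_education])
council_model = LanguageModel("City Council", [generic_meeting, city_council], 0.5)

print(boe_model.vocabulary())
print(council_model.customization_weight)
```

Updating the shared Generic Meeting Corpus in this sketch changes the vocabulary of both Models, which mirrors why a common Corpus is convenient to maintain in one place.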

To edit or delete a Custom Language Model:

  1. Hover over the Media Tab and select Captions Generation.
  2. Select the Models Tab.
  3. Hover over the Gear actions menu for the Model you would like to edit or delete.
  4. Select the desired action.

Upon adding or editing a Custom Language Model, a Language Model Update background activity will be queued to train the model. The progress of the training background activity can be tracked on the Activities Tab.

Once the Language Model Update is complete, captions created afterward will use the updated Model.

Setting the Default Custom Model

If you have the “Auto generate captions for all uploaded files” option enabled, the Default Custom Model setting determines which Custom Language Model, if any, is used. Likewise, when you manually trigger captioning, the Default Custom Model setting determines which Custom Language Model, if any, is selected by default. You can then choose to override that selection.

To set the Default Custom Model:

  1. Hover over the Administration Tab and select Organization Settings.
  2. Select the Captions tab.
  3. Select your Default Custom Language Model, or select [No Custom Model] to not use a Custom Model by default.
  4. Click the Save button to save your changes.

Selecting a Custom Language Model for Captioning

When you manually trigger captioning for media using the Bulk Generate Closed Captions action or the individual “generate closed caption file” action, you will be presented with the option to select one of your Custom Language Models. If you do not want to use a Custom Language Model, select “No Custom Model”.
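
The precedence between the Default Custom Model and a manual selection can be summarized in a few lines. This is a sketch of the behavior described in this article, not the service's actual logic: the function `resolve_model`, the `NO_CUSTOM_MODEL` constant, and the sentinel are hypothetical names introduced for illustration.

```python
NO_CUSTOM_MODEL = None   # stands in for the "No Custom Model" choice
_USE_DEFAULT = object()  # sentinel: the pre-selected default was left unchanged


def resolve_model(default_model, selection=_USE_DEFAULT):
    """Return the name of the Custom Language Model to use, or None for no custom model.

    default_model: the organization's Default Custom Model (or NO_CUSTOM_MODEL).
    selection: the choice made when manually triggering captioning; omit it for
               automatic captioning or when the default is left as-is.
    """
    if selection is _USE_DEFAULT:
        return default_model
    return selection


# Automatic captioning, or a manual run that keeps the default.
print(resolve_model("City Council"))
# Manual run that overrides the default with another Model.
print(resolve_model("City Council", "Board of Education"))
# Manual run that explicitly opts out of custom models.
print(resolve_model("City Council", NO_CUSTOM_MODEL))
```

A separate sentinel is used so that "no choice made" can be told apart from an explicit "No Custom Model" choice, since both would otherwise look like `None`.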
