Decoder API reference

class Alphabet(*args: Any, **kwargs: Any)[source]

An Alphabet is a bidirectional map from tokens (e.g. characters) to the internal integer representations used by the underlying acoustic models and external scorers. It can be created from an alphabet configuration file via the constructor, or from a list of tokens via Alphabet.InitFromLabels().

InitFromLabels(data)[source]

Initialize the Alphabet from data, a list of labels. Each label is associated with the integer value corresponding to its position in the list.

CanEncodeSingle(input)[source]

Returns true if the single character/output class has a corresponding label in the alphabet.

CanEncode(input)[source]

Returns true if the entire string can be encoded into labels in this alphabet.

EncodeSingle(input)[source]

Encode a single character/output class into a label. The character must be in the alphabet; this method asserts that it is. Use CanEncodeSingle to test beforehand.

Encode(input)[source]

Encode a sequence of characters/output classes into a sequence of labels. Characters are assumed to always take a single Unicode codepoint. All characters must be in the alphabet; this method asserts that they are. Use CanEncode and CanEncodeSingle to test beforehand.

Decode(input)[source]

Decode a sequence of labels into a string.
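The mapping described above can be illustrated with a small pure-Python stand-in. This is a sketch only: the class ToyAlphabet and its snake_case method names are hypothetical and are not the library's Alphabet implementation. It assumes labels are assigned by list position, as InitFromLabels describes.

```python
from typing import Dict, List


class ToyAlphabet:
    """Illustrative stand-in for a token <-> integer label map (hypothetical)."""

    def __init__(self, labels: List[str]) -> None:
        # Each label's integer value is its position in the list.
        self._label_to_str: Dict[int, str] = dict(enumerate(labels))
        self._str_to_label: Dict[str, int] = {s: i for i, s in enumerate(labels)}

    def can_encode_single(self, char: str) -> bool:
        return char in self._str_to_label

    def can_encode(self, text: str) -> bool:
        return all(self.can_encode_single(c) for c in text)

    def encode(self, text: str) -> List[int]:
        # Mirrors the documented contract: encoding asserts membership,
        # so callers are expected to check can_encode() first.
        assert self.can_encode(text), "text contains tokens outside the alphabet"
        return [self._str_to_label[c] for c in text]

    def decode(self, labels: List[int]) -> str:
        return "".join(self._label_to_str[label] for label in labels)


alphabet = ToyAlphabet(["a", "b", "c", " "])
encoded = alphabet.encode("cab a")
print(encoded)                   # [2, 0, 1, 3, 0]
print(alphabet.decode(encoded))  # cab a
print(alphabet.can_encode("xyz"))  # False
```

Checking can_encode() before encode() avoids tripping the assertion on out-of-alphabet input.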

class Scorer(*args: Any, **kwargs: Any)[source]

An external scorer is a data structure composed of a language model built from text data, as well as the vocabulary used in the construction of this language model and additional parameters related to how the decoding process uses the external scorer, such as the language model weight alpha and the word insertion score beta.

Parameters
  • alpha (float) – Language model weight.

  • beta (float) – Word insertion score.

  • scorer_path (str) – Path to load scorer from.

  • alphabet (Alphabet) – Alphabet object matching the tokens used when creating the external scorer.
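To give a feel for what alpha and beta control, here is a minimal sketch of one common way a beam score combines acoustic and language model evidence. The formula and the function name combined_score are assumptions for illustration; this reference does not state the exact expression the decoder uses.

```python
def combined_score(
    acoustic_log_prob: float,
    lm_log_prob: float,
    num_words: int,
    alpha: float,
    beta: float,
) -> float:
    """Hypothetical beam score: acoustic evidence plus weighted LM evidence.

    alpha scales the language model log-probability; beta is added once
    per word, offsetting the LM's tendency to favor shorter transcripts.
    """
    return acoustic_log_prob + alpha * lm_log_prob + beta * num_words


# Same hypothesis scored with and without a word insertion bonus.
print(combined_score(-4.0, -6.0, num_words=2, alpha=0.5, beta=0.0))  # -7.0
print(combined_score(-4.0, -6.0, num_words=2, alpha=0.5, beta=1.0))  # -5.0
```

Under this reading, a larger alpha makes the decoder trust the language model more, and a larger beta favors hypotheses with more words.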

ctc_beam_search_decoder(probs_seq, alphabet, beam_size, cutoff_prob=1.0, cutoff_top_n=40, scorer=None, hot_words={}, num_results=1)[source]

Wrapper for the CTC Beam Search Decoder.

Parameters
  • probs_seq (2-D list) – 2-D list of probability distributions over each time step, with each element being a list of normalized probabilities over alphabet and blank.

  • alphabet (Alphabet) – Alphabet object.

  • beam_size (int) – Width for beam search.

  • cutoff_prob (float) – Cutoff probability in pruning, default 1.0, no pruning.

  • cutoff_top_n (int) – Cutoff number in pruning, only top cutoff_top_n characters with highest probs in alphabet will be used in beam search, default 40.

  • scorer (Scorer) – External scorer for partially decoded sentence, e.g. word count or language model.

  • hot_words (dict[string, float]) – Map of words (keys) to their assigned boosts (values).

  • num_results (int) – Number of beams to return.

Returns

List of tuples of confidence and sentence as decoding results, in descending order of the confidence.

Return type

list
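The input format and the two pruning parameters can be sketched in plain Python. The helpers below, softmax and prune_timestep, are hypothetical names and show one plausible reading of the cutoff_prob and cutoff_top_n descriptions above, not the decoder's actual code.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Normalize one time step of raw scores into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prune_timestep(
    probs: List[float], cutoff_prob: float = 1.0, cutoff_top_n: int = 40
) -> List[Tuple[int, float]]:
    """Return (class index, probability) pairs that survive pruning."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    keep = len(ranked)
    if cutoff_prob < 1.0:
        # Keep the smallest prefix whose cumulative mass reaches cutoff_prob.
        cumulative, keep = 0.0, 0
        for _, p in ranked:
            cumulative += p
            keep += 1
            if cumulative >= cutoff_prob:
                break
    # Never keep more than cutoff_top_n candidates.
    return ranked[: min(keep, cutoff_top_n)]


# Two time steps over three characters plus blank (4 output classes).
probs_seq = [softmax([2.0, 1.0, 0.1, -1.0]), softmax([0.0, 3.0, 0.5, 0.2])]

step = [0.5, 0.3, 0.15, 0.05]
print([i for i, _ in prune_timestep(step, cutoff_prob=0.9)])  # [0, 1, 2]
print([i for i, _ in prune_timestep(step, cutoff_top_n=2)])   # [0, 1]
```

With the defaults (cutoff_prob=1.0, cutoff_top_n=40), an alphabet of fewer than 40 classes is not pruned at all in this sketch.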

ctc_beam_search_decoder_batch(probs_seq, seq_lengths, alphabet, beam_size, num_processes, cutoff_prob=1.0, cutoff_top_n=40, scorer=None, hot_words={}, num_results=1)[source]

Wrapper for the batched CTC beam search decoder.

Parameters
  • probs_seq (3-D list) – 3-D list with each element as an instance of 2-D list of probabilities used by ctc_beam_search_decoder().

  • alphabet (Alphabet) – Alphabet object.

  • beam_size (int) – Width for beam search.

  • num_processes (int) – Number of parallel processes.

  • cutoff_prob (float) – Cutoff probability in alphabet pruning, default 1.0, no pruning.

  • cutoff_top_n (int) – Cutoff number in pruning, only top cutoff_top_n characters with highest probs in alphabet will be used in beam search, default 40.

  • scorer (Scorer) – External scorer for partially decoded sentence, e.g. word count or language model.

  • hot_words (dict[string, float]) – Map of words (keys) to their assigned boosts (values).

  • num_results (int) – Number of beams to return.

Returns

List of tuples of confidence and sentence as decoding results, in descending order of the confidence.

Return type

list
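The parameter list above does not describe seq_lengths. The sketch below assumes it holds the number of valid time steps for each sample in a padded batch; that interpretation, and the helper name pad_batch, are assumptions for illustration.

```python
from typing import List, Tuple

Sample = List[List[float]]  # time steps x output classes


def pad_batch(
    samples: List[Sample], pad_value: float = 0.0
) -> Tuple[List[Sample], List[int]]:
    """Pad variable-length samples to a common length (hypothetical helper).

    Returns the padded 3-D list and the original length of each sample.
    """
    seq_lengths = [len(sample) for sample in samples]
    max_len = max(seq_lengths)
    num_classes = len(samples[0][0])
    padded = [
        sample + [[pad_value] * num_classes for _ in range(max_len - len(sample))]
        for sample in samples
    ]
    return padded, seq_lengths


short = [[0.9, 0.1]]
long = [[0.2, 0.8], [0.6, 0.4]]
batch, lengths = pad_batch([short, long])
print(lengths)        # [1, 2]
print(len(batch[0]))  # 2
```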

class FlashlightDecoderState(*args: Any, **kwargs: Any)[source]

This class contains constants used to specify the desired behavior for the flashlight_beam_search_decoder() and flashlight_beam_search_decoder_batch() functions.

class CriterionType(value)[source]

Constants used to specify which loss criterion was used by the acoustic model. This class is a Python enum.IntEnum.

CTC

Decoder mode for handling acoustic models trained with CTC loss

ASG

Decoder mode for handling acoustic models trained with ASG loss

S2S

Decoder mode for handling acoustic models trained with Seq2seq loss. Note: this criterion type is currently not supported.

class DecoderType(value)[source]

Constants used to specify whether the decoder should operate in lexicon mode, only predicting words present in a fixed vocabulary, or in lexicon-free mode, without such a restriction. This class is a Python enum.IntEnum.

LexiconBased

Lexicon mode, only predict words in specified vocabulary.

LexiconFree

Lexicon-free mode, allow prediction of any word.

class TokenType(value)[source]

Constants used to specify the granularity of text units used when training the external scorer in relation to the text units used when training the acoustic model. For example, you can have an acoustic model predicting characters and an external scorer trained on words, or an acoustic model and an external scorer both trained with sub-word units. If the acoustic model and the scorer were both trained on the same text unit granularity, use TokenType.Single. Otherwise, if the external scorer was trained on a sequence of acoustic model text units, use TokenType.Aggregate. This class is a Python enum.IntEnum.

Single

Token type for external scorers trained on the same textual units as the acoustic model.

Aggregate

Token type for external scorers trained on a sequence of acoustic model textual units.
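The choice between the two token types can be expressed as a small helper. The enum below only mirrors the member names; its integer values are hypothetical placeholders, since the real constants are defined by the native library, and choose_token_type is an illustrative name.

```python
import enum


class TokenType(enum.IntEnum):
    # Placeholder values for illustration only; not the library's constants.
    Single = 0
    Aggregate = 1


def choose_token_type(acoustic_unit: str, scorer_unit: str) -> TokenType:
    """Pick Single when both models share a text unit, Aggregate otherwise."""
    if acoustic_unit == scorer_unit:
        return TokenType.Single
    return TokenType.Aggregate


# Character-level acoustic model with a word-level scorer.
print(choose_token_type("character", "word").name)     # Aggregate
# Both trained on the same sub-word units.
print(choose_token_type("sub-word", "sub-word").name)  # Single
# IntEnum members are also plain integers.
print(isinstance(TokenType.Single, int))               # True
```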

flashlight_beam_search_decoder(logits_seq, alphabet, beam_size, decoder_type, token_type, lm_tokens, scorer=None, beam_threshold=25.0, cutoff_top_n=40, silence_score=0.0, merge_with_log_add=False, criterion_type=native_client.ctcdecode.swigwrapper.FlashlightDecoderState.CTC, transitions=[], num_results=1)[source]

Decode acoustic model emissions for a single sample. Note that unlike ctc_beam_search_decoder(), this function expects raw outputs from CTC and ASG acoustic models, without a softmax applied at each time step.

Parameters
  • logits_seq (2-D list of floats or numpy array) – 2-D list of acoustic model emissions, dimensions are time steps x number of output units.

  • alphabet (Alphabet) – Alphabet object matching the tokens used when creating the acoustic model and external scorer if specified.

  • beam_size (int) – Width for beam search.

  • decoder_type (FlashlightDecoderState.DecoderType) – Decoding mode, lexicon-constrained or lexicon-free.

  • token_type (FlashlightDecoderState.TokenType) – Type of token in the external scorer.

  • lm_tokens (list[str]) – List of tokens to constrain decoding to when in lexicon-constrained mode. Must match the token type used in the scorer, i.e. must be a list of characters if the scorer is character-based, or a list of words if the scorer is word-based.

  • scorer (Scorer) – External scorer.

  • beam_threshold (float) – Maximum threshold in beam score from leading beam. Any newly created candidate beams which lag behind the best beam so far by more than this value will get pruned. This is a performance optimization parameter and an appropriate value should be found empirically using a validation set.

  • cutoff_top_n (int) – Maximum number of tokens to expand per time step during decoding. Only the highest probability cutoff_top_n candidates (characters, sub-word units, words) in a given timestep will be expanded. This is a performance optimization parameter and an appropriate value should be found empirically using a validation set.

  • silence_score (float) – Score to add to the beam when encountering a predicted silence token (e.g. the space symbol).

  • merge_with_log_add (bool) – Whether to use log-add when merging scores of new candidate beams equivalent to existing ones (leading to the same transcription). When disabled, the maximum score is used.

  • criterion_type (FlashlightDecoderState.CriterionType) – Criterion used for training the acoustic model.

  • transitions (list[float]) – Transition score matrix for ASG acoustic models.

  • num_results (int) – Number of beams to return.

Returns

List of FlashlightOutput structures.

Return type

list[FlashlightOutput]
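Two of the parameters above, merge_with_log_add and beam_threshold, are easier to follow with numbers. The following is a minimal sketch in log space; merge_scores and prune_by_threshold are hypothetical helpers that follow the parameter descriptions, not the decoder's internals.

```python
import math
from typing import Dict


def log_add(a: float, b: float) -> float:
    """Compute log(exp(a) + exp(b)) without overflow."""
    hi, lo = max(a, b), min(a, b)
    if lo == -math.inf:
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def merge_scores(a: float, b: float, merge_with_log_add: bool) -> float:
    """Merge the scores of two beams that yield the same transcription."""
    return log_add(a, b) if merge_with_log_add else max(a, b)


def prune_by_threshold(
    scores: Dict[str, float], beam_threshold: float
) -> Dict[str, float]:
    """Drop candidates lagging the best score by more than beam_threshold."""
    best = max(scores.values())
    return {hyp: s for hyp, s in scores.items() if best - s <= beam_threshold}


a, b = math.log(0.2), math.log(0.3)
print(merge_scores(a, b, merge_with_log_add=False) == b)  # True: max keeps 0.3
print(round(math.exp(merge_scores(a, b, merge_with_log_add=True)), 6))  # 0.5

candidates = {"cat": -1.0, "cab": -10.0, "cap": -30.0}
print(sorted(prune_by_threshold(candidates, beam_threshold=25.0)))  # ['cab', 'cat']
```

Log-add sums the probability mass of equivalent beams, while the maximum keeps only the stronger path.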

flashlight_beam_search_decoder_batch(probs_seq, seq_lengths, alphabet, beam_size, decoder_type, token_type, lm_tokens, num_processes, scorer=None, beam_threshold=25.0, cutoff_top_n=40, silence_score=0.0, merge_with_log_add=False, criterion_type=native_client.ctcdecode.swigwrapper.FlashlightDecoderState.CTC, transitions=[], num_results=1)[source]

Decode batch acoustic model emissions in parallel. num_processes controls how many samples from the batch will be decoded simultaneously. All the other parameters are forwarded to flashlight_beam_search_decoder().

Returns a list of lists of FlashlightOutput structures.
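The batching behavior can be pictured as mapping a per-sample decoder over the batch with a bounded worker pool. The sketch below is an illustration under stated assumptions: greedy_decode is a hypothetical stand-in for the per-sample decoder, and a thread pool is used for portability even though the parameter is named num_processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Sample = List[List[float]]  # time steps x output units


def greedy_decode(logits_seq: Sample) -> List[int]:
    """Hypothetical stand-in decoder: best output unit at each time step."""
    return [max(range(len(step)), key=step.__getitem__) for step in logits_seq]


def decode_batch(batch: List[Sample], num_processes: int) -> List[List[int]]:
    """Decode up to num_processes samples at once, preserving batch order."""
    with ThreadPoolExecutor(max_workers=num_processes) as pool:
        return list(pool.map(greedy_decode, batch))


batch = [
    [[0.1, 2.0], [1.5, 0.3]],  # sample 0: two time steps
    [[0.0, 0.7]],              # sample 1: one time step
]
results = decode_batch(batch, num_processes=2)
print(results)  # [[1, 0], [1]]
```

The result has one entry per sample, in input order, which matches the list-of-lists return shape described above.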

class UTF8Alphabet(*args: Any, **kwargs: Any)[source]

Alphabet class representing 255 possible byte values for Bytes Output Mode. For internal use only.

CanEncodeSingle(input)[source]

Returns true if the single character/output class has a corresponding label in the alphabet.

CanEncode(input)[source]

Returns true if the entire string can be encoded into labels in this alphabet.

EncodeSingle(input)[source]

Encode a single character/output class into a label. The character must be in the alphabet; this method asserts that it is. Use CanEncodeSingle to test beforehand.

Encode(input)[source]

Encode a sequence of characters/output classes into a sequence of labels. Characters are assumed to always take a single Unicode codepoint. All characters must be in the alphabet; this method asserts that they are. Use CanEncode and CanEncodeSingle to test beforehand.

Decode(input)[source]

Decode a sequence of labels into a string.