Decoder API reference
class Alphabet(*args: Any, **kwargs: Any)

An Alphabet is a bidirectional map from tokens (e.g. characters) to the internal integer representations used by the underlying acoustic models and external scorers. It can be created from an alphabet configuration file via the constructor, or from a list of tokens via Alphabet.InitFromLabels().

    InitFromLabels(data)
        Initialize the Alphabet from a list of labels, data. Each label is associated with an integer value corresponding to its position in the list.

    CanEncodeSingle(input)
        Returns true if the single character/output class has a corresponding label in the alphabet.

    CanEncode(input)
        Returns true if the entire string can be encoded into labels in this alphabet.

    EncodeSingle(input)
        Encode a single character/output class into a label. The character must be in the alphabet; this method asserts that. Use CanEncodeSingle to test beforehand.
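To make the mapping semantics concrete, here is a minimal pure-Python sketch of the behavior described above. It is an illustrative stand-in, not the library's SWIG-backed implementation; the method names are lowercased mirrors of the documented API.

```python
# Illustrative sketch of the Alphabet mapping semantics; the real class
# is a SWIG wrapper, so this is an assumption-level stand-in only.
class AlphabetSketch:
    def init_from_labels(self, data):
        # Each label maps to its position in the list, as InitFromLabels does.
        self._label_to_id = {label: i for i, label in enumerate(data)}
        self._id_to_label = dict(enumerate(data))

    def can_encode_single(self, token):
        return token in self._label_to_id

    def can_encode(self, string):
        return all(token in self._label_to_id for token in string)

    def encode_single(self, token):
        # Mirrors EncodeSingle: the token must be in the alphabet.
        assert self.can_encode_single(token), f"{token!r} not in alphabet"
        return self._label_to_id[token]


alphabet = AlphabetSketch()
alphabet.init_from_labels([" ", "a", "b", "c"])
print(alphabet.encode_single("b"))   # → 2, its position in the label list
print(alphabet.can_encode("cab"))    # → True
print(alphabet.can_encode("xyz"))    # → False
```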
class Scorer(*args: Any, **kwargs: Any)

An external scorer is a data structure composed of a language model built from text data, the vocabulary used in the construction of this language model, and additional parameters controlling how the decoding process uses the external scorer, such as the language model weight alpha and the word insertion score beta.
class DecodeResult(confidence, transcript, tokens, timesteps)

Named tuple holding a single decoding result.

    property confidence
        Alias for field number 0.

    property transcript
        Alias for field number 1.

    property tokens
        Alias for field number 2.

    property timesteps
        Alias for field number 3.
ctc_beam_search_decoder(probs_seq, alphabet, beam_size, cutoff_prob=1.0, cutoff_top_n=40, scorer=None, hot_words={}, num_results=1)

Wrapper for the CTC beam search decoder.

    Parameters
        probs_seq (2-D list) – 2-D list of probability distributions over each time step, with each element being a list of normalized probabilities over the alphabet and blank.
        alphabet (Alphabet) – Alphabet object.
        beam_size (int) – Width for beam search.
        cutoff_prob (float) – Cutoff probability for pruning; default 1.0, meaning no pruning.
        cutoff_top_n (int) – Cutoff number for pruning; only the top cutoff_top_n characters with the highest probabilities in the alphabet are used in beam search. Default 40.
        scorer (Scorer) – External scorer for partially decoded sentences, e.g. word count or language model.
        hot_words (dict[str, float]) – Map of words (keys) to their assigned boosts (values).
        num_results (int) – Number of beams to return.

    Returns
        List of tuples of confidence and sentence as decoding results, in descending order of confidence.
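The shape of probs_seq is the main thing to get right: one normalized distribution per time step, with an extra column for the CTC blank. The sketch below uses a simple greedy best-path collapse as a stand-in for the wrapper's beam search, purely to illustrate the input layout; it is not the library's algorithm, and the blank index being last is an assumption made for the example.

```python
# probs_seq is time_steps x (len(alphabet) + 1): one normalized
# distribution per time step, with a column for the CTC blank.
labels = ["a", "b", " "]
blank = len(labels)  # blank index, assumed last here for illustration
probs_seq = [
    [0.7, 0.1, 0.1, 0.1],   # t=0: "a"
    [0.6, 0.2, 0.1, 0.1],   # t=1: "a" again (repeat, will collapse)
    [0.1, 0.1, 0.1, 0.7],   # t=2: blank
    [0.1, 0.7, 0.1, 0.1],   # t=3: "b"
]

def greedy_ctc_decode(probs_seq, labels, blank):
    # Greedy best-path decoding: argmax per step, collapse repeats,
    # drop blanks. A simpler stand-in for the beam search.
    best = [max(range(len(row)), key=row.__getitem__) for row in probs_seq]
    out, prev = [], None
    for idx in best:
        if idx != blank and idx != prev:
            out.append(labels[idx])
        prev = idx
    return "".join(out)

print(greedy_ctc_decode(probs_seq, labels, blank))  # → "ab"
```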
ctc_beam_search_decoder_for_wav2vec2am(probs_seq, alphabet, beam_size, cutoff_prob=1.0, cutoff_top_n=40, blank_id=-1, ignored_symbols=frozenset(), scorer=None, hot_words={}, num_results=1)

Wrapper for the CTC beam search decoder for wav2vec2 acoustic models.

    Parameters
        probs_seq (2-D list) – 2-D list of probability distributions over each time step, with each element being a list of normalized probabilities over the alphabet and blank.
        alphabet (Alphabet) – Alphabet object.
        beam_size (int) – Width for beam search.
        cutoff_prob (float) – Cutoff probability for pruning; default 1.0, meaning no pruning.
        cutoff_top_n (int) – Cutoff number for pruning; only the top cutoff_top_n characters with the highest probabilities in the alphabet are used in beam search. Default 40.
        scorer (Scorer) – External scorer for partially decoded sentences, e.g. word count or language model.
        hot_words (dict[str, float]) – Map of words (keys) to their assigned boosts (values).
        num_results (int) – Number of beams to return.

    Returns
        List of tuples of confidence and sentence as decoding results, in descending order of confidence.
ctc_beam_search_decoder_batch(probs_seq, seq_lengths, alphabet, beam_size, num_processes, cutoff_prob=1.0, cutoff_top_n=40, scorer=None, hot_words={}, num_results=1)

Wrapper for the batched CTC beam search decoder.

    Parameters
        probs_seq (3-D list) – 3-D list with each element being a 2-D list of probabilities as used by ctc_beam_search_decoder().
        alphabet (Alphabet) – Alphabet object.
        beam_size (int) – Width for beam search.
        num_processes (int) – Number of parallel processes.
        cutoff_prob (float) – Cutoff probability for pruning; default 1.0, meaning no pruning.
        cutoff_top_n (int) – Cutoff number for pruning; only the top cutoff_top_n characters with the highest probabilities in the alphabet are used in beam search. Default 40.
        scorer (Scorer) – External scorer for partially decoded sentences, e.g. word count or language model.
        hot_words (dict[str, float]) – Map of words (keys) to their assigned boosts (values).
        num_results (int) – Number of beams to return.

    Returns
        List of tuples of confidence and sentence as decoding results, in descending order of confidence.
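For the batched variants, probs_seq becomes 3-D (one 2-D matrix per sample) and seq_lengths records how many time steps of each sample are valid. The sketch below only illustrates that input layout under the assumption that samples are padded to a common length; it does not call into the library.

```python
# Illustrative construction of a padded batch for the batched decoders.
num_classes = 4  # len(alphabet) + 1 for the blank, assumed for the example
samples = [
    # sample 0: 2 valid time steps
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]],
    # sample 1: 3 valid time steps
    [[0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7], [0.7, 0.1, 0.1, 0.1]],
]

# seq_lengths tells the decoder how many steps of each sample are real.
seq_lengths = [len(s) for s in samples]
max_len = max(seq_lengths)
pad_row = [0.0] * num_classes
probs_seq = [s + [pad_row] * (max_len - len(s)) for s in samples]

print(seq_lengths)                   # → [2, 3]
print([len(s) for s in probs_seq])   # → [3, 3] after padding
```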
ctc_beam_search_decoder_for_wav2vec2am_batch(probs_seq, seq_lengths, alphabet, beam_size, num_threads, cutoff_prob=1.0, cutoff_top_n=40, blank_id=-1, ignored_symbols=frozenset(), scorer=None, hot_words={}, num_results=1)

Wrapper for the batched CTC beam search decoder for wav2vec2 acoustic models.

    Parameters
        probs_seq (3-D list) – 3-D list with each element being a 2-D list of probabilities as used by ctc_beam_search_decoder().
        alphabet (Alphabet) – Alphabet object.
        beam_size (int) – Width for beam search.
        num_threads (int) – Number of threads to use for processing the batch.
        cutoff_prob (float) – Cutoff probability for pruning; default 1.0, meaning no pruning.
        cutoff_top_n (int) – Cutoff number for pruning; only the top cutoff_top_n characters with the highest probabilities in the alphabet are used in beam search. Default 40.
        scorer (Scorer) – External scorer for partially decoded sentences, e.g. word count or language model.
        hot_words (dict[str, float]) – Map of words (keys) to their assigned boosts (values).
        num_results (int) – Number of beams to return.

    Returns
        List of tuples of confidence and sentence as decoding results, in descending order of confidence.
class FlashlightDecoderState(*args: Any, **kwargs: Any)

This class contains constants used to specify the desired behavior for the flashlight_beam_search_decoder() and flashlight_beam_search_decoder_batch() functions.

    class CriterionType(value)

    Constants used to specify which loss criterion was used by the acoustic model. This class is a Python enum.IntEnum.

        CTC
            Decoder mode for handling acoustic models trained with CTC loss.

        ASG
            Decoder mode for handling acoustic models trained with ASG loss.

        S2S
            Decoder mode for handling acoustic models trained with Seq2seq loss. Note: this criterion type is currently not supported.

    class DecoderType(value)

    Constants used to specify whether the decoder should operate in lexicon mode, only predicting words present in a fixed vocabulary, or in lexicon-free mode, without such a restriction. This class is a Python enum.IntEnum.

        LexiconBased
            Lexicon mode: only predict words in the specified vocabulary.

        LexiconFree
            Lexicon-free mode: allow prediction of any word.

    class TokenType(value)

    Constants used to specify the granularity of the text units used when training the external scorer, relative to the text units used when training the acoustic model. For example, you can have an acoustic model predicting characters and an external scorer trained on words, or an acoustic model and an external scorer both trained on sub-word units. If the acoustic model and the scorer were both trained on the same text unit granularity, use TokenType.Single. Otherwise, if the external scorer was trained on sequences of acoustic model text units, use TokenType.Aggregate. This class is a Python enum.IntEnum.

        Single
            Token type for external scorers trained on the same textual units as the acoustic model.

        Aggregate
            Token type for external scorers trained on sequences of acoustic model textual units.
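Because these constants are documented as enum.IntEnum members, they compare equal to plain integers, which is how they can be passed through to the native decoder. The sketch below mirrors the documented member names; the integer values are placeholders chosen for illustration, not the library's actual values.

```python
from enum import IntEnum

# Illustrative stand-ins for the documented IntEnum constants. Member
# names come from the docs above; the values here are placeholders.
class CriterionType(IntEnum):
    CTC = 0
    ASG = 1
    S2S = 2  # currently not supported by the decoder

class DecoderType(IntEnum):
    LexiconBased = 0
    LexiconFree = 1

class TokenType(IntEnum):
    Single = 0
    Aggregate = 1

# IntEnum members behave like ints, so they can cross the SWIG boundary.
print(CriterionType.CTC == 0)                    # → True
print(isinstance(DecoderType.LexiconFree, int))  # → True
```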
flashlight_beam_search_decoder(logits_seq, alphabet, beam_size, decoder_type, token_type, lm_tokens, scorer=None, beam_threshold=25.0, cutoff_top_n=40, silence_score=0.0, merge_with_log_add=False, criterion_type=FlashlightDecoderState.CTC, transitions=[], num_results=1)

Decode acoustic model emissions for a single sample. Note that unlike ctc_beam_search_decoder(), this function expects raw outputs from CTC and ASG acoustic models, without softmaxing them over timesteps.

    Parameters
        logits_seq (2-D list of floats or numpy array) – 2-D list of acoustic model emissions; dimensions are time steps x number of output units.
        alphabet (Alphabet) – Alphabet object matching the tokens used when creating the acoustic model and, if specified, the external scorer.
        beam_size (int) – Width for beam search.
        decoder_type (FlashlightDecoderState.DecoderType) – Decoding mode, lexicon-constrained or lexicon-free.
        token_type (FlashlightDecoderState.TokenType) – Type of token in the external scorer.
        lm_tokens (list[str]) – List of tokens to constrain decoding to when in lexicon-constrained mode. Must match the token type used in the scorer, i.e. must be a list of characters if the scorer is character-based, or a list of words if the scorer is word-based.
        scorer (Scorer) – External scorer.
        beam_threshold (float) – Maximum allowed gap in beam score from the leading beam. Any newly created candidate beam that lags behind the best beam so far by more than this value is pruned. This is a performance optimization parameter, and an appropriate value should be found empirically using a validation set.
        cutoff_top_n (int) – Maximum number of tokens to expand per time step during decoding. Only the cutoff_top_n highest-probability candidates (characters, sub-word units, words) in a given time step are expanded. This is a performance optimization parameter, and an appropriate value should be found empirically using a validation set.
        silence_score (float) – Score to add to the beam when encountering a predicted silence token (e.g. the space symbol).
        merge_with_log_add (bool) – Whether to use log-add when merging the scores of new candidate beams equivalent to existing ones (i.e. leading to the same transcription). When disabled, the maximum score is used.
        criterion_type (FlashlightDecoderState.CriterionType) – Criterion used for training the acoustic model.
        transitions (list[float]) – Transition score matrix for ASG acoustic models.
        num_results (int) – Number of beams to return.

    Returns
        List of FlashlightOutput structures.

    Return type
        list[FlashlightOutput]
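The "raw outputs" note above is an easy thing to get wrong when switching between decoders. The sketch below shows the relationship between a raw emission row (what this function takes) and its softmaxed counterpart (what ctc_beam_search_decoder() takes) for a single time step; the numbers are made up for illustration.

```python
import math

# One time step of acoustic-model output. flashlight_beam_search_decoder()
# consumes the raw emissions; ctc_beam_search_decoder() expects each time
# step to already be a normalized probability distribution.
def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

raw_logits = [2.0, 0.5, -1.0, 0.1]   # raw emissions (flashlight decoder input)
probs = softmax(raw_logits)          # normalized (ctc_beam_search_decoder input)

print(abs(sum(probs) - 1.0) < 1e-9)           # → True, a distribution
print(max(range(4), key=probs.__getitem__))   # → 0, argmax is preserved
```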
flashlight_beam_search_decoder_batch(probs_seq, seq_lengths, alphabet, beam_size, decoder_type, token_type, lm_tokens, num_processes, scorer=None, beam_threshold=25.0, cutoff_top_n=40, silence_score=0.0, merge_with_log_add=False, criterion_type=FlashlightDecoderState.CTC, transitions=[], num_results=1)

Decode batched acoustic model emissions in parallel. num_processes controls how many samples from the batch are decoded simultaneously. All other parameters are forwarded to flashlight_beam_search_decoder().

Returns a list of lists of FlashlightOutput structures.
class UTF8Alphabet(*args: Any, **kwargs: Any)

Alphabet class representing the 255 possible byte values for Bytes Output Mode. For internal use only.

    CanEncodeSingle(input)
        Returns true if the single character/output class has a corresponding label in the alphabet.

    CanEncode(input)
        Returns true if the entire string can be encoded into labels in this alphabet.

    EncodeSingle(input)
        Encode a single character/output class into a label. The character must be in the alphabet; this method asserts that. Use CanEncodeSingle to test beforehand.