WASM)¶

Model¶

class Model(aModelData, loadFromBytes=false)¶

An object providing an interface to a trained Coqui STT model.

Arguments

aModelData (string|Uint8Array) – Either the path to the frozen model graph or the frozen model’s bytes.
loadFromBytes (boolean) – Whether to load the model from bytes or from a file.
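
A minimal sketch of the two construction modes. The `sttModule` parameter stands for whatever object exposes the `Model` class once the WASM build has loaded; how that object is obtained is not covered here and is an assumption, as is the `createModel` helper name.

```javascript
// Hypothetical helper: `sttModule` is assumed to expose the Model class
// documented above. How it is obtained depends on how the WASM build is loaded.
function createModel(sttModule, source) {
  // Bytes select the load-from-bytes mode; a string is treated as a file path.
  const loadFromBytes = source instanceof Uint8Array;
  return new sttModule.Model(source, loadFromBytes);
}
```

Deriving `loadFromBytes` from the argument type keeps the two arguments from drifting out of sync.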

Model.addHotWord(aWord, aBoost)¶

Add a hot-word and its boost.

Words that don’t occur in the scorer (e.g. proper nouns) or strings that contain spaces won’t be taken into account.

Arguments

aWord (string) – word
aBoost (number) – boost. A positive value increases and a negative value reduces the chance of a word occurring in a transcription. An excessive positive boost might split up the letters of the word following the hot-word.
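
As an illustration, several hot-words can be registered in one pass. The `addHotWords` helper below is hypothetical; it only relies on `Model.addHotWord()` as documented above and skips entries that contain spaces, since those are not taken into account anyway.

```javascript
// Hypothetical helper: registers several hot-words on a Model-like object.
// Entries containing whitespace are skipped and reported back to the caller.
function addHotWords(model, boosts) {
  const skipped = [];
  for (const [word, boost] of Object.entries(boosts)) {
    if (/\s/.test(word)) {
      skipped.push(word);
      continue;
    }
    model.addHotWord(word, boost);
  }
  return skipped;
}
```

Returning the skipped entries makes it visible when a phrase was silently dropped.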

Model.beamWidth()¶

Get the beam width value used by the model. If Model.setBeamWidth() has not been called, this returns the default value loaded from the model file.

Returns: number – Beam width value used by the model.

Model.clearHotWords()¶: Clear all hot-word entries.

Model.createStream()¶

Create a new streaming inference state. One can then call StreamImpl.feedAudioContent() and StreamImpl.finishStream() on the returned stream object.

Returns: StreamImpl – a StreamImpl() object that represents the streaming state.

Model.disableExternalScorer()¶: Disable decoding using an external scorer.

Model.enableExternalScorer(aScorerPath)¶

Enable decoding using an external scorer.

Arguments

aScorerPath (string) – The path to the external scorer file.

Model.eraseHotWord(aWord)¶

Erase the entry for a hot-word.

Arguments

aWord (string) – word

Model.sampleRate()¶

Return the sample rate expected by the model.

Returns: number – Sample rate.

Model.setBeamWidth(aBeamWidth)¶

Set beam width value used by the model.

Arguments

aBeamWidth (number) – The beam width used by the model. A larger beam width value generates better results at the cost of decoding time.

Model.setScorerAlphaBeta(aLMAlpha, aLMBeta)¶

Set hyperparameters alpha and beta of the external scorer.

Arguments

aLMAlpha (number) – The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta (number) – The beta hyperparameter of the CTC decoder. Word insertion weight.
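
For orientation, CTC decoders that use an external language model commonly rank a candidate transcript c for audio x with a score of the following form. This is the general formulation from the speech-recognition literature, not something taken from the Coqui STT source, so treat it as an assumption about how the two weights interact.

```latex
Q(c) = \log P_{\text{am}}(c \mid x) + \alpha \, \log P_{\text{lm}}(c) + \beta \, \mathrm{word\_count}(c)
```

Under this reading, α corresponds to aLMAlpha (how much the language model counts against the acoustic model) and β to aLMBeta (a per-word bonus that offsets the language model's bias toward short outputs).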

Model.stt(aBuffer)¶

Use the Coqui STT model to perform Speech-To-Text.

Arguments

aBuffer (Buffer) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

Returns

string – The STT result. Returns undefined on error.
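
Audio captured in a browser through the Web Audio API arrives as 32-bit floats in the range [-1, 1], while the model expects 16-bit samples. The sketch below shows one common conversion. The `floatTo16BitPCM` name is hypothetical, resampling to the model's sample rate is not handled, and whether the resulting `Int16Array` can be passed to `Model.stt()` directly or must be wrapped first depends on the binding.

```javascript
// Converts float samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive values by 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```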

Model.sttWithMetadata(aBuffer, aNumResults=1)¶

Use the Coqui STT model to perform Speech-To-Text and output metadata about the results.

Arguments

aBuffer (Buffer) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aNumResults (number) – Maximum number of candidate transcripts to return. Returned list might be smaller than this. Default value is 1 if not specified.

Returns

Metadata – Metadata() object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). Returns undefined on error.
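
Because the caller owns the returned Metadata, it helps to copy out what is needed and free it in one place. A minimal sketch, assuming `freeMetadata` is the module's exported `FreeMetadata` function passed in by the caller, and that `transcripts` and `tokens` behave like plain arrays; the `transcribeCandidates` name is hypothetical.

```javascript
// Hypothetical helper. `freeMetadata` is assumed to be the module's exported
// FreeMetadata function.
function transcribeCandidates(model, audio, maxResults, freeMetadata) {
  const metadata = model.sttWithMetadata(audio, maxResults);
  if (metadata === undefined) {
    return []; // undefined signals an error, so there is nothing to free
  }
  try {
    // Copy the needed fields into plain objects before the metadata is freed.
    return metadata.transcripts.map((t) => ({
      confidence: t.confidence,
      text: t.tokens.map((tok) => tok.text).join(""),
    }));
  } finally {
    freeMetadata(metadata);
  }
}
```

The `finally` block ensures the metadata is released even if reading it throws.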

Stream¶

class StreamImpl(nativeStream)¶

Provides an interface to a Coqui STT stream. The constructor cannot be called directly; use Model.createStream() instead.

Arguments

nativeStream (object) – SWIG wrapper for native StreamingState object.

StreamImpl.feedAudioContent(aBuffer)¶

Feed audio samples to an ongoing streaming inference.

Arguments

aBuffer (Buffer) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

StreamImpl.finishStream()¶

Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.

Returns: string – The STT result. This method frees the stream, which must not be used after the call.
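
The create, feed, and finish sequence can be sketched as follows. The `transcribeInChunks` helper is hypothetical, and `samples` is assumed to be an `Int16Array` of mono audio at the model's sample rate.

```javascript
// Hypothetical helper: streams `samples` through a Model-like object in
// fixed-size chunks and returns the final transcript.
function transcribeInChunks(model, samples, chunkSize) {
  const stream = model.createStream();
  for (let offset = 0; offset < samples.length; offset += chunkSize) {
    // subarray() creates a view on the same memory, so nothing is copied.
    stream.feedAudioContent(samples.subarray(offset, offset + chunkSize));
  }
  // finishStream() frees the stream, so it is not touched afterwards.
  return stream.finishStream();
}
```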

StreamImpl.finishStreamWithMetadata(aNumResults=1)¶

Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.

Arguments

aNumResults (number) – Maximum number of candidate transcripts to return. Returned list might be smaller than this. Default value is 1 if not specified.

Returns

Metadata – Outputs a Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). This method frees the stream, which must not be used after the call.

StreamImpl.intermediateDecode()¶

Compute the intermediate decoding of an ongoing streaming inference.

Returns: string – The STT intermediate result.
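
A sketch of how intermediate results might drive a live caption: each chunk is fed and the partial transcript is reported straight away. The `streamWithPartials` helper and its `onPartial` callback are hypothetical names.

```javascript
// Hypothetical helper: feeds chunks one at a time and reports the partial
// transcript after each one through the `onPartial` callback.
function streamWithPartials(stream, chunks, onPartial) {
  for (const chunk of chunks) {
    stream.feedAudioContent(chunk);
    onPartial(stream.intermediateDecode());
  }
  // The final decode also ends the stream.
  return stream.finishStream();
}
```

Decoding after every chunk adds work per chunk, so larger chunks or less frequent decodes may suit long recordings better.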

StreamImpl.intermediateDecodeFlushBuffers()¶

EXPERIMENTAL: Compute the intermediate decoding of an ongoing streaming inference, flushing buffers first. This ensures that all audio that has been streamed so far is included in the result, but is more expensive than intermediateDecode() because buffers are processed through the acoustic model.

Returns: string – The STT intermediate result.

StreamImpl.intermediateDecodeWithMetadata(aNumResults=1)¶

Compute the intermediate decoding of an ongoing streaming inference, return results including metadata.

Arguments

aNumResults (number) – Maximum number of candidate transcripts to return. Returned list might be smaller than this. Default value is 1 if not specified.

Returns

Metadata – Metadata() object containing the candidate transcripts for the audio decoded so far.

StreamImpl.intermediateDecodeWithMetadataFlushBuffers(aNumResults=1)¶

EXPERIMENTAL: Compute the intermediate decoding of an ongoing streaming inference, flushing buffers first. This ensures that all audio that has been streamed so far is included in the result, but is more expensive than intermediateDecodeWithMetadata() because buffers are processed through the acoustic model. Returns results including metadata.

Arguments

aNumResults (number) – Maximum number of candidate transcripts to return. Returned list might be smaller than this. Default value is 1 if not specified.

Returns

Metadata – Metadata() object containing the candidate transcripts for all audio streamed so far.

Module exported methods¶

FreeModel(model)¶

Frees associated resources and destroys model object.

Arguments

model (Model) – A model pointer returned by Model()

FreeStream(stream)¶

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

Arguments

stream (StreamImpl) – A streaming state pointer returned by Model.createStream().
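
A minimal sketch of abandoning a stream without paying for a decode. `freeStream` is assumed to be the module's exported `FreeStream` function passed in by the caller; the `streamUnlessCancelled` helper and the `shouldCancel` predicate are hypothetical.

```javascript
// Hypothetical helper. `freeStream` is assumed to be the module's exported
// FreeStream function; `shouldCancel` is a caller-supplied predicate.
function streamUnlessCancelled(model, chunks, shouldCancel, freeStream) {
  const stream = model.createStream();
  for (const chunk of chunks) {
    if (shouldCancel()) {
      freeStream(stream); // discard the state without decoding
      return null;
    }
    stream.feedAudioContent(chunk);
  }
  return stream.finishStream();
}
```

Exactly one of `freeStream()` and `finishStream()` runs per stream, so the stream is never released twice.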

FreeMetadata(metadata)¶

Free memory allocated for metadata information.

Arguments

metadata (Metadata) – Object containing metadata as returned by Model.sttWithMetadata() or StreamImpl.finishStreamWithMetadata()

Version()¶

Returns the version of this library. The returned version is a semantic version (SemVer 2.0.0).

Returns: string – The library version.
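
Since the version is a SemVer string, it can be split into numeric parts for compatibility checks. The `parseSemVer` helper below is a hypothetical, simplified parser: it accepts the MAJOR.MINOR.PATCH core plus optional pre-release and build parts, but does not enforce every rule of the specification (for example, the ban on leading zeros).

```javascript
// Simplified SemVer 2.0.0 parser. Returns null if the string lacks the
// MAJOR.MINOR.PATCH core.
function parseSemVer(version) {
  const m = /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$/.exec(version);
  if (m === null) return null;
  return {
    major: Number(m[1]),
    minor: Number(m[2]),
    patch: Number(m[3]),
    prerelease: m[4] === undefined ? null : m[4],
    build: m[5] === undefined ? null : m[5],
  };
}
```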

Metadata¶

class Metadata()¶

An array of CandidateTranscript objects computed by the model.

interface

Metadata.transcripts¶: type: CandidateTranscript[]

CandidateTranscript¶

class CandidateTranscript()¶

A single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.

interface

CandidateTranscript.confidence¶

type: number

Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/token that contributed to the creation of this transcription.
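
When several candidates are requested, the confidence value can be used to choose among them. The sketch below does not assume the list is already sorted; the `bestTranscript` name is hypothetical. Because the value is roughly a sum of logits rather than a probability, it is only meaningful for comparing candidates of the same audio.

```javascript
// Returns the candidate with the highest confidence, or undefined for an
// empty list. Accepts any array of objects with a numeric `confidence`.
function bestTranscript(transcripts) {
  return transcripts.reduce(
    (best, t) => (best === undefined || t.confidence > best.confidence ? t : best),
    undefined
  );
}
```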

CandidateTranscript.tokens¶: type: TokenMetadata[]

TokenMetadata¶

class TokenMetadata()¶

Stores the text of an individual token, along with its timing information.

interface

TokenMetadata.start_time¶

type: number

Position of the token in seconds.

TokenMetadata.text¶

type: string

The text corresponding to this token.

TokenMetadata.timestep¶

type: number

Position of the token in units of 20 ms.
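
Per-token timing can be turned into word-level timing by grouping tokens. The sketch below assumes each token is a single character and that a space token separates words, which holds for character-based alphabets but is an assumption here; the `tokensToWords` name is hypothetical.

```javascript
// Groups per-character tokens into words, using a space token as the
// word separator. Accepts any array of objects with `text` and `start_time`.
function tokensToWords(tokens) {
  const words = [];
  let current = null;
  for (const token of tokens) {
    if (token.text === " ") {
      current = null; // a space closes the current word
      continue;
    }
    if (current === null) {
      // The word starts where its first character starts.
      current = { word: "", start_time: token.start_time };
      words.push(current);
    }
    current.word += token.text;
  }
  return words;
}
```

Given the 20 ms unit, multiplying `timestep` by 0.02 yields the same position expressed in seconds.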