.NET Framework

STT Class

class STTClient.STT : public STTClient.Interfaces.ISTT

Concrete implementation of STTClient.Interfaces.ISTT.

Public Functions

inline STT (string aModelPath)

Initializes a new instance of the STT class and creates a new acoustic model.

Parameters
  • aModelPath: The path to the frozen model graph.

Exceptions
  • ArgumentException: Thrown when the native binary failed to create the model.

inline unsafe uint GetModelBeamWidth ()

Gets the beam width value used by the model. If SetModelBeamWidth has not been called, returns the default value loaded from the model file.

Return

Beam width value used by the model.

inline unsafe void SetModelBeamWidth (uint aBeamWidth)

Set beam width value used by the model.

Parameters
  • aBeamWidth: The beam width used by the decoder. A larger beam width value generates better results at the cost of decoding time.

Exceptions
  • ArgumentException: Thrown on failure.

inline unsafe void AddHotWord (string aWord, float aBoost)

Add a hot-word.

Words that don’t occur in the scorer (e.g. proper nouns) or strings that contain spaces won’t be taken into account.

Parameters
  • aWord: The word to boost.

  • aBoost: The boost applied to the word. A positive value increases, and a negative value reduces, the chance of the word occurring in a transcription. An excessive positive boost might cause the letters of the word following the hot-word to be split up.

Exceptions
  • ArgumentException: Thrown on failure.
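
Because strings that contain spaces are not taken into account, it can help to screen candidates before passing them to AddHotWord. The following is a minimal sketch of such a check. HotWordFilter, IsUsable and Usable are hypothetical helper names, not part of STTClient, and the check covers only the whitespace rule: it cannot tell whether a word occurs in the scorer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of STTClient.
public static class HotWordFilter
{
    // A candidate is usable only if it is non-empty and contains no whitespace.
    public static bool IsUsable(string word) =>
        !string.IsNullOrEmpty(word) && !word.Any(char.IsWhiteSpace);

    // Keeps only the candidates that pass the whitespace check, in their original order.
    public static List<string> Usable(IEnumerable<string> candidates) =>
        candidates.Where(IsUsable).ToList();
}
```

Each surviving word could then be passed to AddHotWord together with a boost value of your choosing.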

inline unsafe void EraseHotWord (string aWord)

Erase entry for a hot-word.

Parameters
  • aWord: The hot-word whose entry should be erased.

Exceptions
  • ArgumentException: Thrown on failure.

inline unsafe void ClearHotWords ()

Clear all hot-words.

Exceptions
  • ArgumentException: Thrown on failure.

inline unsafe int GetModelSampleRate ()

Return the sample rate expected by the model.

Return

Sample rate.

inline unsafe void Dispose ()

Frees associated resources and destroys model objects.

inline unsafe void EnableExternalScorer (string aScorerPath)

Enable decoding using an external scorer.

Parameters
  • aScorerPath: The path to the external scorer file.

Exceptions
  • ArgumentException: Thrown when the native binary failed to enable decoding with an external scorer.

  • FileNotFoundException: Thrown when the scorer file cannot be found.

inline unsafe void DisableExternalScorer ()

Disable decoding using an external scorer.

Exceptions
  • ArgumentException: Thrown when an external scorer is not enabled.

inline unsafe void SetScorerAlphaBeta (float aAlpha, float aBeta)

Set hyperparameters alpha and beta of the external scorer.

Parameters
  • aAlpha: The alpha hyperparameter of the decoder. Language model weight.

  • aBeta: The beta hyperparameter of the decoder. Word insertion weight.

Exceptions
  • ArgumentException: Thrown when an external scorer is not enabled.

inline unsafe void FeedAudioContent (STTStream stream, short[] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters
  • stream: Instance of the stream to feed the data.

  • aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in aBuffer.
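
Streaming inference is typically fed in small pieces rather than as one large buffer. The sketch below shows one way to cut a sample array into fixed-size chunks. AudioChunker and Split are hypothetical names, not part of STTClient, and the chunk size is left to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of STTClient.
public static class AudioChunker
{
    // Splits samples into consecutive chunks of at most chunkSize samples.
    // The last chunk is shorter when the length is not a multiple of chunkSize.
    public static List<short[]> Split(short[] samples, int chunkSize)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<short[]>();
        for (int offset = 0; offset < samples.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, samples.Length - offset);
            var chunk = new short[length];
            Array.Copy(samples, offset, chunk, 0, length);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

Each chunk could then be passed to FeedAudioContent with (uint)chunk.Length as aBufferSize, assuming aBufferSize is a sample count as it is for SpeechToText.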

inline unsafe string FinishStream (STTStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters
  • stream: Instance of the stream to finish.

inline unsafe Metadata FinishStreamWithMetadata (STTStream stream, uint aNumResults)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including metadata.

Return

The extended metadata result.

Parameters
  • stream: Instance of the stream to finish.

  • aNumResults: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

inline unsafe string IntermediateDecode (STTStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters
  • stream: Instance of the stream to decode.

inline unsafe Metadata IntermediateDecodeWithMetadata (STTStream stream, uint aNumResults)

Computes the intermediate decoding of an ongoing streaming inference, including metadata.

Return

The extended metadata result.

Parameters
  • stream: Instance of the stream to decode.

  • aNumResults: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

inline unsafe string Version ()

Returns the version of this library. The returned version is a semantic version (SemVer 2.0.0).

inline unsafe STTStream CreateStream ()

Creates a new streaming inference state.

inline unsafe void FreeStream (STTStream stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

inline unsafe string SpeechToText (short[] aBuffer, uint aBufferSize)

Use the STT model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.
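
Audio often arrives as raw bytes, while SpeechToText expects 16-bit samples and a sample count. The sketch below converts a byte array into a short[] buffer. PcmConverter and ToSamples are hypothetical names, not part of STTClient, and the code assumes headerless little-endian 16-bit mono PCM, which the source does not specify.

```csharp
using System;

// Hypothetical helper, not part of STTClient.
public static class PcmConverter
{
    // Converts raw PCM bytes to 16-bit samples.
    // Assumption: the bytes are little-endian (low byte first) with no file header.
    public static short[] ToSamples(byte[] pcmBytes)
    {
        if (pcmBytes == null) throw new ArgumentNullException(nameof(pcmBytes));
        if (pcmBytes.Length % 2 != 0)
            throw new ArgumentException("Expected an even number of bytes.", nameof(pcmBytes));

        var samples = new short[pcmBytes.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Combine low and high byte; the unchecked cast keeps the sign bit.
            samples[i] = unchecked((short)(pcmBytes[2 * i] | (pcmBytes[2 * i + 1] << 8)));
        }
        return samples;
    }
}
```

The resulting array can serve as aBuffer, with (uint)samples.Length as aBufferSize.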

inline unsafe Metadata SpeechToTextWithMetadata (short[] aBuffer, uint aBufferSize, uint aNumResults)

Use the STT model to perform Speech-To-Text and return results including metadata.

Return

The extended metadata. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.

  • aNumResults: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

STTStream Class

class STTClient.Models.STTStream : public IDisposable

Wrapper of the pointer used for the decoding stream.

Public Functions

inline unsafe STTStream (IntPtr **streamingStatePP)

Initializes a new instance of STTStream.

Parameters
  • streamingStatePP: Pointer to the native streaming state.

ErrorCodes

See also the main definition including descriptions for each error in Error codes.

enum STTClient.Enums.ErrorCodes

Error codes from the native Coqui STT binary.

Values:

STT_ERR_OK
STT_ERR_NO_MODEL
STT_ERR_INVALID_ALPHABET
STT_ERR_INVALID_SHAPE
STT_ERR_INVALID_SCORER
STT_ERR_MODEL_INCOMPATIBLE
STT_ERR_SCORER_NOT_ENABLED
STT_ERR_FAIL_INIT_MMAP
STT_ERR_FAIL_INIT_SESS
STT_ERR_FAIL_INTERPRETER
STT_ERR_FAIL_RUN_SESS
STT_ERR_FAIL_CREATE_STREAM
STT_ERR_FAIL_READ_PROTOBUF
STT_ERR_FAIL_CREATE_SESS
STT_ERR_FAIL_INSERT_HOTWORD
STT_ERR_FAIL_CLEAR_HOTWORD
STT_ERR_FAIL_ERASE_HOTWORD

Metadata

class STTClient.Models.Metadata

Stores the candidate transcripts computed by the model.

Properties

CandidateTranscript [] Transcripts

List of candidate transcripts.

CandidateTranscript

class STTClient.Models.CandidateTranscript

Stores the entire CTC output as an array of character metadata objects.

Properties

double Confidence

Approximated confidence value for this transcription.

TokenMetadata [] Tokens

List of metadata tokens containing text, timestep, and time offset.
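
To turn a set of candidates into plain text, one can pick a candidate and concatenate the Text of its tokens. The sketch below uses stand-in classes that mirror only the documented members, so it does not depend on STTClient. TokenStandIn, TranscriptStandIn and TranscriptPicker are hypothetical names, and the code assumes that a higher Confidence value indicates a more likely transcript.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins that mirror only the documented members of
// TokenMetadata and CandidateTranscript; they are not the STTClient types.
public sealed class TokenStandIn
{
    public string Text;
}

public sealed class TranscriptStandIn
{
    public double Confidence;
    public TokenStandIn[] Tokens;
}

public static class TranscriptPicker
{
    // Concatenates the Text of every token, in order.
    public static string JoinText(TranscriptStandIn transcript) =>
        string.Concat(transcript.Tokens.Select(t => t.Text));

    // Returns the text of the candidate with the highest Confidence
    // (assumption: higher is better), or null if there are no candidates.
    public static string BestText(TranscriptStandIn[] transcripts)
    {
        if (transcripts == null || transcripts.Length == 0) return null;

        var best = transcripts[0];
        foreach (var candidate in transcripts)
        {
            if (candidate.Confidence > best.Confidence) best = candidate;
        }
        return JoinText(best);
    }
}
```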

TokenMetadata

class STTClient.Models.TokenMetadata

Stores each individual character, along with its timing information.

Public Members

string Text

Character at the current timestep.

int Timestep

Position of the character in units of 20 ms.

float StartTime

Position of the character in seconds.
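
Since each token is a single character, word-level timing has to be derived from the character tokens. The sketch below converts a Timestep to seconds using the 20 ms unit stated above and groups characters into words with their start times. TimedToken and TokenTiming are hypothetical names, not part of STTClient, and the grouping assumes that words are separated by tokens whose Text is a single space.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in mirroring the documented members of TokenMetadata.
public sealed class TimedToken
{
    public string Text;
    public int Timestep;
    public float StartTime;
}

public static class TokenTiming
{
    // One timestep is 20 ms, as stated for the Timestep member.
    public const double SecondsPerTimestep = 0.02;

    public static double TimestepToSeconds(int timestep) => timestep * SecondsPerTimestep;

    // Groups character tokens into words, pairing each word with the
    // StartTime of its first character. Assumes " " separates words.
    public static List<KeyValuePair<string, float>> WordStarts(IEnumerable<TimedToken> tokens)
    {
        var words = new List<KeyValuePair<string, float>>();
        var current = new StringBuilder();
        float start = 0f;

        foreach (var token in tokens)
        {
            if (token.Text == " ")
            {
                if (current.Length > 0)
                {
                    words.Add(new KeyValuePair<string, float>(current.ToString(), start));
                    current.Clear();
                }
                continue;
            }
            if (current.Length == 0) start = token.StartTime;
            current.Append(token.Text);
        }

        if (current.Length > 0)
            words.Add(new KeyValuePair<string, float>(current.ToString(), start));
        return words;
    }
}
```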

STT Interface

interface STTClient.Interfaces.ISTT : public IDisposable

Client interface for Coqui STT.

Subclassed by STTClient.STT

Public Functions

unsafe string Version ()

Returns the version of this library. The returned version is a semantic version (SemVer 2.0.0).

unsafe int GetModelSampleRate ()

Return the sample rate expected by the model.

Return

Sample rate.

unsafe uint GetModelBeamWidth ()

Gets the beam width value used by the model. If SetModelBeamWidth has not been called, returns the default value loaded from the model file.

Return

Beam width value used by the model.

unsafe void SetModelBeamWidth (uint aBeamWidth)

Set beam width value used by the model.

Parameters
  • aBeamWidth: The beam width used by the decoder. A larger beam width value generates better results at the cost of decoding time.

Exceptions
  • ArgumentException: Thrown on failure.

unsafe void EnableExternalScorer (string aScorerPath)

Enable decoding using an external scorer.

Parameters
  • aScorerPath: The path to the external scorer file.

Exceptions
  • ArgumentException: Thrown when the native binary failed to enable decoding with an external scorer.

  • FileNotFoundException: Thrown when the scorer file cannot be found.

unsafe void AddHotWord (string aWord, float aBoost)

Add a hot-word.

Parameters
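  • aWord: The word to boost.

  • aBoost: The boost applied to the word. A positive value increases, and a negative value reduces, the chance of the word occurring in a transcription.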

Exceptions
  • ArgumentException: Thrown on failure.

unsafe void EraseHotWord (string aWord)

Erase entry for a hot-word.

Parameters
  • aWord: The hot-word whose entry should be erased.

Exceptions
  • ArgumentException: Thrown on failure.

unsafe void ClearHotWords ()

Clear all hot-words.

Exceptions
  • ArgumentException: Thrown on failure.

unsafe void DisableExternalScorer ()

Disable decoding using an external scorer.

Exceptions
  • ArgumentException: Thrown when an external scorer is not enabled.

unsafe void SetScorerAlphaBeta (float aAlpha, float aBeta)

Set hyperparameters alpha and beta of the external scorer.

Parameters
  • aAlpha: The alpha hyperparameter of the decoder. Language model weight.

  • aBeta: The beta hyperparameter of the decoder. Word insertion weight.

Exceptions
  • ArgumentException: Thrown when an external scorer is not enabled.

unsafe string SpeechToText (short[] aBuffer, uint aBufferSize)

Use the STT model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.

unsafe Metadata SpeechToTextWithMetadata (short[] aBuffer, uint aBufferSize, uint aNumResults)

Use the STT model to perform Speech-To-Text and return results including metadata.

Return

The extended metadata. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.

  • aNumResults: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

unsafe void FreeStream (STTStream stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

unsafe STTStream CreateStream ()

Creates a new streaming inference state.

unsafe void FeedAudioContent (STTStream stream, short[] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters
  • stream: Instance of the stream to feed the data.

  • aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in aBuffer.

unsafe string IntermediateDecode (STTStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters
  • stream: Instance of the stream to decode.

unsafe Metadata IntermediateDecodeWithMetadata (STTStream stream, uint aNumResults)

Computes the intermediate decoding of an ongoing streaming inference, including metadata.

Return

The extended metadata result.

Parameters
  • stream: Instance of the stream to decode.

  • aNumResults: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

unsafe string FinishStream (STTStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters
  • stream: Instance of the stream to finish.

unsafe Metadata FinishStreamWithMetadata (STTStream stream, uint aNumResults)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including metadata.

Return

The extended metadata result.

Parameters
  • stream: Instance of the stream to finish.

  • aNumResults: Maximum number of candidate transcripts to return. Returned list might be smaller than this.