.NET Framework
STT Class
-
class STTClient.STT: public STTClient.Interfaces.ISTT
Concrete implementation of STTClient.Interfaces.ISTT.
Public Functions
-
inline STT(string aModelPath)
Initializes a new instance of the STT class and creates a new acoustic model.
- Parameters
aModelPath: The path to the frozen model graph.
- Exceptions
ArgumentException: Thrown when the native binary failed to create the model.
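As a minimal sketch of the constructor in use, the following loads a model, prints a few properties, and lets a `using` block release the native resources. The model path is a hypothetical placeholder, and the example assumes the STTClient assembly and its native binary are available to the project.

```csharp
using System;
using STTClient;
using STTClient.Interfaces;

class LoadModelExample
{
    static void Main(string[] args)
    {
        // Hypothetical path: pass the location of your own model file as the first argument.
        string modelPath = args.Length > 0 ? args[0] : "path/to/model";

        try
        {
            // ISTT extends IDisposable, so `using` calls Dispose() when the block exits.
            using (ISTT stt = new STT(modelPath))
            {
                Console.WriteLine($"Library version: {stt.Version()}");
                Console.WriteLine($"Expected sample rate: {stt.GetModelSampleRate()} Hz");
                Console.WriteLine($"Beam width: {stt.GetModelBeamWidth()}");
            }
        }
        catch (ArgumentException ex)
        {
            // Documented failure mode: the native binary failed to create the model.
            Console.Error.WriteLine($"Could not load model: {ex.Message}");
        }
    }
}
```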
-
inline unsafe uint GetModelBeamWidth()
Get the beam width value used by the model. If SetModelBeamWidth has not been called, returns the default value loaded from the model file.
- Return
Beam width value used by the model.
-
inline unsafe void SetModelBeamWidth(uint aBeamWidth)
Set the beam width value used by the model.
- Parameters
aBeamWidth: The beam width used by the decoder. A larger beam width value generates better results at the cost of decoding time.
- Exceptions
ArgumentException: Thrown on failure.
-
inline unsafe void AddHotWord(string aWord, float aBoost)
Add a hot-word.
Words that don’t occur in the scorer (e.g. proper nouns) or strings that contain spaces won’t be taken into account.
- Parameters
aWord: The hot-word to add.
aBoost: The boost to apply. A positive value increases and a negative value reduces the chance of the word occurring in a transcription. An excessive positive boost might cause the letters of the word following the hot-word to be split up.
- Exceptions
ArgumentException: Thrown on failure.
-
inline unsafe void EraseHotWord(string aWord)
Erase the entry for a hot-word.
- Parameters
aWord: The hot-word to erase.
- Exceptions
ArgumentException: Thrown on failure.
-
inline unsafe void ClearHotWords()
Clear all hot-words.
- Exceptions
ArgumentException: Thrown on failure.
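The three hot-word calls above can be combined to boost a word only for the duration of one operation. The sketch below is illustrative: the helper name `RunWithHotWord` is hypothetical, and it assumes an external scorer containing the word has already been enabled on the instance.

```csharp
using System;
using STTClient.Interfaces;

static class HotWordExample
{
    // Hypothetical helper: boosts one word, runs the given action, then removes the boost.
    public static void RunWithHotWord(ISTT stt, string word, float boost, Action action)
    {
        // Throws ArgumentException on failure, as documented above.
        stt.AddHotWord(word, boost);
        try
        {
            action();
        }
        finally
        {
            // Remove only this entry; ClearHotWords() would drop every hot-word instead.
            stt.EraseHotWord(word);
        }
    }
}
```

A caller might pass a lambda that runs a transcription, so the boost never outlives that call.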
-
inline unsafe int GetModelSampleRate()
Return the sample rate expected by the model.
- Return
Sample rate.
-
inline unsafe void Dispose()
Frees associated resources and destroys model objects.
-
inline unsafe void EnableExternalScorer(string aScorerPath)
Enable decoding using an external scorer.
- Parameters
aScorerPath: The path to the external scorer file.
- Exceptions
ArgumentException: Thrown when the native binary failed to enable decoding with an external scorer.
FileNotFoundException: Thrown when the scorer file cannot be found.
-
inline unsafe void DisableExternalScorer()
Disable decoding using an external scorer.
- Exceptions
ArgumentException: Thrown when an external scorer is not enabled.
-
inline unsafe void SetScorerAlphaBeta(float aAlpha, float aBeta)
Set hyperparameters alpha and beta of the external scorer.
- Parameters
aAlpha: The alpha hyperparameter of the decoder. Language model weight.
aBeta: The beta hyperparameter of the decoder. Word insertion weight.
- Exceptions
ArgumentException: Thrown when an external scorer is not enabled.
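A scorer is typically enabled first and tuned afterwards, since SetScorerAlphaBeta throws when no scorer is enabled. The following sketch wraps both calls and reports the two documented exception types; the helper name `TryEnableScorer` is hypothetical, and suitable alpha and beta values depend on the scorer being used.

```csharp
using System;
using System.IO;
using STTClient.Interfaces;

static class ScorerExample
{
    // Hypothetical helper: returns true when the scorer was enabled and tuned.
    public static bool TryEnableScorer(ISTT stt, string scorerPath, float alpha, float beta)
    {
        try
        {
            stt.EnableExternalScorer(scorerPath);
            // Only valid once a scorer is enabled.
            stt.SetScorerAlphaBeta(alpha, beta);
            return true;
        }
        catch (FileNotFoundException ex)
        {
            Console.Error.WriteLine($"Scorer file not found: {ex.Message}");
            return false;
        }
        catch (ArgumentException ex)
        {
            Console.Error.WriteLine($"Native binary rejected the scorer settings: {ex.Message}");
            return false;
        }
    }
}
```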
-
inline unsafe void FeedAudioContent(STTStream stream, short[] aBuffer, uint aBufferSize)
Feeds audio samples to an ongoing streaming inference.
- Parameters
stream: Instance of the stream to feed the data.
aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in aBuffer.
-
inline unsafe string FinishStream(STTStream stream)
Closes the ongoing streaming inference and returns the STT result over the whole audio signal.
- Return
The STT result.
- Parameters
stream: Instance of the stream to finish.
-
inline unsafe Metadata FinishStreamWithMetadata(STTStream stream, uint aNumResults)
Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including metadata.
- Return
The extended metadata result.
- Parameters
stream: Instance of the stream to finish.
aNumResults: Maximum number of candidate transcripts to return. The returned list might be shorter than this.
-
inline unsafe string IntermediateDecode(STTStream stream)
Computes the intermediate decoding of an ongoing streaming inference.
- Return
The STT intermediate result.
- Parameters
stream: Instance of the stream to decode.
-
inline unsafe Metadata IntermediateDecodeWithMetadata(STTStream stream, uint aNumResults)
Computes the intermediate decoding of an ongoing streaming inference, including metadata.
- Return
The extended metadata result.
- Parameters
stream: Instance of the stream to decode.
aNumResults: Maximum number of candidate transcripts to return. The returned list might be shorter than this.
-
inline unsafe string Version()
Return version of this library. The returned version is a semantic version (SemVer 2.0.0).
-
inline unsafe STTStream CreateStream()
Creates a new streaming inference state.
-
inline unsafe void FreeStream(STTStream stream)
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
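The streaming functions are meant to be used together: create a stream, feed it audio in chunks, optionally peek at the partial result, then finish or free it. The sketch below shows one possible arrangement. The helper name `TranscribeChunks` and the chunk source are hypothetical, and freeing the stream when feeding fails is a design choice of this example rather than documented behavior.

```csharp
using System;
using System.Collections.Generic;
using STTClient;
using STTClient.Models;

static class StreamingExample
{
    // Hypothetical helper: feeds pre-chunked 16-bit mono audio and returns the final transcript.
    public static string TranscribeChunks(STT stt, IEnumerable<short[]> chunks)
    {
        STTStream stream = stt.CreateStream();
        bool finished = false;
        try
        {
            foreach (short[] chunk in chunks)
            {
                // aBufferSize counts samples, so the array length is passed directly.
                stt.FeedAudioContent(stream, chunk, (uint)chunk.Length);

                // Optional: a partial transcript of everything fed so far.
                Console.WriteLine($"Partial: {stt.IntermediateDecode(stream)}");
            }

            string result = stt.FinishStream(stream);
            finished = true;
            return result;
        }
        finally
        {
            // If the stream was abandoned before FinishStream, release it without decoding.
            if (!finished)
            {
                stt.FreeStream(stream);
            }
        }
    }
}
```

Calling IntermediateDecode after every chunk is shown only for illustration; decoding less often is likely cheaper in practice.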
-
inline unsafe string SpeechToText(short[] aBuffer, uint aBufferSize)
Use the STT model to perform Speech-To-Text.
- Return
The STT result. Returns null on error.
- Parameters
aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
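Because aBufferSize counts samples rather than bytes, raw PCM data read from a file or socket has to be converted to `short[]` first. The following sketch assumes the input is headerless, little-endian, 16-bit mono PCM already at the model's sample rate; the helper names are hypothetical.

```csharp
using System;
using STTClient.Interfaces;

static class BatchExample
{
    // Converts raw little-endian 16-bit PCM bytes into samples.
    // A trailing odd byte, if any, is ignored.
    public static short[] ToSamples(byte[] pcmBytes)
    {
        short[] samples = new short[pcmBytes.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            int low = pcmBytes[2 * i];
            int high = pcmBytes[2 * i + 1];
            // unchecked: values above 0x7FFF wrap to negative samples, as intended.
            samples[i] = unchecked((short)(low | (high << 8)));
        }
        return samples;
    }

    // Runs a one-shot transcription over the whole buffer.
    public static string Transcribe(ISTT stt, byte[] pcmBytes)
    {
        short[] samples = ToSamples(pcmBytes);
        // Pass the number of samples, not the number of bytes.
        return stt.SpeechToText(samples, (uint)samples.Length);
    }
}
```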
-
inline unsafe Metadata SpeechToTextWithMetadata(short[] aBuffer, uint aBufferSize, uint aNumResults)
Use the STT model to perform Speech-To-Text and return results including metadata.
- Return
The extended metadata. Returns null on error.
- Parameters
aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
aNumResults: Maximum number of candidate transcripts to return. The returned list might be shorter than this.
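The metadata variant returns up to aNumResults candidates, each with a confidence value and a token list. As a rough illustration, the sketch below prints one line per candidate using only the `Transcripts`, `Confidence`, and `Tokens` properties documented on this page; the helper name `PrintCandidates` is hypothetical.

```csharp
using System;
using STTClient.Interfaces;
using STTClient.Models;

static class MetadataExample
{
    // Hypothetical helper: prints confidence and token count for each candidate transcript.
    public static void PrintCandidates(ISTT stt, short[] samples, uint maxResults)
    {
        Metadata metadata = stt.SpeechToTextWithMetadata(samples, (uint)samples.Length, maxResults);
        if (metadata == null)
        {
            // Documented behavior: null is returned on error.
            Console.Error.WriteLine("Inference failed.");
            return;
        }

        // The list may hold fewer entries than maxResults.
        for (int i = 0; i < metadata.Transcripts.Length; i++)
        {
            CandidateTranscript candidate = metadata.Transcripts[i];
            Console.WriteLine(
                $"#{i}: confidence={candidate.Confidence:F3}, tokens={candidate.Tokens.Length}");
        }
    }
}
```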
STTStream Class
-
class STTClient.Models.STTStream: public IDisposable
Wrapper of the pointer used for the decoding stream.
Public Functions
-
inline unsafe STTStream(IntPtr **streamingStatePP)
Initializes a new instance of STTStream.
- Parameters
streamingStatePP: Native pointer of the native stream.
-
ErrorCodes
See also the main definition including descriptions for each error in Error codes.
-
enum STTClient.Enums.ErrorCodes
Error codes from the native Coqui STT binary.
Values:
-
STT_ERR_OK
-
STT_ERR_NO_MODEL
-
STT_ERR_INVALID_ALPHABET
-
STT_ERR_INVALID_SHAPE
-
STT_ERR_INVALID_SCORER
-
STT_ERR_MODEL_INCOMPATIBLE
-
STT_ERR_SCORER_NOT_ENABLED
-
STT_ERR_FAIL_INIT_MMAP
-
STT_ERR_FAIL_INIT_SESS
-
STT_ERR_FAIL_INTERPRETER
-
STT_ERR_FAIL_RUN_SESS
-
STT_ERR_FAIL_CREATE_STREAM
-
STT_ERR_FAIL_READ_PROTOBUF
-
STT_ERR_FAIL_CREATE_SESS
-
STT_ERR_FAIL_INSERT_HOTWORD
-
STT_ERR_FAIL_CLEAR_HOTWORD
-
STT_ERR_FAIL_ERASE_HOTWORD
-
Metadata
-
class STTClient.Models.Metadata
Stores the candidate transcripts computed by the model.
Properties
-
CandidateTranscript[] Transcripts
List of candidate transcripts.
-
CandidateTranscript
-
class STTClient.Models.CandidateTranscript
Stores a single candidate transcript, with its confidence value and token metadata.
Properties
-
double Confidence
Approximated confidence value for this transcription.
-
TokenMetadata[] Tokens
List of metadata tokens containing text, timestep, and time offset.
-
TokenMetadata
-
class STTClient.Models.TokenMetadata
Stores each individual character, along with its timing information.
STT Interface
-
interface STTClient.Interfaces.ISTT: public IDisposable
Client interface for Coqui STT.
Implemented by STTClient.STT
Public Functions
-
unsafe string Version()
Return version of this library. The returned version is a semantic version (SemVer 2.0.0).
-
unsafe int GetModelSampleRate()
Return the sample rate expected by the model.
- Return
Sample rate.
-
unsafe uint GetModelBeamWidth()
Get the beam width value used by the model. If SetModelBeamWidth has not been called, returns the default value loaded from the model file.
- Return
Beam width value used by the model.
-
unsafe void SetModelBeamWidth(uint aBeamWidth)
Set the beam width value used by the model.
- Parameters
aBeamWidth: The beam width used by the decoder. A larger beam width value generates better results at the cost of decoding time.
- Exceptions
ArgumentException: Thrown on failure.
-
unsafe void EnableExternalScorer(string aScorerPath)
Enable decoding using an external scorer.
- Parameters
aScorerPath: The path to the external scorer file.
- Exceptions
ArgumentException: Thrown when the native binary failed to enable decoding with an external scorer.
FileNotFoundException: Thrown when the scorer file cannot be found.
-
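unsafe void AddHotWord(string aWord, float aBoost)
Add a hot-word.
- Parameters
aWord: The hot-word to add.
aBoost: The boost to apply. A positive value increases and a negative value reduces the chance of the word occurring in a transcription.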
- Exceptions
ArgumentException: Thrown on failure.
-
unsafe void EraseHotWord(string aWord)
Erase the entry for a hot-word.
- Parameters
aWord: The hot-word to erase.
- Exceptions
ArgumentException: Thrown on failure.
-
unsafe void ClearHotWords()
Clear all hot-words.
- Exceptions
ArgumentException: Thrown on failure.
-
unsafe void DisableExternalScorer()
Disable decoding using an external scorer.
- Exceptions
ArgumentException: Thrown when an external scorer is not enabled.
-
unsafe void SetScorerAlphaBeta(float aAlpha, float aBeta)
Set hyperparameters alpha and beta of the external scorer.
- Parameters
aAlpha: The alpha hyperparameter of the decoder. Language model weight.
aBeta: The beta hyperparameter of the decoder. Word insertion weight.
- Exceptions
ArgumentException: Thrown when an external scorer is not enabled.
-
unsafe string SpeechToText(short[] aBuffer, uint aBufferSize)
Use the STT model to perform Speech-To-Text.
- Return
The STT result. Returns null on error.
- Parameters
aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
-
unsafe Metadata SpeechToTextWithMetadata(short[] aBuffer, uint aBufferSize, uint aNumResults)
Use the STT model to perform Speech-To-Text and return results including metadata.
- Return
The extended metadata. Returns null on error.
- Parameters
aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
aNumResults: Maximum number of candidate transcripts to return. The returned list might be shorter than this.
-
unsafe void FreeStream(STTStream stream)
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
-
unsafe void FeedAudioContent(STTStream stream, short[] aBuffer, uint aBufferSize)
Feeds audio samples to an ongoing streaming inference.
- Parameters
stream: Instance of the stream to feed the data.
aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in aBuffer.
-
unsafe string IntermediateDecode(STTStream stream)
Computes the intermediate decoding of an ongoing streaming inference.
- Return
The STT intermediate result.
- Parameters
stream: Instance of the stream to decode.
-
unsafe Metadata IntermediateDecodeWithMetadata(STTStream stream, uint aNumResults)
Computes the intermediate decoding of an ongoing streaming inference, including metadata.
- Return
The extended metadata result.
- Parameters
stream: Instance of the stream to decode.
aNumResults: Maximum number of candidate transcripts to return. The returned list might be shorter than this.
-
unsafe string FinishStream(STTStream stream)
Closes the ongoing streaming inference and returns the STT result over the whole audio signal.
- Return
The STT result.
- Parameters
stream: Instance of the stream to finish.
-
unsafe Metadata FinishStreamWithMetadata(STTStream stream, uint aNumResults)
Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including metadata.
- Return
The extended metadata result.
- Parameters
stream: Instance of the stream to finish.
aNumResults: Maximum number of candidate transcripts to return. The returned list might be shorter than this.
-
unsafe string