C API
See also the list of error codes including descriptions for each error in Error codes.
-
int STT_CreateModel(const char *aModelPath, ModelState **retval)
Create an object providing an interface to a trained Coqui STT model.
- Return
Zero on success, non-zero on failure.
- Parameters
aModelPath: The path to the frozen model graph.
[out] retval: A ModelState pointer.
-
void STT_FreeModel(ModelState *ctx)
Frees associated resources and destroys the model object.
-
int STT_EnableExternalScorer(ModelState *aCtx, const char *aScorerPath)
Enable decoding using an external scorer.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx: The ModelState pointer for the model being changed.
aScorerPath: The path to the external scorer file.
-
int STT_DisableExternalScorer(ModelState *aCtx)
Disable decoding using an external scorer.
- Return
Zero on success, non-zero on failure.
- Parameters
aCtx: The ModelState pointer for the model being changed.
-
int STT_AddHotWord(ModelState *aCtx, const char *word, float boost)
Add a hot-word and its boost.
Words that don’t occur in the scorer (e.g. proper nouns) or strings that contain spaces won’t be taken into account.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx: The ModelState pointer for the model being changed.
word: The hot-word.
boost: The boost. A positive value increases and a negative value reduces the chance of the word occurring in a transcription. An excessive positive boost might split up the letters of the word following the hot-word.
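Because strings that contain spaces are ignored, a caller may want to screen candidate hot-words before passing them to STT_AddHotWord(). The following is a minimal sketch of such a check. The helper name is_usable_hot_word is hypothetical and not part of the Coqui STT API, rejecting the empty string is an extra assumption, and the check cannot tell whether a word actually occurs in the scorer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Coqui STT API).
 * Returns 1 if `word` is non-empty and contains no space character,
 * 0 otherwise. Rejecting the empty string is an assumption of this sketch.
 * It does not check whether the word occurs in the scorer. */
int is_usable_hot_word(const char *word)
{
    if (word == NULL || word[0] == '\0') {
        return 0;
    }
    return strchr(word, ' ') == NULL;
}
```

A word that passes this check can still be ignored by the decoder if the scorer does not contain it.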
-
int STT_EraseHotWord(ModelState *aCtx, const char *word)
Remove entry for a hot-word from the hot-words map.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx: The ModelState pointer for the model being changed.
word: The hot-word.
-
int STT_ClearHotWords(ModelState *aCtx)
Removes all elements from the hot-words map.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx: The ModelState pointer for the model being changed.
-
int STT_SetScorerAlphaBeta(ModelState *aCtx, float aAlpha, float aBeta)
Set hyperparameters alpha and beta of the external scorer.
- Return
Zero on success, non-zero on failure.
- Parameters
aCtx: The ModelState pointer for the model being changed.
aAlpha: The alpha hyperparameter of the decoder. Language model weight.
aBeta: The beta hyperparameter of the decoder. Word insertion weight.
-
int STT_GetModelSampleRate(const ModelState *aCtx)
Return the sample rate expected by a model.
- Return
Sample rate expected by the model for its input.
- Parameters
aCtx: A ModelState pointer created with STT_CreateModel.
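The sample rate ties sample counts to time, which is useful when sizing buffers or reporting how much audio was processed. The sketch below converts a sample count to a duration in milliseconds. The helper samples_to_ms is hypothetical and not part of the Coqui STT API; it assumes mono audio and takes the rate as a plain integer such as the one returned by STT_GetModelSampleRate().

```c
/* Hypothetical helper (not part of the Coqui STT API).
 * Duration in milliseconds of `num_samples` mono samples at `sample_rate` Hz.
 * Returns 0 for a non-positive sample rate. Fractional milliseconds are
 * truncated by the integer division. */
unsigned long long samples_to_ms(unsigned int num_samples, int sample_rate)
{
    if (sample_rate <= 0) {
        return 0ULL;
    }
    return ((unsigned long long)num_samples * 1000ULL)
           / (unsigned long long)sample_rate;
}
```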
-
char *STT_SpeechToText(ModelState *aCtx, const short *aBuffer, unsigned int aBufferSize)
Use the Coqui STT model to convert speech to text.
- Return
The STT result. The user is responsible for freeing the string using STT_FreeString(). Returns NULL on error.
- Parameters
aCtx: The ModelState pointer for the model to use.
aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
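Note that aBufferSize counts samples, not bytes. When audio has been read as raw bytes (for example from a file), the byte count has to be converted first. The sketch below shows one way to do that. The helper pcm_bytes_to_samples is hypothetical and not part of the Coqui STT API, and it assumes the samples are stored as short, matching the type of aBuffer.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the Coqui STT API).
 * Number of whole samples contained in `num_bytes` of mono PCM data,
 * assuming each sample is stored as a `short`. A trailing partial
 * sample, if any, is dropped by the integer division. */
unsigned int pcm_bytes_to_samples(size_t num_bytes)
{
    return (unsigned int)(num_bytes / sizeof(short));
}
```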
-
Metadata *STT_SpeechToTextWithMetadata(ModelState *aCtx, const short *aBuffer, unsigned int aBufferSize, unsigned int aNumResults)
Use the Coqui STT model to convert speech to text and output results including metadata.
- Return
Metadata struct containing multiple CandidateTranscript structs. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling STT_FreeMetadata(). Returns NULL on error.
- Parameters
aCtx: The ModelState pointer for the model to use.
aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
aNumResults: The maximum number of CandidateTranscript structs to return. The number actually returned might be smaller than this.
-
int STT_CreateStream(ModelState *aCtx, StreamingState **retval)
Create a new streaming inference state. The streaming state returned by this function can then be passed to STT_FeedAudioContent() and STT_FinishStream().
- Return
Zero on success, non-zero on failure.
- Parameters
aCtx: The ModelState pointer for the model to use.
[out] retval: Receives the pointer to the new streaming state.
-
void STT_FeedAudioContent(StreamingState *aSctx, const short *aBuffer, unsigned int aBufferSize)
Feed audio samples to an ongoing streaming inference.
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in aBuffer.
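Streaming callers typically feed audio in fixed-size chunks rather than all at once. The sketch below shows the chunking arithmetic only, so that it stays self-contained: feed_fn is a hypothetical callback type standing in for a thin wrapper around STT_FeedAudioContent(), and feed_in_chunks and count_samples are illustrative names, not part of the Coqui STT API.

```c
#include <stddef.h>

/* Hypothetical callback type: stands in for a wrapper that forwards to
 * STT_FeedAudioContent(aSctx, aBuffer, aBufferSize). */
typedef void (*feed_fn)(void *stream, const short *buffer,
                        unsigned int num_samples);

/* Splits `total_samples` samples into chunks of at most `chunk_samples`
 * and passes each chunk to `feed`. Returns the number of calls made. */
unsigned int feed_in_chunks(void *stream, const short *samples,
                            size_t total_samples, unsigned int chunk_samples,
                            feed_fn feed)
{
    unsigned int calls = 0;
    size_t offset = 0;

    if (samples == NULL || feed == NULL || chunk_samples == 0) {
        return 0;
    }
    while (offset < total_samples) {
        size_t remaining = total_samples - offset;
        /* The last chunk may be shorter than chunk_samples. */
        unsigned int n = remaining < chunk_samples
                             ? (unsigned int)remaining
                             : chunk_samples;
        feed(stream, samples + offset, n);
        offset += n;
        calls++;
    }
    return calls;
}

/* Demo callback for illustration: treats `stream` as a size_t counter and
 * adds up the samples it receives instead of running inference. */
void count_samples(void *stream, const short *buffer, unsigned int num_samples)
{
    (void)buffer;
    *(size_t *)stream += num_samples;
}
```

In real use, the callback would cast stream back to a StreamingState pointer and call STT_FeedAudioContent() with the same buffer and sample count.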
-
char *STT_IntermediateDecode(const StreamingState *aSctx)
Compute the intermediate decoding of an ongoing streaming inference.
- Return
The STT intermediate result. The user is responsible for freeing the string using STT_FreeString().
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
-
Metadata *STT_IntermediateDecodeWithMetadata(const StreamingState *aSctx, unsigned int aNumResults)
Compute the intermediate decoding of an ongoing streaming inference, return results including metadata.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling STT_FreeMetadata(). Returns NULL on error.
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
aNumResults: The number of candidate transcripts to return.
-
char *STT_IntermediateDecodeFlushBuffers(StreamingState *aSctx)
EXPERIMENTAL: Compute the intermediate decoding of an ongoing streaming inference, flushing buffers first. This ensures that all audio that has been streamed so far is included in the result, but is more expensive than STT_IntermediateDecode() because buffers are processed through the acoustic model. Calling this function too often will also degrade transcription accuracy due to trashing of the LSTM hidden state vectors.
- Return
The STT result. The user is responsible for freeing the string using STT_FreeString().
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
-
Metadata *STT_IntermediateDecodeWithMetadataFlushBuffers(StreamingState *aSctx, unsigned int aNumResults)
EXPERIMENTAL: Compute the intermediate decoding of an ongoing streaming inference, flushing buffers first. This ensures that all audio that has been streamed so far is included in the result, but is more expensive than STT_IntermediateDecodeWithMetadata() because buffers are processed through the acoustic model. Calling this function too often will also degrade transcription accuracy due to trashing of the LSTM hidden state vectors. Returns results including metadata.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling STT_FreeMetadata(). Returns NULL on error.
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
aNumResults: The number of candidate transcripts to return.
-
char *STT_FinishStream(StreamingState *aSctx)
Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.
- Return
The STT result. The user is responsible for freeing the string using STT_FreeString().
- Note
This method will free the state pointer (aSctx).
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
-
Metadata *STT_FinishStreamWithMetadata(StreamingState *aSctx, unsigned int aNumResults)
Compute the final decoding of an ongoing streaming inference and return results including metadata. Signals the end of an ongoing streaming inference.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling STT_FreeMetadata(). Returns NULL on error.
- Note
This method will free the state pointer (aSctx).
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
aNumResults: The number of candidate transcripts to return.
-
void STT_FreeStream(StreamingState *aSctx)
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
- Note
This method will free the state pointer (aSctx).
- Parameters
aSctx: A streaming state pointer returned by STT_CreateStream().
-
void STT_FreeString(char *str)
Free a char* string returned by the Coqui STT API.
-
char *STT_Version()
Returns the version of this library. The returned version is a semantic version (SemVer 2.0.0). The string returned must be freed with STT_FreeString().
- Return
The version string.
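Since the version string follows SemVer 2.0.0, its leading MAJOR.MINOR.PATCH numbers can be extracted for compatibility checks. The sketch below parses a version string passed in by the caller. The helper parse_semver is hypothetical and not part of the Coqui STT API, and it ignores any pre-release or build suffix after the patch number.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the Coqui STT API).
 * Parses the leading "MAJOR.MINOR.PATCH" of a SemVer string.
 * Returns 0 and fills the outputs on success, -1 otherwise.
 * Pre-release and build metadata suffixes are ignored. */
int parse_semver(const char *version, int *major, int *minor, int *patch)
{
    int ma, mi, pa;

    if (version == NULL || major == NULL || minor == NULL || patch == NULL) {
        return -1;
    }
    if (sscanf(version, "%d.%d.%d", &ma, &mi, &pa) != 3) {
        return -1;
    }
    /* Write the outputs only after the whole triple has parsed. */
    *major = ma;
    *minor = mi;
    *patch = pa;
    return 0;
}
```

A caller would pass the string returned by STT_Version() and then release that string with STT_FreeString() once parsing is done.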