Java

STTModel

class ai::coqui::libstt::STTModel

Exposes an STT model in Java.

Public Functions

inline STTModel(String modelPath)

An object providing an interface to a trained STT model.

Parameters

modelPath: The path to the frozen model graph.

Exceptions

RuntimeException: on failure.

inline long beamWidth()

Get the beam width value used by the model. If setBeamWidth has not been called, returns the default value loaded from the model file.

Return: Beam width value used by the model.

inline void setBeamWidth(long beamWidth)

Set beam width value used by the model.

Parameters

beamWidth: The beam width used by the model. A larger beam width value generates better results at the cost of decoding time.

Exceptions

RuntimeException: on failure.

inline int sampleRate()

Return the sample rate expected by the model.

Return: Sample rate.

inline void freeModel(): Frees associated resources and destroys model object.
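Since freeModel() releases the model's resources, it can help to tie that call to a scope so it still runs when inference throws. The following is a minimal sketch of that idea using try-with-resources. `ScopedRelease` is a hypothetical helper, not part of the library, and the `model` variable in the usage comment is assumed to be an STTModel created elsewhere.

```java
/** Hypothetical helper (not part of libstt): runs a cleanup action at most once on close(). */
final class ScopedRelease implements AutoCloseable {
    private final Runnable release;
    private boolean released = false;

    ScopedRelease(Runnable release) {
        this.release = release;
    }

    @Override
    public void close() {
        // Guard against a double release, e.g. an explicit close() followed by the implicit one.
        if (!released) {
            released = true;
            release.run();
        }
    }
}

// Intended use, assuming `model` is an STTModel created earlier:
//   try (ScopedRelease scope = new ScopedRelease(model::freeModel)) {
//       // run inference with model here
//   }
```

The guard flag is a design choice for safety: it keeps a second close() from invoking the cleanup action again.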

inline void enableExternalScorer(String scorer)

Enable decoding using an external scorer.

Parameters

scorer: The path to the external scorer file.

Exceptions

RuntimeException: on failure.

inline void disableExternalScorer()

Disable decoding using an external scorer.

Exceptions

RuntimeException: on failure.

inline void setScorerAlphaBeta(float alpha, float beta)

Set the alpha and beta hyperparameters of the external scorer.

Parameters

alpha: The alpha hyperparameter of the decoder. Language model weight.
beta: The beta hyperparameter of the decoder. Word insertion weight.

Exceptions

RuntimeException: on failure.

inline Metadata sttWithMetadata(short[] buffer, int buffer_size, int num_results)

Use the STT model to perform Speech-To-Text and output metadata about the results.

Return

Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.

Parameters

buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in the audio signal.
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
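The `buffer` parameter is an array of 16-bit samples, while audio read from disk usually arrives as bytes. Below is a small sketch of one way to do that conversion. It assumes the input is headerless, mono, little-endian PCM (the layout used inside WAV files). `PcmSamples` is a hypothetical helper and not part of the library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of libstt) for preparing the `buffer` argument. */
final class PcmSamples {
    private PcmSamples() {}

    /**
     * Converts raw 16-bit little-endian mono PCM bytes into samples.
     * A trailing odd byte, if any, is ignored.
     */
    static short[] fromLittleEndianBytes(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}

// Intended use, assuming `model` is an STTModel and `pcm` holds audio at model.sampleRate():
//   short[] samples = PcmSamples.fromLittleEndianBytes(pcm);
//   Metadata result = model.sttWithMetadata(samples, samples.length, 3);
```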

inline STTStreamingState createStream()

Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().

Return

An opaque object that represents the streaming state.

Exceptions

RuntimeException: on failure.

inline void feedAudioContent(STTStreamingState ctx, short[] buffer, int buffer_size)

Feed audio samples to an ongoing streaming inference.

Parameters

ctx: A streaming state pointer returned by createStream().
buffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in buffer.
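Streaming inference is typically driven by feeding audio in small pieces. The sketch below shows one way to split a sample array into fixed-size chunks and pass each to a callback. `ChunkedFeeder` is a hypothetical helper, not part of the library, and the chunk size in the usage comment is an arbitrary illustrative value.

```java
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical helper (not part of libstt) for feeding audio chunk by chunk. */
final class ChunkedFeeder {
    private ChunkedFeeder() {}

    /**
     * Splits audio into chunks of at most chunkSize samples and hands each
     * chunk to feed, in order. Returns the number of chunks delivered.
     */
    static int feedInChunks(short[] audio, int chunkSize, Consumer<short[]> feed) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int start = 0; start < audio.length; start += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int length = Math.min(chunkSize, audio.length - start);
            feed.accept(Arrays.copyOfRange(audio, start, start + length));
            chunks++;
        }
        return chunks;
    }
}

// Intended use, assuming `model` is an STTModel and `ctx` came from model.createStream():
//   ChunkedFeeder.feedInChunks(audio, 320,
//           chunk -> model.feedAudioContent(ctx, chunk, chunk.length));
```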

inline String intermediateDecode(STTStreamingState ctx)

Compute the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters

ctx: A streaming state pointer returned by createStream().

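inline Metadata intermediateDecodeWithMetadata(STTStreamingState ctx, int num_results)

Compute the intermediate decoding of an ongoing streaming inference and return the results including metadata.

Return

Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.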

Parameters

ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

inline String finishStream(STTStreamingState ctx)

Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.

Return

The STT result.

Note

This method will free the state pointer (ctx).

Parameters

ctx: A streaming state pointer returned by createStream().

inline Metadata finishStreamWithMetadata(STTStreamingState ctx, int num_results)

Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.

Return

Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.

Note

This method will free the state pointer (ctx).

Parameters

ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

inline void addHotWord(String word, float boost)

Add a hot-word.

Words that don’t occur in the scorer (e.g. proper nouns) or strings that contain spaces won’t be taken into account.

Parameters

word: The hot-word to add.
boost: A positive value increases, and a negative value reduces, the chance of the word occurring in a transcription. An excessive positive boost might cause the letters of the word following the hot-word to be split up.

Exceptions

RuntimeException: on failure.
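Because strings containing spaces are not taken into account as hot-words, a caller may want to screen candidates before calling addHotWord(). The sketch below checks only for whitespace and emptiness. It cannot tell whether a word occurs in the scorer. `HotWords` is a hypothetical helper, not part of the library, and the boost value in the usage comment is purely illustrative.

```java
/** Hypothetical helper (not part of libstt) for screening hot-word candidates. */
final class HotWords {
    private HotWords() {}

    /** Returns true when the candidate is non-empty and contains no whitespace. */
    static boolean isSingleWord(String candidate) {
        if (candidate == null || candidate.isEmpty()) {
            return false;
        }
        for (int i = 0; i < candidate.length(); i++) {
            if (Character.isWhitespace(candidate.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}

// Intended use, assuming `model` is an STTModel with an external scorer enabled:
//   if (HotWords.isSingleWord(word)) {
//       model.addHotWord(word, 7.5f);
//   }
```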

inline void eraseHotWord(String word)

Erase a hot-word.

Parameters

word: The hot-word to erase.

Exceptions

RuntimeException: on failure.

inline void clearHotWords()

Clear all hot-words.

Exceptions

RuntimeException: on failure.

Metadata

class ai::coqui::libstt::Metadata

An array of CandidateTranscript objects computed by the model.

Public Functions

inline long getNumTranscripts(): Size of the transcripts array

inline CandidateTranscript getTranscript(int i)

Retrieve one CandidateTranscript element

Return

The CandidateTranscript requested or null

Parameters

i: Array index of the CandidateTranscript to get
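Candidate transcripts are read by index, so selecting one comes down to a loop from 0 to getNumTranscripts(). The sketch below picks the index with the largest confidence, on the assumption that a larger getConfidence() value indicates a more likely transcript. `BestTranscript` is a hypothetical helper and not part of the library. It takes a callback so that it does not depend on the library classes.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical helper (not part of libstt) for choosing among candidates. */
final class BestTranscript {
    private BestTranscript() {}

    /**
     * Returns the index in [0, count) with the largest confidence,
     * or -1 when count is zero. Ties keep the earliest index.
     */
    static int indexOfBest(int count, IntToDoubleFunction confidenceAt) {
        int best = -1;
        double bestConfidence = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < count; i++) {
            double confidence = confidenceAt.applyAsDouble(i);
            if (best == -1 || confidence > bestConfidence) {
                best = i;
                bestConfidence = confidence;
            }
        }
        return best;
    }
}

// Intended use, assuming `metadata` is a Metadata object returned by the model.
// getNumTranscripts() returns long while getTranscript() takes int, hence the cast:
//   int best = BestTranscript.indexOfBest((int) metadata.getNumTranscripts(),
//           i -> metadata.getTranscript(i).getConfidence());
```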

CandidateTranscript

class ai::coqui::libstt::CandidateTranscript

A single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.

Public Functions

inline long getNumTokens(): Size of the tokens array

inline double getConfidence()

Approximated confidence value for this transcript. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcript.

inline TokenMetadata getToken(int i)

Retrieve one TokenMetadata element

Return

The TokenMetadata requested or null

Parameters

i: Array index of the TokenMetadata to get
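A transcript's text can be rebuilt by concatenating the text of its tokens in index order. The following sketch does this through a callback so that it stays independent of the library classes. `TranscriptText` is a hypothetical helper and not part of the library.

```java
import java.util.function.IntFunction;

/** Hypothetical helper (not part of libstt) for rebuilding transcript text. */
final class TranscriptText {
    private TranscriptText() {}

    /** Concatenates the token texts at indices 0..numTokens-1, skipping null entries. */
    static String join(int numTokens, IntFunction<String> textAt) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < numTokens; i++) {
            String tokenText = textAt.apply(i);
            if (tokenText != null) {
                text.append(tokenText);
            }
        }
        return text.toString();
    }
}

// Intended use, assuming `transcript` is a CandidateTranscript:
//   String text = TranscriptText.join((int) transcript.getNumTokens(),
//           i -> transcript.getToken(i).getText());
```

Note that getToken() is documented as possibly returning null, so production code may want to check the token itself before calling getText().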

TokenMetadata

class ai::coqui::libstt::TokenMetadata

Stores the text of an individual token, along with its timing information.

Public Functions

inline String getText(): The text corresponding to this token

inline long getTimestep(): Position of the token in units of 20ms

inline float getStartTime(): Position of the token in seconds
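Token timing is often more useful at the word level. The sketch below groups tokens into words and records the start time of each word's first token. It assumes each token is a single character and that words are separated by tokens whose text is a single space, which may not hold for every model. It also shows the timestep-to-seconds arithmetic implied by the 20 ms unit stated above. `TokenTiming` and its `Word` type are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of libstt) for working with token timing. */
final class TokenTiming {
    private TokenTiming() {}

    /** A word and the start time, in seconds, of its first token. */
    static final class Word {
        final String text;
        final float startTime;

        Word(String text, float startTime) {
            this.text = text;
            this.startTime = startTime;
        }
    }

    /** Converts a timestep counted in 20 ms units into seconds. */
    static double timestepToSeconds(long timestep) {
        return timestep * 20 / 1000.0;
    }

    /** Groups tokens into words, assuming a " " token separates words. */
    static List<Word> groupWords(String[] tokenTexts, float[] startTimes) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float currentStart = 0f;
        for (int i = 0; i < tokenTexts.length; i++) {
            if (tokenTexts[i].equals(" ")) {
                // A separator closes the word in progress, if there is one.
                if (current.length() > 0) {
                    words.add(new Word(current.toString(), currentStart));
                    current.setLength(0);
                }
            } else {
                if (current.length() == 0) {
                    currentStart = startTimes[i];
                }
                current.append(tokenTexts[i]);
            }
        }
        if (current.length() > 0) {
            words.add(new Word(current.toString(), currentStart));
        }
        return words;
    }
}
```

The two arrays would be filled from getText() and getStartTime() for each token of a CandidateTranscript.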