Java¶
STTModel¶
-
class
ai::coqui::libstt
::
STTModel
¶ Exposes an STT model in Java.
Public Functions
-
inline
STTModel
(String modelPath)¶ An object providing an interface to a trained STT model.
- Parameters
modelPath
: The path to the frozen model graph.
- Exceptions
RuntimeException
: on failure.
-
inline long
beamWidth
()¶ Get the beam width value used by the model. If setBeamWidth has not been called, returns the default value loaded from the model file.
- Return
Beam width value used by the model.
-
inline void
setBeamWidth
(long beamWidth)¶ Set beam width value used by the model.
- Parameters
beamWidth
: The beam width used by the model. A larger beam width value generates better results at the cost of decoding time.
- Exceptions
RuntimeException
: on failure.
-
inline int
sampleRate
()¶ Return the sample rate expected by the model.
- Return
Sample rate.
-
inline void
freeModel
()¶ Frees associated resources and destroys model object.
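Because freeModel() releases resources explicitly, callers usually want it to run even when inference throws. The following is a minimal sketch of that pattern, not part of the library: `Freeable` and `useAndFree` are hypothetical names introduced here, with `Freeable` standing in for the documented freeModel() method.

```java
import java.util.function.Function;

final class ModelLifetime {
    private ModelLifetime() {}

    /** Hypothetical minimal view of an object exposing the documented freeModel() method. */
    interface Freeable {
        void freeModel();
    }

    /** Runs work against the model and frees the model afterwards, even if work throws. */
    static <M extends Freeable, R> R useAndFree(M model, Function<M, R> work) {
        try {
            return work.apply(model);
        } finally {
            // Always release the model; it must not be used after this point.
            model.freeModel();
        }
    }
}
```

With the real class, a thin adapter whose freeModel() delegates to STTModel.freeModel() would be enough to reuse this helper.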
-
inline void
enableExternalScorer
(String scorer)¶ Enable decoding using an external scorer.
- Parameters
scorer
: The path to the external scorer file.
- Exceptions
RuntimeException
: on failure.
-
inline void
disableExternalScorer
()¶ Disable decoding using an external scorer.
- Exceptions
RuntimeException
: on failure.
-
inline void
setScorerAlphaBeta
(float alpha, float beta)¶ Set the alpha and beta hyperparameters of the external scorer.
- Parameters
alpha
: The alpha hyperparameter of the decoder. Language model weight.
beta
: The beta hyperparameter of the decoder. Word insertion weight.
- Exceptions
RuntimeException
: on failure.
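For orientation, CTC beam-search decoders that use a language model conventionally combine the two hyperparameters with the acoustic score as shown below. This is the commonly cited general form, given here as an assumption; this reference does not state the exact expression the library evaluates, and the symbols Q, x, y and wordcount are introduced only for illustration.

```latex
Q(y) = \log P_{\mathrm{acoustic}}(y \mid x)
     + \alpha \, \log P_{\mathrm{LM}}(y)
     + \beta \, \mathrm{wordcount}(y)
```

Under this form, a larger alpha gives the language model more influence on the ranking of candidates, and a larger beta favors transcripts that contain more words.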
-
inline Metadata sttWithMetadata (short[] buffer, int buffer_size, int num_results)
Use the STT model to perform Speech-To-Text and output metadata about the results.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
- Parameters
buffer
: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
buffer_size
: The number of samples in the audio signal.
num_results
: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
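The buffer parameter is a short[] of 16-bit samples, while audio read from a file or socket usually arrives as bytes. A minimal sketch of the conversion follows. `PcmSamples` is a hypothetical helper, and little-endian byte order is an assumption (it is what WAV files use); the bytes are also assumed to be raw PCM with any file header already stripped.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmSamples {
    private PcmSamples() {}

    /** Decodes little-endian 16-bit PCM bytes into samples; a trailing odd byte is ignored. */
    static short[] fromLittleEndianBytes(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm, 0, samples.length * 2)
                  .order(ByteOrder.LITTLE_ENDIAN) // assumption: WAV-style byte order
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }
}
```

The resulting array can be passed as buffer, with its length as buffer_size.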
-
inline STTStreamingState
createStream
()¶ Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
- Return
An opaque object that represents the streaming state.
- Exceptions
RuntimeException
: on failure.
-
inline void feedAudioContent (STTStreamingState ctx, short[] buffer, int buffer_size)
Feed audio samples to an ongoing streaming inference.
- Parameters
ctx
: A streaming state pointer returned by createStream().
buffer
: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
buffer_size
: The number of samples in buffer.
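Streaming callers typically feed audio in fixed-size pieces rather than in one call. The sketch below splits a sample array into chunks and hands each one to a callback. `ChunkedFeeder` and `ChunkSink` are hypothetical names; the callback stands in for a call such as feedAudioContent(ctx, chunk, chunk.length).

```java
import java.util.Arrays;

final class ChunkedFeeder {
    private ChunkedFeeder() {}

    /** Hypothetical callback standing in for feedAudioContent(ctx, buffer, buffer_size). */
    interface ChunkSink {
        void accept(short[] chunk, int chunkSize);
    }

    /** Feeds samples to sink in chunks of at most chunkSize samples; returns the number of calls made. */
    static int feedInChunks(short[] samples, int chunkSize, ChunkSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int calls = 0;
        int start = 0;
        while (start < samples.length) {
            // The last chunk may be shorter than chunkSize.
            int len = Math.min(chunkSize, samples.length - start);
            sink.accept(Arrays.copyOfRange(samples, start, start + len), len);
            start += len;
            calls++;
        }
        return calls;
    }
}
```

The chunk size is a latency and overhead trade-off chosen by the caller; this reference does not prescribe a value.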
-
inline String
intermediateDecode
(STTStreamingState ctx)¶ Compute the intermediate decoding of an ongoing streaming inference.
- Return
The STT intermediate result.
- Parameters
ctx
: A streaming state pointer returned by createStream().
-
inline Metadata
intermediateDecodeWithMetadata
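(STTStreamingState ctx, int num_results)¶ Compute the intermediate decoding of an ongoing streaming inference and return the results including metadata.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.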
- Parameters
ctx
: A streaming state pointer returned by createStream().
num_results
: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
-
inline String
finishStream
(STTStreamingState ctx)¶ Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.
- Return
The STT result.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx
: A streaming state pointer returned by createStream().
-
inline Metadata
finishStreamWithMetadata
(STTStreamingState ctx, int num_results)¶ Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx
: A streaming state pointer returned by createStream().
num_results
: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
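The streaming methods above are meant to be used in a fixed order: create a stream, feed audio, then finish the stream exactly once. The sketch below shows that order against a hypothetical `StreamingModel` interface that mirrors the documented signatures; the type parameter `S` stands in for STTStreamingState so the example compiles without the library.

```java
final class StreamingSketch {
    private StreamingSketch() {}

    /** Hypothetical interface mirroring the documented streaming methods. */
    interface StreamingModel<S> {
        S createStream();
        void feedAudioContent(S ctx, short[] buffer, int bufferSize);
        String finishStream(S ctx);
    }

    /** Creates a stream, feeds every chunk in order, and returns the final transcript. */
    static <S> String transcribe(StreamingModel<S> model, Iterable<short[]> chunks) {
        S ctx = model.createStream();
        for (short[] chunk : chunks) {
            model.feedAudioContent(ctx, chunk, chunk.length);
        }
        // finishStream frees ctx, so ctx must not be used after this call.
        return model.finishStream(ctx);
    }
}
```

A call to intermediateDecode(ctx) could be placed inside the loop to obtain partial results while audio is still arriving.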
-
inline void
addHotWord
(String word, float boost)¶ Add a hot-word.
Words that don’t occur in the scorer (e.g. proper nouns) or strings that contain spaces won’t be taken into account.
- Parameters
word
: The hot-word to add.
boost
: A positive value increases and a negative value reduces the chance of the word occurring in a transcription. An excessive positive boost might split up the letters of the word following the hot-word.
- Exceptions
RuntimeException
: on failure.
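Since strings that contain spaces are not taken into account, it can help to screen candidate hot-words before registering them. The following sketch does that with a hypothetical `HotWordFilter` helper. It cannot check whether a word occurs in the scorer, which is the other condition mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HotWordFilter {
    private HotWordFilter() {}

    /** Returns the entries whose word is non-empty and contains no space, in their original order. */
    static Map<String, Float> usable(Map<String, Float> candidates) {
        Map<String, Float> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Float> entry : candidates.entrySet()) {
            String word = entry.getKey();
            if (word != null && !word.isEmpty() && !word.contains(" ")) {
                kept.put(word, entry.getValue());
            }
        }
        return kept;
    }
}
```

Each remaining entry could then be passed to addHotWord(word, boost).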
-
inline void
eraseHotWord
(String word)¶ Erase a hot-word.
- Parameters
word
: The hot-word to erase.
- Exceptions
RuntimeException
: on failure.
-
inline void
clearHotWords
()¶ Clear all hot-words.
- Exceptions
RuntimeException
: on failure.
Metadata¶
-
class
ai::coqui::libstt
::
Metadata
¶ An array of CandidateTranscript objects computed by the model.
Public Functions
-
inline long
getNumTranscripts
()¶ Size of the transcripts array
-
inline CandidateTranscript
getTranscript
(int i)¶ Retrieve one CandidateTranscript element
- Return
The CandidateTranscript requested or null
- Parameters
i
: Array index of the CandidateTranscript to get
CandidateTranscript¶
-
class
ai::coqui::libstt
::
CandidateTranscript
¶ A single transcript computed by the model, including a confidence
value and the metadata for its constituent tokens.
Public Functions
-
inline long
getNumTokens
()¶ Size of the tokens array
-
inline double
getConfidence
()¶ Approximated confidence value for this transcript. This is roughly the
sum of the acoustic model logit values for each timestep/character that
contributed to the creation of this transcript.
-
inline TokenMetadata
getToken
(int i)¶ Retrieve one TokenMetadata element
- Return
The TokenMetadata requested or null
- Parameters
i
: Array index of the TokenMetadata to get
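When several candidate transcripts are requested, a caller often wants the one with the highest confidence. The sketch below walks the candidates using only the getters documented above. `Candidate` and `Results` are hypothetical interfaces mirroring CandidateTranscript and Metadata so the example compiles on its own; whether the library already returns candidates in sorted order is not stated in this reference.

```java
final class BestTranscript {
    private BestTranscript() {}

    /** Hypothetical view of CandidateTranscript. */
    interface Candidate {
        double getConfidence();
    }

    /** Hypothetical view of Metadata. */
    interface Results<C extends Candidate> {
        long getNumTranscripts();
        C getTranscript(int i);
    }

    /** Returns the candidate with the highest confidence, or null if there is none. */
    static <C extends Candidate> C pick(Results<C> results) {
        C best = null;
        for (int i = 0; i < results.getNumTranscripts(); i++) {
            C candidate = results.getTranscript(i);
            if (candidate == null) {
                continue; // getTranscript is documented as possibly returning null
            }
            if (best == null || candidate.getConfidence() > best.getConfidence()) {
                best = candidate;
            }
        }
        return best;
    }
}
```

The same loop shape applies to getNumTokens() and getToken(i) when iterating over the tokens of a single transcript.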
TokenMetadata¶
-
class
ai::coqui::libstt
::
TokenMetadata
¶ Stores text of an individual token, along with its timing information