speechbrain.k2_integration.utils module

Utilities for k2 integration with SpeechBrain.

This code was adapted from icefall (https://github.com/k2-fsa/icefall).

Authors:
  • Pierre Champion 2023

  • Zeyu Zhao 2023

  • Georgios Karakasidis 2023

Summary

Functions:

  • lattice_path_to_textid: Extract the texts (as word IDs) from the best-path FSAs.

  • lattice_paths_to_text: Convert the best path to a list of strings.

  • load_G: Load a LM to be used in decoding graph creation (or LM rescoring).

  • prepare_rescoring_G: Prepare a LM for use in LM rescoring.

Reference

speechbrain.k2_integration.utils.lattice_path_to_textid(best_paths: Fsa, return_ragged: bool = False) → List[List[int]] | RaggedTensor

Extract the texts (as word IDs) from the best-path FSAs.

Parameters:
  • best_paths (k2.Fsa) – A k2.Fsa with best_paths.arcs.num_axes() == 3, i.e. containing multiple FSAs, which is expected to be the result of k2.shortest_path (otherwise the returned values won’t be meaningful).

  • return_ragged (bool) – True to return a ragged tensor with two axes [utt][word_id]. False to return a list-of-list word IDs.

Returns:

A list of lists of int containing the label sequences we decoded, or, if return_ragged is True, a k2.RaggedTensor with two axes [utt][word_id].
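
Example (a minimal sketch: the toy lattice below is built by hand with k2.linear_fsa and stands in for a real decoding lattice; its labels and word IDs are illustrative):

    import torch
    import k2

    from speechbrain.k2_integration.utils import lattice_path_to_textid

    # Toy stand-in for a decoding lattice: one linear FSA over two token
    # labels, wrapped in an FsaVec (a single "utterance").
    lattice = k2.linear_fsa([[1, 2]])

    # Attach word IDs as aux_labels, one per arc; the implicit final arc
    # carries -1 by k2 convention. The IDs 10 and 11 are illustrative.
    lattice.aux_labels = torch.tensor([10, 11, -1], dtype=torch.int32)

    # lattice_path_to_textid expects the result of k2.shortest_path.
    best_paths = k2.shortest_path(lattice, use_double_scores=True)
    word_ids = lattice_path_to_textid(best_paths)
    print(word_ids)  # expected: [[10, 11]]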

speechbrain.k2_integration.utils.lattice_paths_to_text(best_paths: Fsa, word_table) → List[str]

Convert the best path to a list of strings.

Parameters:
  • best_paths (k2.Fsa) – It is the path in the lattice with the highest score for a given utterance.

  • word_table (List[str] or Dict[int,str]) – It is a list or dict that maps word IDs to words.

Returns:

texts – A list of strings, each of which is the decoding result of the corresponding utterance.

Return type:

List[str]
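
Example (continuing the sketch above; the word table is a hypothetical dict, whereas in a real recipe it comes from the lexicon used to build the decoding graph):

    from speechbrain.k2_integration.utils import lattice_paths_to_text

    # Hypothetical word-ID-to-word mapping for the toy lattice above.
    word_table = {10: "hello", 11: "world"}

    texts = lattice_paths_to_text(best_paths, word_table)
    print(texts)  # expected: ["hello world"]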

speechbrain.k2_integration.utils.load_G(path: str | Path, cache: bool = True) → Fsa

Load a LM to be used in decoding graph creation (or LM rescoring).

Parameters:
  • path (str) – The path to an FST LM (ending with .fst.txt) or a k2-converted LM (in pytorch .pt format).

  • cache (bool) – Whether or not to load/cache the LM from/to the .pt format (in the same dir).

Returns:

G – An FSA representing the LM.

Return type:

k2.Fsa
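
Example (a minimal sketch; the path is illustrative and assumes a 3-gram LM already compiled to FST text form, as in the LibriSpeech recipe):

    from speechbrain.k2_integration.utils import load_G

    # Illustrative path to a 3-gram LM in FST text form. With cache=True,
    # the parsed LM is also stored in .pt format in the same directory
    # and reloaded from there on subsequent calls.
    G = load_G("data/lm/G_3_gram.fst.txt", cache=True)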

speechbrain.k2_integration.utils.prepare_rescoring_G(G: Fsa) → Fsa

Prepare a LM for use in LM rescoring. For instance, in the LibriSpeech recipe this is a 4-gram LM (while a 3-gram LM is used for HLG construction).

Parameters:

G (k2.Fsa) – An FSA representing the LM.

Returns:

G – An FSA representing the LM, with the following modifications:

  • G.aux_labels is removed

  • G.lm_scores is set to G.scores

  • G is arc-sorted

Return type:

k2.Fsa
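
Example (a minimal sketch; the 4-gram path is illustrative, mirroring the LibriSpeech setup described above):

    from speechbrain.k2_integration.utils import load_G, prepare_rescoring_G

    # Illustrative: load a 4-gram LM and prepare it for rescoring
    # lattices produced with a (smaller) HLG decoding graph.
    G_4gram = load_G("data/lm/G_4_gram.fst.txt")
    G_rescore = prepare_rescoring_G(G_4gram)
    # G_rescore now carries lm_scores (= scores), has no aux_labels,
    # and is arc-sorted.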