speechbrain.k2_integration.utils module

Utilities for k2 integration with SpeechBrain.

This code was adapted from icefall (https://github.com/k2-fsa/icefall).

Authors:
  • Pierre Champion 2023

  • Zeyu Zhao 2023

  • Georgios Karakasidis 2023

Summary

Functions:

  • lattice_path_to_textid: Extract the texts (as word IDs) from the best-path FSAs.

  • lattice_paths_to_text: Convert the best path to a list of strings.

  • load_G: Load a LM to be used in decoding graph creation (or LM rescoring).

  • prepare_rescoring_G: Prepare a LM for use in LM rescoring.

Reference

speechbrain.k2_integration.utils.lattice_path_to_textid(best_paths: Fsa, return_ragged: bool = False) → List[List[int]] | RaggedTensor

Extract the texts (as word IDs) from the best-path FSAs.

Parameters:
  • best_paths (k2.Fsa) – A k2.Fsa with best_paths.arcs.num_axes() == 3, i.e. containing multiple FSAs, which is expected to be the result of k2.shortest_path (otherwise the returned values won’t be meaningful).

  • return_ragged (bool) – True to return a ragged tensor with two axes [utt][word_id]. False to return a list-of-list word IDs.

Returns:

A list of lists of int containing the label sequences we decoded, or, if return_ragged is True, a k2.RaggedTensor with two axes [utt][word_id].
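
Example (a minimal sketch: the toy lattice below is built by hand with k2.linear_fsa and stands in for a real decoding lattice; its labels and word IDs are illustrative):

    import torch
    import k2

    from speechbrain.k2_integration.utils import lattice_path_to_textid

    # Toy stand-in for a decoding lattice: one linear FSA over two token
    # labels, wrapped in an FsaVec (a single "utterance").
    lattice = k2.linear_fsa([[1, 2]])

    # Attach word IDs as aux_labels, one per arc; the implicit final arc
    # carries -1 by k2 convention. The IDs 10 and 11 are illustrative.
    lattice.aux_labels = torch.tensor([10, 11, -1], dtype=torch.int32)

    # lattice_path_to_textid expects the result of k2.shortest_path.
    best_paths = k2.shortest_path(lattice, use_double_scores=True)
    word_ids = lattice_path_to_textid(best_paths)
    print(word_ids)  # expected: [[10, 11]]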

speechbrain.k2_integration.utils.lattice_paths_to_text(best_paths: Fsa, word_table) → List[str]

Convert the best path to a list of strings.

Parameters:
  • best_paths (k2.Fsa) – It is the path in the lattice with the highest score for a given utterance.

  • word_table (List[str] or Dict[int,str]) – It is a list or dict that maps word IDs to words.

Returns:

texts – A list of strings, each of which is the decoding result of the corresponding utterance.

Return type:

List[str]
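
Example (continuing the sketch above; the word table is a hypothetical dict, whereas in a real recipe it comes from the lexicon used to build the decoding graph):

    from speechbrain.k2_integration.utils import lattice_paths_to_text

    # Hypothetical word-ID-to-word mapping for the toy lattice above.
    word_table = {10: "hello", 11: "world"}

    texts = lattice_paths_to_text(best_paths, word_table)
    print(texts)  # expected: ["hello world"]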

speechbrain.k2_integration.utils.load_G(path: str | Path, cache: bool = True) → Fsa

Load a LM to be used in decoding graph creation (or LM rescoring).

Parameters:
  • path (str) – The path to an FST LM (ending with .fst.txt) or a k2-converted LM (in pytorch .pt format).

  • cache (bool) – Whether or not to load/cache the LM from/to the .pt format (in the same dir).

Returns:

G – An FSA representing the LM.

Return type:

k2.Fsa
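
Example (a minimal sketch; the path is illustrative and assumes a 3-gram LM already compiled to FST text form, as in the LibriSpeech recipe):

    from speechbrain.k2_integration.utils import load_G

    # Illustrative path to a 3-gram LM in FST text form. With cache=True,
    # the parsed LM is also stored in .pt format in the same directory
    # and reloaded from there on subsequent calls.
    G = load_G("data/lm/G_3_gram.fst.txt", cache=True)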

speechbrain.k2_integration.utils.prepare_rescoring_G(G: Fsa) → Fsa

Prepare a LM for use in LM rescoring. For instance, in the LibriSpeech recipe this is a 4-gram LM (while a 3-gram LM is used for HLG construction).

Parameters:

G (k2.Fsa) – An FSA representing the LM.

Returns:

G – An FSA representing the LM, with the following modifications:

  • G.aux_labels is removed

  • G.lm_scores is set to G.scores

  • G is arc-sorted

Return type:

k2.Fsa
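
Example (a minimal sketch; the 4-gram path is illustrative, mirroring the LibriSpeech setup described above):

    from speechbrain.k2_integration.utils import load_G, prepare_rescoring_G

    # Illustrative: load a 4-gram LM and prepare it for rescoring
    # lattices produced with a (smaller) HLG decoding graph.
    G_4gram = load_G("data/lm/G_4_gram.fst.txt")
    G_rescore = prepare_rescoring_G(G_4gram)
    # G_rescore now carries lm_scores (= scores), has no aux_labels,
    # and is arc-sorted.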