speechbrain.integrations.k2_fsa.lattice_decoder module
Different decoding graph algorithms for k2, be it HL or HLG (with G LM and bigger rescoring LM).
This code was adjusted from icefall (https://github.com/k2-fsa/icefall/blob/master/icefall/decode.py).
- Authors:
Pierre Champion 2023
Zeyu Zhao 2023
Georgios Karakasidis 2023
Summary
Functions:
get_decoding – This function reads a config and creates the decoder for k2 graph compiler decoding.
get_lattice – Get the decoding lattice from a decoding graph and neural network output.
one_best_decoding – Get the best path from a lattice.
rescore_with_whole_lattice – Intersect the lattice with an n-gram LM and use shortest path to decode.
Reference
- speechbrain.integrations.k2_fsa.lattice_decoder.get_decoding(hparams: Dict, graphCompiler: GraphCompiler, device='cpu')
This function reads a config and creates the decoder for k2 graph compiler decoding. There are the following cases:
- HLG is compiled and LM rescoring is used. In that case, compose_HL_with_G and use_G_rescoring are both True and we will create, for example, G_3_gram.fst.txt and G_4_gram.fst.txt. Note that the 3-gram and 4-gram ARPA LMs will need to exist under hparams['lm_dir'].
- HLG is compiled but LM rescoring is not used. In that case, compose_HL_with_G is True and use_G_rescoring is False and we will create, for example, G_3_gram.fst.txt. Note that the 3-gram ARPA LM will need to exist under hparams['lm_dir'].
- HLG is not compiled (only the HL graph is used) and LM rescoring is used. In that case, compose_HL_with_G is False and use_G_rescoring is True. Note that the 4-gram ARPA LM will need to exist under hparams['lm_dir'].
- HLG is not compiled (only the HL graph is used) and LM rescoring is not used. In that case, compose_HL_with_G is False and use_G_rescoring is False and we will not convert any LM to an FST.
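The four cases can be read as a small decision table over the two flags. The sketch below is illustrative only: `required_lm_fsts` is a hypothetical helper that is not part of SpeechBrain, and the file names simply reuse the examples given in the cases above.

```python
from itertools import product


def required_lm_fsts(compose_HL_with_G: bool, use_G_rescoring: bool) -> list:
    """Hypothetical helper: list the example LM FST files implied by the flags."""
    files = []
    if compose_HL_with_G:
        # G is composed into the HLG decoding graph (a 3-gram in the examples).
        files.append("G_3_gram.fst.txt")
    if use_G_rescoring:
        # A larger G is used for rescoring (a 4-gram in the examples).
        files.append("G_4_gram.fst.txt")
    return files


# Enumerate all four flag combinations described in the docstring.
for compose, rescore in product([True, False], repeat=2):
    print(compose, rescore, required_lm_fsts(compose, rescore))
```

When both flags are False the list is empty, which matches the last case: no LM is converted to an FST.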
- Parameters:
hparams (dict) – The hyperparameters.
graphCompiler (graph_compiler.GraphCompiler) – The graphCompiler (H).
device (torch.device) – The device to use.
- Returns:
- decoding_graph: k2.Fsa
A HL or HLG decoding graph. Used with a nnet output and the function get_lattice to obtain a decoding lattice k2.Fsa.
- decoding_method: Callable[[k2.Fsa], k2.Fsa]
A function to call with a decoding lattice k2.Fsa (obtained after nnet output intersect with a HL or HLG). Returns an FsaVec containing linear FSAs.
- Return type:
Dict
Example
>>> import torch
>>> from speechbrain.integrations.k2_fsa.losses import ctc_k2
>>> from speechbrain.integrations.k2_fsa.utils import lattice_paths_to_text
>>> from speechbrain.integrations.k2_fsa.graph_compiler import (
...     CtcGraphCompiler,
... )
>>> from speechbrain.integrations.k2_fsa.lexicon import Lexicon
>>> from speechbrain.integrations.k2_fsa.prepare_lang import prepare_lang
>>> from speechbrain.integrations.k2_fsa.lattice_decoder import get_decoding
>>> from speechbrain.integrations.k2_fsa.lattice_decoder import get_lattice

>>> batch_size = 1

>>> log_probs = torch.randn(batch_size, 40, 10)
>>> log_probs.requires_grad = True
>>> # Assume all utterances have the same length so no padding was needed.
>>> input_lens = torch.ones(batch_size)
>>> # Create a small lexicon containing only two words and write it to a file.
>>> lang_tmpdir = getfixture("tmpdir")
>>> lexicon_sample = "hello h e l l o\nworld w o r l d\n<UNK> <unk>"
>>> lexicon_file = lang_tmpdir.join("lexicon.txt")
>>> lexicon_file.write(lexicon_sample)
>>> # Create a lang directory with the lexicon and L.pt, L_inv.pt, L_disambig.pt
>>> prepare_lang(lang_tmpdir)
>>> # Create a lexicon object
>>> lexicon = Lexicon(lang_tmpdir)
>>> # Create a random decoding graph
>>> graph = CtcGraphCompiler(
...     lexicon,
...     log_probs.device,
... )

>>> decode = get_decoding(
...     {
...         "compose_HL_with_G": False,
...         "decoding_method": "onebest",
...         "lang_dir": lang_tmpdir,
...     },
...     graph,
... )
>>> lattice = get_lattice(log_probs, input_lens, decode["decoding_graph"])
>>> path = decode["decoding_method"](lattice)["1best"]
>>> text = lattice_paths_to_text(path, lexicon.word_table)
- speechbrain.integrations.k2_fsa.lattice_decoder.get_lattice(log_probs_nnet_output: Tensor, input_lens: Tensor, decoder: k2.Fsa, search_beam: int = 5, output_beam: int = 5, min_active_states: int = 300, max_active_states: int = 1000, ac_scale: float = 1.0, subsampling_factor: int = 1) → k2.Fsa
Get the decoding lattice from a decoding graph and neural network output.
- Parameters:
log_probs_nnet_output (torch.Tensor) – The output of a neural model, of shape (batch, seq_len, num_tokens).
input_lens (torch.Tensor) – An int tensor of shape (batch,). It contains the length of each sequence in log_probs_nnet_output.
decoder (k2.Fsa) – An instance of k2.Fsa that represents the decoding graph.
search_beam (int) – Decoding beam, e.g. 20. Smaller is faster, larger is more exact (less pruning). This is the default value; it may be modified by min_active_states and max_active_states.
output_beam (int) – Beam to prune output, similar to lattice-beam in Kaldi. Relative to best path of output.
min_active_states (int) – Minimum number of FSA states that are allowed to be active on any given frame for any given intersection/composition task. This is advisory, in that it will try not to have fewer than this number active. Set it to zero if there is no constraint.
max_active_states (int) – Maximum number of FSA states that are allowed to be active on any given frame for any given intersection/composition task. This is advisory, in that it will try not to exceed that but may not always succeed. You can use a very large number if no constraint is needed.
ac_scale (float) – Acoustic scale applied to log_probs_nnet_output.
subsampling_factor (int) – The subsampling factor of the model.
- Returns:
lattice – An FsaVec containing the decoding result. It has axes [utt][state][arc].
- Return type:
k2.Fsa
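To build intuition for how search_beam interacts with min_active_states and max_active_states, here is a toy, pure-Python sketch of pruning one frame of state scores. It is not k2's implementation: `prune_frame` is a hypothetical function, and it enforces the limits strictly, whereas the parameter descriptions above say the real limits are only advisory.

```python
def prune_frame(scores, search_beam, min_active_states, max_active_states):
    """Toy pruning of one frame of log-scores (hypothetical, not k2 code)."""
    ranked = sorted(scores, reverse=True)
    best = ranked[0]
    # Keep every state whose score is within `search_beam` of the best one.
    kept = [s for s in ranked if s >= best - search_beam]
    if len(kept) < min_active_states:
        # Too few survivors: fall back to the top `min_active_states` states.
        kept = ranked[:min_active_states]
    if len(kept) > max_active_states:
        # Too many survivors: keep only the top `max_active_states` states.
        kept = kept[:max_active_states]
    return kept


frame = [0.0, -1.0, -4.0, -6.0, -9.0]
print(prune_frame(frame, search_beam=5, min_active_states=0, max_active_states=1000))
print(prune_frame(frame, search_beam=5, min_active_states=4, max_active_states=1000))
print(prune_frame(frame, search_beam=5, min_active_states=0, max_active_states=2))
```

A smaller beam keeps fewer states per frame, which is why it is faster but prunes more aggressively.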
- speechbrain.integrations.k2_fsa.lattice_decoder.one_best_decoding(lattice: k2.Fsa, use_double_scores: bool = True) → k2.Fsa
Get the best path from a lattice.
- Parameters:
lattice (k2.Fsa) – The decoding lattice returned by get_lattice().
use_double_scores (bool) – True to use double precision floating point in the computation. False to use single precision.
- Returns:
best_path – An FsaVec containing linear paths.
- Return type:
k2.Fsa
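Conceptually, extracting the best path means finding the highest-scoring route from the start state to the final state of the lattice. The sketch below shows that idea on a tiny hand-written lattice in pure Python. It does not use k2; the arc format, the `best_path` helper, and the assumption that states are numbered in topological order are all choices made for this illustration.

```python
def best_path(arcs, num_states):
    """Return (labels, score) of the highest-scoring path from state 0
    to state num_states - 1. Assumes src < dst for every arc."""
    best = [float("-inf")] * num_states
    back = [None] * num_states  # back[s] = (previous state, arc label)
    best[0] = 0.0
    # Visiting arcs by source state processes states in topological order.
    for src, dst, label, score in sorted(arcs, key=lambda arc: arc[0]):
        candidate = best[src] + score
        if candidate > best[dst]:
            best[dst] = candidate
            back[dst] = (src, label)
    # Follow the back-pointers from the final state to recover the labels.
    labels = []
    state = num_states - 1
    while back[state] is not None:
        src, label = back[state]
        labels.append(label)
        state = src
    labels.reverse()
    return labels, best[num_states - 1]


toy_lattice = [
    (0, 1, "hello", -0.5),
    (0, 1, "hollow", -1.5),
    (1, 2, "world", -0.25),
    (1, 2, "word", -1.0),
]
print(best_path(toy_lattice, num_states=3))
```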
- speechbrain.integrations.k2_fsa.lattice_decoder.rescore_with_whole_lattice(lattice: k2.Fsa, G_with_epsilon_loops: k2.Fsa, lm_scale_list: List[float] | None = None, use_double_scores: bool = True) → k2.Fsa | Dict[str, k2.Fsa]
Intersect the lattice with an n-gram LM and use shortest path to decode. The input lattice is obtained by intersecting HLG with a DenseFsaVec, where the G in HLG is in general a 3-gram LM. The input G_with_epsilon_loops is usually a 4-gram LM. You can consider this function as a second pass decoding. In the first pass decoding, we use a small G, while we use a larger G in the second pass decoding.
- Parameters:
lattice (k2.Fsa) – An FsaVec with axes [utt][state][arc]. Its aux_labels are word IDs. It must have an attribute lm_scores.
G_with_epsilon_loops (k2.Fsa) – An FsaVec containing only a single FSA. It contains epsilon self-loops. It is an acceptor and its labels are word IDs.
lm_scale_list (Optional[List[float]]) – If None, return the intersection of lattice and G_with_epsilon_loops. If not None, it contains a list of values to scale LM scores. For each scale, there is a corresponding decoding result contained in the resulting dict.
use_double_scores (bool) – True to use double precision in the computation. False to use single precision.
- Returns:
If lm_scale_list is None, return a new lattice which is the intersection result of lattice and G_with_epsilon_loops. Otherwise, return a dict whose key is an entry in lm_scale_list and the value is the decoding result (i.e., an FsaVec containing linear FSAs).
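The role of lm_scale_list can be illustrated without k2 by rescoring a short list of hypotheses instead of a whole lattice. The sketch below is a deliberate simplification: `rescore_hypotheses`, the (text, am_score, lm_score) tuples, the formula `am_score + scale * lm_score`, and the `lm_scale_<value>` key naming are all assumptions made for this example and are not taken from the function above.

```python
def rescore_hypotheses(hyps, lm_scale_list):
    """For each LM scale, pick the hypothesis with the best combined score.

    hyps: list of (text, am_score, lm_score) tuples, scores in the log domain.
    Returns a dict with one decoding result per scale.
    """
    results = {}
    for scale in lm_scale_list:
        # Combine acoustic and scaled LM scores; higher is better.
        text, _, _ = max(hyps, key=lambda h: h[1] + scale * h[2])
        # Key naming is an assumption of this sketch.
        results[f"lm_scale_{scale}"] = text
    return results


hyps = [
    ("hello world", -2.0, -6.0),   # slightly worse acoustics, better LM score
    ("hollow word", -1.0, -10.0),  # better acoustics, worse LM score
]
print(rescore_hypotheses(hyps, [0.0, 0.5]))
```

With a scale of 0.0 the acoustic score alone decides, while a scale of 0.5 lets the LM score overturn the choice, which is why trying several scales and comparing the results is useful.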