speechbrain.integrations.k2_fsa.lattice_decoder module

Decoding algorithms for k2 decoding graphs, either HL or HLG (with a G LM and, optionally, a bigger rescoring LM).

This code was adjusted from icefall (https://github.com/k2-fsa/icefall/blob/master/icefall/decode.py).

Authors:

Pierre Champion 2023
Zeyu Zhao 2023
Georgios Karakasidis 2023

Summary

Functions:

`get_decoding`	This function reads a config and creates the decoder for k2 graph compiler decoding. There are the following cases: - HLG is compiled and LM rescoring is used. In that case, compose_HL_with_G and use_G_rescoring are both True and we will create, for example, G_3_gram.fst.txt and G_4_gram.fst.txt. Note that the 3-gram and 4-gram ARPA LMs need to exist under `hparams['lm_dir']`. - HLG is compiled but LM rescoring is not used. In that case, compose_HL_with_G is True and use_G_rescoring is False and we will create, for example, G_3_gram.fst.txt. Note that the 3-gram ARPA LM needs to exist under `hparams['lm_dir']`. - HLG is not compiled (only the HL graph is used) and LM rescoring is used. In that case, compose_HL_with_G is False and use_G_rescoring is True. Note that the 4-gram ARPA LM needs to exist under `hparams['lm_dir']`. - HLG is not compiled (only the HL graph is used) and LM rescoring is not used. In that case, compose_HL_with_G is False and use_G_rescoring is False and we will not convert any LM to FST.
`get_lattice`	Get the decoding lattice from a decoding graph and neural network output.
`one_best_decoding`	Get the best path from a lattice.
`rescore_with_whole_lattice`	Intersect the lattice with an n-gram LM and use shortest path to decode.

Reference

speechbrain.integrations.k2_fsa.lattice_decoder.get_decoding(hparams: Dict, graphCompiler: GraphCompiler, device='cpu')

This function reads a config and creates the decoder for k2 graph compiler decoding. There are the following cases:

- HLG is compiled and LM rescoring is used. In that case, compose_HL_with_G and use_G_rescoring are both True and we will create, for example, G_3_gram.fst.txt and G_4_gram.fst.txt. Note that the 3-gram and 4-gram ARPA LMs need to exist under hparams['lm_dir'].

- HLG is compiled but LM rescoring is not used. In that case, compose_HL_with_G is True and use_G_rescoring is False and we will create, for example, G_3_gram.fst.txt. Note that the 3-gram ARPA LM needs to exist under hparams['lm_dir'].

- HLG is not compiled (only the HL graph is used) and LM rescoring is used. In that case, compose_HL_with_G is False and use_G_rescoring is True. Note that the 4-gram ARPA LM needs to exist under hparams['lm_dir'].

- HLG is not compiled (only the HL graph is used) and LM rescoring is not used. In that case, compose_HL_with_G is False and use_G_rescoring is False and we will not convert any LM to FST.
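The four cases can be summarised as a small lookup. The sketch below is illustrative only: required_lm_files is a hypothetical helper, not part of SpeechBrain, and the file names simply reuse the examples given above (G_3_gram.fst.txt for composition, G_4_gram.fst.txt for rescoring).

```python
from typing import List


def required_lm_files(compose_HL_with_G: bool, use_G_rescoring: bool) -> List[str]:
    """Hypothetical helper: list the example G FSTs each flag combination needs.

    The file names are the examples used on this page, not names fixed by
    the library.
    """
    files = []
    if compose_HL_with_G:
        # HLG decoding: the smaller LM is composed into the decoding graph.
        files.append("G_3_gram.fst.txt")
    if use_G_rescoring:
        # Second pass: the larger LM is used to rescore the lattice.
        files.append("G_4_gram.fst.txt")
    return files


# Enumerate the four flag combinations described above.
for compose in (True, False):
    for rescore in (True, False):
        print(compose, rescore, required_lm_files(compose, rescore))
```

When both flags are False the list is empty, which matches the last case: no LM is converted to an FST.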

Parameters:

hparams (dict) – The hyperparameters.
graphCompiler (graph_compiler.GraphCompiler) – The graphCompiler (H)
device (torch.device) – The device to use.

Returns:

decoding_graph: k2.Fsa: An HL or HLG decoding graph. Used with a nnet output and the function get_lattice to obtain a decoding lattice k2.Fsa.
decoding_method: Callable[[k2.Fsa], k2.Fsa]: A function to call with a decoding lattice k2.Fsa (obtained by intersecting the nnet output with an HL or HLG graph). Returns the decoding result as an FsaVec containing linear FSAs; in the example below, the result is indexed by method name (e.g. "1best").

Return type:

Dict

Example

>>> import torch
>>> from speechbrain.integrations.k2_fsa.losses import ctc_k2
>>> from speechbrain.integrations.k2_fsa.utils import lattice_paths_to_text
>>> from speechbrain.integrations.k2_fsa.graph_compiler import (
...     CtcGraphCompiler,
... )
>>> from speechbrain.integrations.k2_fsa.lexicon import Lexicon
>>> from speechbrain.integrations.k2_fsa.prepare_lang import prepare_lang
>>> from speechbrain.integrations.k2_fsa.lattice_decoder import get_decoding
>>> from speechbrain.integrations.k2_fsa.lattice_decoder import get_lattice

>>> batch_size = 1

>>> log_probs = torch.randn(batch_size, 40, 10)
>>> log_probs.requires_grad = True
>>> # Assume all utterances have the same length so no padding was needed.
>>> input_lens = torch.ones(batch_size)
>>> # Create a small lexicon containing only two words and write it to a file.
>>> lang_tmpdir = getfixture("tmpdir")
>>> lexicon_sample = "hello h e l l o\nworld w o r l d\n<UNK> <unk>"
>>> lexicon_file = lang_tmpdir.join("lexicon.txt")
>>> lexicon_file.write(lexicon_sample)
>>> # Create a lang directory with the lexicon and L.pt, L_inv.pt, L_disambig.pt
>>> prepare_lang(lang_tmpdir)
>>> # Create a lexicon object
>>> lexicon = Lexicon(lang_tmpdir)
>>> # Create a CTC graph compiler from the lexicon
>>> graph = CtcGraphCompiler(
...     lexicon,
...     log_probs.device,
... )

>>> decode = get_decoding(
...     {
...         "compose_HL_with_G": False,
...         "decoding_method": "onebest",
...         "lang_dir": lang_tmpdir,
...     },
...     graph,
... )
>>> lattice = get_lattice(log_probs, input_lens, decode["decoding_graph"])
>>> path = decode["decoding_method"](lattice)["1best"]
>>> text = lattice_paths_to_text(path, lexicon.word_table)

speechbrain.integrations.k2_fsa.lattice_decoder.get_lattice(log_probs_nnet_output: Tensor, input_lens: Tensor, decoder: k2.Fsa, search_beam: int = 5, output_beam: int = 5, min_active_states: int = 300, max_active_states: int = 1000, ac_scale: float = 1.0, subsampling_factor: int = 1) → k2.Fsa

Get the decoding lattice from a decoding graph and neural network output.

Parameters:

log_probs_nnet_output (torch.Tensor) – It is the output of a neural model of shape (batch, seq_len, num_tokens).
input_lens (torch.Tensor) – It is an int tensor of shape (batch,). It contains lengths of each sequence in log_probs_nnet_output.
decoder (k2.Fsa) – It is an instance of k2.Fsa that represents the decoding graph.
search_beam (int) – Decoding beam, e.g. 20. Smaller is faster, larger is more exact (less pruning). This is the default value; it may be modified by min_active_states and max_active_states.
output_beam (int) – Beam to prune output, similar to lattice-beam in Kaldi. Relative to best path of output.
min_active_states (int) – Minimum number of FSA states that are allowed to be active on any given frame for any given intersection/composition task. This is advisory, in that it will try not to have fewer than this number active. Set it to zero if there is no constraint.
max_active_states (int) – Maximum number of FSA states that are allowed to be active on any given frame for any given intersection/composition task. This is advisory, in that it will try not to exceed that but may not always succeed. You can use a very large number if no constraint is needed.
ac_scale (float) – Acoustic scale applied to log_probs_nnet_output.
subsampling_factor (int) – The subsampling factor of the model.

Returns:

lattice – An FsaVec containing the decoding result. It has axes [utt][state][arc].

Return type:

k2.Fsa
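To make the interaction between search_beam, min_active_states and max_active_states more concrete, here is a simplified, pure-Python sketch of pruning on a single frame. It is not k2's implementation, which works on batched FSAs and adapts the beam rather than sorting states; prune_frame and its inputs are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict


def prune_frame(
    scores: Dict[int, float],
    search_beam: float = 5.0,
    min_active_states: int = 0,
    max_active_states: int = 1000,
) -> Dict[int, float]:
    """Keep states within `search_beam` of the best score on one frame,
    then apply the min/max active-state limits (illustration only)."""
    if not scores:
        return {}
    # Rank states from best (highest log-score) to worst.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0][1]
    # Beam pruning: count the states whose score lies within the beam.
    n_keep = sum(1 for _, s in ranked if s >= best - search_beam)
    # Too few survivors: keep up to `min_active_states` of the best states.
    n_keep = max(n_keep, min(min_active_states, len(ranked)))
    # Too many survivors: keep only the `max_active_states` best states.
    n_keep = min(n_keep, max_active_states)
    return dict(ranked[:n_keep])


frame_scores = {0: -1.0, 1: -2.5, 2: -7.0, 3: -9.0}
print(prune_frame(frame_scores, search_beam=5.0))
print(prune_frame(frame_scores, search_beam=5.0, min_active_states=3))
print(prune_frame(frame_scores, search_beam=5.0, max_active_states=1))
```

With a beam of 5, only states 0 and 1 survive; raising min_active_states to 3 brings state 2 back, and max_active_states=1 keeps the single best state.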

speechbrain.integrations.k2_fsa.lattice_decoder.one_best_decoding(lattice: k2.Fsa, use_double_scores: bool = True) → k2.Fsa

Get the best path from a lattice.

Parameters:

lattice (k2.Fsa) – The decoding lattice returned by get_lattice().
use_double_scores (bool) – True to use double precision floating point in the computation. False to use single precision.

Returns:

best_path – An FsaVec containing linear paths.

Return type:

k2.Fsa
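Conceptually, getting the best path from a lattice is a highest-score path search. The sketch below is a pure-Python toy that works on a hand-written list of arcs rather than on a k2.Fsa; the Arc tuple layout and the best_path function are assumptions made for illustration, and states are assumed to be numbered in topological order.

```python
from typing import Dict, List, Tuple

# (source state, destination state, word label, log-score); hypothetical layout.
Arc = Tuple[int, int, str, float]


def best_path(arcs: List[Arc], start: int, final: int) -> Tuple[float, List[str]]:
    """Return the score and words of the highest-scoring path.

    Assumes every arc goes from a lower-numbered to a higher-numbered state,
    so visiting arcs sorted by source state is a valid topological sweep.
    """
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for src, dst, word, score in sorted(arcs):
        if src not in best:
            continue  # source state is not reachable from the start state
        cand = best[src][0] + score
        if dst not in best or cand > best[dst][0]:
            best[dst] = (cand, best[src][1] + [word])
    return best[final]


toy_lattice = [
    (0, 1, "hello", -0.5),
    (0, 1, "yellow", -1.25),
    (1, 2, "world", -0.25),
    (1, 2, "word", -1.0),
]
score, words = best_path(toy_lattice, start=0, final=2)
print(score, words)
```

The result is a single linear path, the toy analogue of the linear FSAs returned for each utterance.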

speechbrain.integrations.k2_fsa.lattice_decoder.rescore_with_whole_lattice(lattice: k2.Fsa, G_with_epsilon_loops: k2.Fsa, lm_scale_list: List[float] | None = None, use_double_scores: bool = True) → k2.Fsa | Dict[str, k2.Fsa]

Intersect the lattice with an n-gram LM and use shortest path to decode. The input lattice is obtained by intersecting HLG with a DenseFsaVec, where the G in HLG is in general a 3-gram LM. The input G_with_epsilon_loops is usually a 4-gram LM. You can consider this function as a second pass decoding. In the first pass decoding, we use a small G, while we use a larger G in the second pass decoding.

Parameters:

lattice (k2.Fsa) – An FsaVec with axes [utt][state][arc]. Its aux_labels are word IDs. It must have an attribute lm_scores.
G_with_epsilon_loops (k2.Fsa) – An FsaVec containing only a single FSA. It contains epsilon self-loops. It is an acceptor and its labels are word IDs.
lm_scale_list (Optional[List[float]]) – If None, return the intersection of lattice and G_with_epsilon_loops. If not None, it contains a list of values to scale LM scores. For each scale, there is a corresponding decoding result contained in the resulting dict.
use_double_scores (bool) – True to use double precision in the computation. False to use single precision.

Returns:

If lm_scale_list is None, return a new lattice which is the intersection result of lattice and G_with_epsilon_loops.
Otherwise, return a dict whose key is an entry in lm_scale_list and the value is the decoding result (i.e., an FsaVec containing linear FSAs).
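The effect of lm_scale_list can be illustrated on a toy list of hypotheses instead of a lattice. This sketch assumes the combined score is am_score + lm_scale * lm_score, which is one common convention and may differ from the exact weighting used by the library; the rescore function, the data layout, and the "lm_scale_..." key format are all hypothetical.

```python
from typing import Dict, List, Tuple

# (text, acoustic log-score, log-score under the larger LM); toy data layout.
Hypothesis = Tuple[str, float, float]


def rescore(hyps: List[Hypothesis], lm_scale_list: List[float]) -> Dict[str, str]:
    """Pick the best hypothesis for each LM scale (illustration only)."""
    results = {}
    for lm_scale in lm_scale_list:
        # Assumed combination rule: acoustic score plus scaled LM score.
        best = max(hyps, key=lambda h: h[1] + lm_scale * h[2])
        # Assumed key format: one entry per scale in `lm_scale_list`.
        results[f"lm_scale_{lm_scale}"] = best[0]
    return results


hyps = [
    ("hello world", -4.0, -2.0),
    ("hello word", -3.5, -4.0),
]
print(rescore(hyps, [0.0, 0.5, 1.0]))
```

With a scale of 0.0 the acoustically preferred "hello word" wins; at 0.5 and 1.0 the LM score tips the decision to "hello world", which is why several scales are usually tried.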