speechbrain.k2_integration.lattice_decoder module
Different decoding graph algorithms for k2, be it HL or HLG (with a G LM and a bigger rescoring LM).
This code was adjusted from icefall (https://github.com/k2-fsa/icefall/blob/master/icefall/decode.py).
- Authors:
Pierre Champion 2023
Zeyu Zhao 2023
Georgios Karakasidis 2023
Summary
Functions:
get_decoding: This function reads a config and creates the decoder for k2 graph compiler decoding.
get_lattice: Get the decoding lattice from a decoding graph and neural network output.
one_best_decoding: Get the best path from a lattice.
rescore_with_whole_lattice: Intersect the lattice with an n-gram LM and use shortest path to decode.
Reference
- speechbrain.k2_integration.lattice_decoder.get_decoding(hparams: Dict, graphCompiler: GraphCompiler, device='cpu')[source]
This function reads a config and creates the decoder for k2 graph compiler decoding. There are the following cases:
- HLG is compiled and LM rescoring is used. In that case, compose_HL_with_G and use_G_rescoring are both True, and we will create, for example, G_3_gram.fst.txt and G_4_gram.fst.txt. Note that the 3-gram and 4-gram ARPA LMs will need to exist under hparams['lm_dir'].
- HLG is compiled but LM rescoring is not used. In that case, compose_HL_with_G is True and use_G_rescoring is False, and we will create, for example, G_3_gram.fst.txt. Note that the 3-gram ARPA LM will need to exist under hparams['lm_dir'].
- HLG is not compiled (only the HL graph is used) but LM rescoring is used. In that case, compose_HL_with_G is False and use_G_rescoring is True. Note that the 4-gram ARPA LM will need to exist under hparams['lm_dir'].
- HLG is not compiled (only the HL graph is used) and LM rescoring is not used. In that case, compose_HL_with_G is False and use_G_rescoring is False, and we will not convert any LM to an FST.
- Parameters:
hparams (dict) – The hyperparameters.
graphCompiler (graph_compiler.GraphCompiler) – The graphCompiler (H)
device (torch.device) – The device to use.
- Returns:
- decoding_graph: k2.Fsa
An HL or HLG decoding graph. Used with a nnet output and the function get_lattice to obtain a decoding lattice (k2.Fsa).
- decoding_method: Callable[[k2.Fsa], k2.Fsa]
A function to call with a decoding lattice (a k2.Fsa obtained by intersecting the nnet output with the HL or HLG graph). Returns an FsaVec containing linear FSAs.
- Return type:
Dict
Example
>>> import torch
>>> from speechbrain.k2_integration.losses import ctc_k2
>>> from speechbrain.k2_integration.utils import lattice_paths_to_text
>>> from speechbrain.k2_integration.graph_compiler import CtcGraphCompiler
>>> from speechbrain.k2_integration.lexicon import Lexicon
>>> from speechbrain.k2_integration.prepare_lang import prepare_lang
>>> from speechbrain.k2_integration.lattice_decoder import get_decoding
>>> from speechbrain.k2_integration.lattice_decoder import get_lattice
>>> batch_size = 1
>>> log_probs = torch.randn(batch_size, 40, 10)
>>> log_probs.requires_grad = True
>>> # Assume all utterances have the same length so no padding was needed.
>>> input_lens = torch.ones(batch_size)
>>> # Create a small lexicon containing only two words and write it to a file.
>>> lang_tmpdir = getfixture('tmpdir')
>>> lexicon_sample = "hello h e l l o\nworld w o r l d\n<UNK> <unk>"
>>> lexicon_file = lang_tmpdir.join("lexicon.txt")
>>> lexicon_file.write(lexicon_sample)
>>> # Create a lang directory with the lexicon and L.pt, L_inv.pt, L_disambig.pt
>>> prepare_lang(lang_tmpdir)
>>> # Create a lexicon object
>>> lexicon = Lexicon(lang_tmpdir)
>>> # Create a random decoding graph
>>> graph = CtcGraphCompiler(
...     lexicon,
...     log_probs.device,
... )
>>> decode = get_decoding(
...     {"compose_HL_with_G": False,
...      "decoding_method": "onebest",
...      "lang_dir": lang_tmpdir},
...     graph)
>>> lattice = get_lattice(log_probs, input_lens, decode["decoding_graph"])
>>> path = decode["decoding_method"](lattice)['1best']
>>> text = lattice_paths_to_text(path, lexicon.word_table)
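The four compose_HL_with_G/use_G_rescoring cases described above can be summarized as a small dispatch table. This is an illustrative pure-Python sketch; the helper plan_decoding and its return keys are hypothetical, not part of the library:

```python
def plan_decoding(compose_HL_with_G: bool, use_G_rescoring: bool) -> dict:
    """Hypothetical summary of the four get_decoding configuration cases.

    Returns which decoding graph is built and which G FSTs (and hence
    which ARPA LMs under hparams['lm_dir']) are required.
    """
    graph = "HLG" if compose_HL_with_G else "HL"
    required_lms = []
    if compose_HL_with_G:
        # HLG composition needs a first-pass G, built from a 3-gram ARPA LM.
        required_lms.append("G_3_gram.fst.txt")
    if use_G_rescoring:
        # Whole-lattice rescoring needs a bigger G, built from a 4-gram ARPA LM.
        required_lms.append("G_4_gram.fst.txt")
    return {"decoding_graph": graph, "required_lms": required_lms}


# HL-only decoding with no rescoring requires no LM conversion at all.
plan = plan_decoding(False, False)
```
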
- speechbrain.k2_integration.lattice_decoder.get_lattice(log_probs_nnet_output: Tensor, input_lens: Tensor, decoder: Fsa, search_beam: int = 5, output_beam: int = 5, min_active_states: int = 300, max_active_states: int = 1000, ac_scale: float = 1.0, subsampling_factor: int = 1) Fsa [source]
Get the decoding lattice from a decoding graph and neural network output.
- Parameters:
log_probs_nnet_output – The output of a neural model, of shape (batch, seq_len, num_tokens).
input_lens – An int tensor of shape (batch,). It contains the length of each sequence in log_probs_nnet_output.
decoder – An instance of k2.Fsa that represents the decoding graph.
search_beam – Decoding beam, e.g. 20. Smaller is faster, larger is more exact (less pruning). This is the default value; it may be modified by min_active_states and max_active_states.
output_beam – Beam to prune output, similar to lattice-beam in Kaldi. Relative to the best path of the output.
min_active_states – Minimum number of FSA states that are allowed to be active on any given frame for any given intersection/composition task. This is advisory, in that it will try not to have fewer than this number active. Set it to zero if there is no constraint.
max_active_states – Maximum number of FSA states that are allowed to be active on any given frame for any given intersection/composition task. This is advisory, in that it will try not to exceed that but may not always succeed. You can use a very large number if no constraint is needed.
ac_scale – Acoustic scale applied to log_probs_nnet_output.
subsampling_factor – The subsampling factor of the model.
- Returns:
An FsaVec containing the decoding result. It has axes [utt][state][arc].
- Return type:
lattice
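The bookkeeping performed on the inputs can be sketched in plain Python: input_lens is converted into per-utterance frame counts (divided by subsampling_factor), and the acoustic log-probabilities are multiplied by ac_scale before intersection. The helpers below are hypothetical illustrations only; the real function operates on k2 structures rather than Python lists:

```python
import math


def prepare_supervisions(input_lens, num_frames, subsampling_factor=1):
    """Hypothetical sketch: turn relative input lengths into
    (utterance_index, start_frame, num_frames) triples, accounting
    for the model's subsampling factor."""
    segments = []
    for idx, rel_len in enumerate(input_lens):
        frames = math.floor(rel_len * num_frames / subsampling_factor)
        segments.append((idx, 0, frames))
    return segments


def apply_ac_scale(log_probs, ac_scale):
    """Hypothetical sketch: scale acoustic log-probabilities before
    intersecting them with the decoding graph."""
    return [[p * ac_scale for p in frame] for frame in log_probs]


# Two utterances over 40 output frames, with a subsampling factor of 2.
segments = prepare_supervisions([1.0, 0.5], num_frames=40, subsampling_factor=2)
```
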
- speechbrain.k2_integration.lattice_decoder.one_best_decoding(lattice: Fsa, use_double_scores: bool = True) Fsa [source]
Get the best path from a lattice.
- Parameters:
lattice – The decoding lattice returned by get_lattice().
use_double_scores – True to use double precision floating point in the computation. False to use single precision.
- Returns:
An FsaVec containing linear paths.
- Return type:
best_path
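One-best decoding amounts to selecting the single highest-scoring path through the lattice. A minimal toy sketch of that idea, assuming an acyclic lattice whose states are topologically ordered by integer id; the best_path helper below is hypothetical and is not k2's shortest-path implementation:

```python
def best_path(arcs, start, final):
    """Hypothetical one-best search over a tiny acyclic lattice.

    `arcs` maps a state id to a list of (next_state, label, score)
    tuples; higher score is better. Returns (total_score, labels)
    of the best path from `start` to `final`.
    """
    best = {start: (0.0, [])}
    for state in sorted(arcs):  # states assumed topologically ordered
        if state not in best:
            continue  # unreachable state
        score, labels = best[state]
        for nxt, label, s in arcs[state]:
            cand = (score + s, labels + [label])
            if nxt not in best or cand[0] > best[nxt][0]:
                best[nxt] = cand
    return best[final]


# Toy lattice: two competing arcs 0->1, then a single arc 1->2.
lattice = {
    0: [(1, "hello", -1.0), (1, "hallo", -3.0)],
    1: [(2, "world", -0.5)],
}
score, words = best_path(lattice, 0, 2)
```
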
- speechbrain.k2_integration.lattice_decoder.rescore_with_whole_lattice(lattice: Fsa, G_with_epsilon_loops: Fsa, lm_scale_list: List[float] | None = None, use_double_scores: bool = True) Fsa | Dict[str, Fsa] [source]
Intersect the lattice with an n-gram LM and use shortest path to decode. The input lattice is obtained by intersecting HLG with a DenseFsaVec, where the G in HLG is in general a 3-gram LM. The input G_with_epsilon_loops is usually a 4-gram LM. You can consider this function as a second-pass decoding: in the first pass we decode with a small G, while in the second pass we rescore with a larger G.
- Parameters:
lattice (k2.Fsa) – An FsaVec with axes [utt][state][arc]. Its aux_labels are word IDs. It must have an attribute lm_scores.
G_with_epsilon_loops (k2.Fsa) – An FsaVec containing only a single FSA. It contains epsilon self-loops. It is an acceptor and its labels are word IDs.
lm_scale_list (Optional[List[float]]) – If None, return the intersection of lattice and G_with_epsilon_loops. If not None, it contains a list of values to scale LM scores. For each scale, there is a corresponding decoding result contained in the resulting dict.
use_double_scores (bool) – True to use double precision in the computation. False to use single precision.
- Returns:
If lm_scale_list is None, return a new lattice which is the intersection result of lattice and G_with_epsilon_loops. Otherwise, return a dict whose keys are the entries in lm_scale_list and whose values are the decoding results (i.e., FsaVecs containing linear FSAs).
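The effect of lm_scale_list can be sketched in plain Python: for each scale, total path scores are recombined as acoustic_score + scale * lm_score, and one decoding result is produced per scale. The combine_scores helper and the "lm_scale_<x>" key format below are illustrative assumptions, not the library's exact code:

```python
def combine_scores(am_scores, lm_scores, lm_scale_list=None):
    """Hypothetical sketch of LM-scale rescoring.

    With no scale list, acoustic and LM scores are simply summed.
    With a scale list, one rescored result is returned per scale,
    keyed by a string naming that scale.
    """
    if lm_scale_list is None:
        return [am + lm for am, lm in zip(am_scores, lm_scores)]
    return {
        f"lm_scale_{scale}": [
            am + scale * lm for am, lm in zip(am_scores, lm_scores)
        ]
        for scale in lm_scale_list
    }


# Two candidate paths rescored at two different LM scales.
results = combine_scores([-1.0, -1.2], [-2.0, -1.0], lm_scale_list=[0.5, 1.0])
```

A larger scale trusts the (bigger, second-pass) LM more; sweeping several scales and picking the best WER per scale is the usual way this dict is consumed.
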