speechbrain.decoders.ctc module¶
Decoders and output normalization for CTC.
- Authors
Mirco Ravanelli 2020
Aku Rouhe 2020
Sung-Lin Yeh 2020
Summary¶
Classes:
CTCPrefixScorer | This class implements the CTC prefix scorer of Algorithm 2 in reference: https://www.merl.com/publications/docs/TR2017-190.pdf.
Functions:
ctc_greedy_decode | Greedy decode a batch of probabilities and apply CTC rules.
filter_ctc_output | Apply CTC output merge and filter rules.
Reference¶
-
class speechbrain.decoders.ctc.CTCPrefixScorer(x, enc_lens, batch_size, beam_size, blank_index, eos_index, ctc_window_size=0)[source]¶
Bases: object
This class implements the CTC prefix scorer of Algorithm 2 in reference: https://www.merl.com/publications/docs/TR2017-190.pdf. Official implementation: https://github.com/espnet/espnet/blob/master/espnet/nets/ctc_prefix_score.py
- Parameters
x (torch.Tensor) – The encoder states.
enc_lens (torch.Tensor) – The actual length of each enc_states sequence.
batch_size (int) – The size of the batch.
beam_size (int) – The width of beam.
blank_index (int) – The index of the blank token.
eos_index (int) – The index of the end-of-sequence (eos) token.
ctc_window_size (int) – Compute the CTC scores over the time frames using windowing based on attention peaks. If 0, no windowing is applied.
-
forward_step(g, state, candidates=None, attn=None)[source]¶
This method is one step of the forwarding operation for the prefix CTC scorer.
- Parameters
g (torch.Tensor) – The tensor of prefix label sequences, h = g + c.
state (tuple) – Previous ctc states.
candidates (torch.Tensor) – (batch_size * beam_size, ctc_beam_size), The topk candidates for rescoring. The ctc_beam_size is set as 2 * beam_size. If given, partial CTC scoring is performed.
-
permute_mem(memory, index)[source]¶
This method permutes the CTC model memory to synchronize the memory index with the current output.
- Parameters
memory (no type restriction) – The memory variable to be permuted.
index (torch.Tensor) – The index of the previous path.
- Returns
The memory variable after permutation.
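The idea behind this permutation can be shown with a minimal, pure-Python sketch (an illustration of the concept, not the SpeechBrain implementation): after a beam-search step, each surviving hypothesis may descend from a different previous beam, so per-beam memory has to be reordered along the beam dimension to match the new hypothesis order. The function name `permute_beam_memory` is hypothetical.

```python
# Illustrative sketch, not SpeechBrain code: reorder per-beam memory so
# that memory[i] again corresponds to hypothesis i after a search step.
def permute_beam_memory(memory, index):
    """memory: list of per-beam states; index: previous beam each new beam came from."""
    return [memory[j] for j in index]

# Beams 0..2 hold CTC states "a", "b", "c"; the search kept beams [2, 0, 0].
print(permute_beam_memory(["a", "b", "c"], [2, 0, 0]))  # ['c', 'a', 'a']
```

In the real class the memory is a torch.Tensor and the reordering is done with tensor indexing, but the synchronization logic is the same.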
-
speechbrain.decoders.ctc.filter_ctc_output(string_pred, blank_id=-1)[source]¶
Apply CTC output merge and filter rules.
Removes the blank symbol and output repetitions.
- Parameters
string_pred (list) – A list containing the output tokens predicted by the CTC system.
blank_id (int, string) – The id of the blank token.
- Returns
The output predicted by CTC without the blank symbol and the repetitions.
- Return type
list
Example
>>> string_pred = ['a','a','blank','b','b','blank','c']
>>> string_out = filter_ctc_output(string_pred, blank_id='blank')
>>> print(string_out)
['a', 'b', 'c']
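The merge-and-filter rule can be sketched in a few lines of plain Python (an illustrative reimplementation, not the SpeechBrain source): collapse consecutive repeats first, then drop blanks. The function name `filter_ctc_sketch` is hypothetical.

```python
from itertools import groupby

def filter_ctc_sketch(string_pred, blank_id=-1):
    # CTC merge rule: collapse consecutive repeated tokens.
    merged = [token for token, _ in groupby(string_pred)]
    # CTC filter rule: remove the blank symbol.
    return [token for token in merged if token != blank_id]

print(filter_ctc_sketch(['a', 'a', 'blank', 'b', 'b', 'blank', 'c'], blank_id='blank'))
# ['a', 'b', 'c']
```

The order matters: merging before filtering ensures that a genuine repeated label separated by a blank (e.g. ['a', 'blank', 'a']) survives as two tokens rather than being collapsed into one.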
-
speechbrain.decoders.ctc.ctc_greedy_decode(probabilities, seq_lens, blank_id=-1)[source]¶
Greedy decode a batch of probabilities and apply CTC rules.
- Parameters
probabilities (torch.tensor) – Output probabilities (or log-probabilities) from the network with shape [batch, probabilities, time]
seq_lens (torch.tensor) – Relative true sequence lengths (to deal with padded inputs); the longest sequence has length 1.0, others a value between zero and one. Shape [batch, lengths].
blank_id (int, string) – The blank symbol/index. Default: -1. If a negative number is given, it is assumed to mean counting down from the maximum possible index, so that -1 refers to the maximum possible index.
- Returns
Outputs as Python list of lists, with “ragged” dimensions; padding has been removed.
- Return type
list
Example
>>> import torch
>>> probs = torch.tensor([[[0.3, 0.7], [0.0, 0.0]],
...                       [[0.2, 0.8], [0.9, 0.1]]])
>>> lens = torch.tensor([0.51, 1.0])
>>> blank_id = 0
>>> ctc_greedy_decode(probs, lens, blank_id)
[[1], [1]]
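Conceptually, greedy decoding is an argmax over classes at each non-padded time step, followed by the merge-and-filter rule. A pure-Python sketch of that pipeline (illustrative only, not the SpeechBrain implementation; it takes nested lists instead of torch tensors, and assumes the number of valid frames is the relative length rounded to the nearest integer):

```python
from itertools import groupby

def ctc_greedy_decode_sketch(probabilities, seq_lens, blank_id):
    """probabilities: [batch][time][classes] nested lists;
    seq_lens: relative lengths in (0, 1], one per batch element."""
    batch_out = []
    for probs, rel_len in zip(probabilities, seq_lens):
        # Keep only the non-padded frames of this utterance.
        n_frames = int(round(rel_len * len(probs)))
        # Argmax per frame, then CTC merge + blank removal.
        best = [max(range(len(frame)), key=frame.__getitem__)
                for frame in probs[:n_frames]]
        merged = [tok for tok, _ in groupby(best)]
        batch_out.append([tok for tok in merged if tok != blank_id])
    return batch_out

probs = [[[0.3, 0.7], [0.0, 0.0]],
         [[0.2, 0.8], [0.9, 0.1]]]
print(ctc_greedy_decode_sketch(probs, [0.51, 1.0], blank_id=0))  # [[1], [1]]
```

In the first batch element only one frame is valid (0.51 of 2 frames), so the padded second frame never contributes to the output, which is why the result is "ragged" per utterance.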