speechbrain.decoders.ctc module

Decoders and output normalization for CTC.

Authors
  • Mirco Ravanelli 2020

  • Aku Rouhe 2020

  • Sung-Lin Yeh 2020

Summary

Classes:

CTCPrefixScorer

This class implements the CTC prefix scorer of Algorithm 2 in reference: https://www.merl.com/publications/docs/TR2017-190.pdf.

Functions:

ctc_greedy_decode

Greedy decode a batch of probabilities and apply CTC rules.

filter_ctc_output

Apply CTC output merge and filter rules.

Reference

class speechbrain.decoders.ctc.CTCPrefixScorer(x, enc_lens, batch_size, beam_size, blank_index, eos_index, ctc_window_size=0)[source]

Bases: object

This class implements the CTC prefix scorer of Algorithm 2 in reference: https://www.merl.com/publications/docs/TR2017-190.pdf. Official implementation: https://github.com/espnet/espnet/blob/master/espnet/nets/ctc_prefix_score.py

Parameters
  • x (torch.Tensor) – The encoder states.

  • enc_lens (torch.Tensor) – The actual length of each encoder state sequence.

  • batch_size (int) – The size of the batch.

  • beam_size (int) – The width of the beam.

  • blank_index (int) – The index of the blank token.

  • eos_index (int) – The index of the end-of-sequence (eos) token.

  • ctc_window_size (int) – Compute the CTC scores over the time frames using windowing based on attention peaks. If 0, no windowing is applied.
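The prefix scorer builds on the standard CTC forward recursion over blank-interleaved label sequences. As a minimal illustration of that underlying recursion (a pure-PyTorch sketch with a hypothetical name `ctc_forward_logprob`, not the SpeechBrain implementation of Algorithm 2), the CTC log-probability of a full label sequence for a single utterance can be computed as:

```python
import torch

def ctc_forward_logprob(log_probs, labels, blank=0):
    """Log-probability assigned by CTC to `labels` for one utterance.

    log_probs : (T, C) per-frame log-probabilities.
    labels    : non-empty list of target indices, blanks excluded.
    """
    # Interleave blanks around every label: b, l1, b, l2, ..., lL, b
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), log_probs.shape[0]

    alpha = torch.full((S,), float("-inf"))
    alpha[0] = log_probs[0, ext[0]]           # start in the initial blank
    alpha[1] = log_probs[0, ext[1]]           # or in the first label
    for t in range(1, T):
        new = torch.full((S,), float("-inf"))
        for s in range(S):
            terms = [alpha[s]]                # stay on the same symbol
            if s > 0:
                terms.append(alpha[s - 1])    # advance by one symbol
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])    # skip the blank between distinct labels
            new[s] = torch.logsumexp(torch.stack(terms), dim=0) + log_probs[t, ext[s]]
        alpha = new
    # A valid alignment ends in the final label or the final blank.
    return torch.logsumexp(torch.stack([alpha[-1], alpha[-2]]), dim=0)
```

CTCPrefixScorer applies this kind of recursion incrementally to every candidate prefix extension during beam search, which is what makes partial scoring and windowing worthwhile.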

forward_step(g, state, candidates=None, attn=None)[source]

This method is one step of the forwarding operation for the prefix CTC scorer.

Parameters
  • g (torch.Tensor) – The tensor of prefix label sequences, h = g + c.

  • state (tuple) – The previous CTC states.

  • candidates (torch.Tensor) – The top-k candidates for rescoring, of shape (batch_size * beam_size, ctc_beam_size), where ctc_beam_size is set to 2 * beam_size. If given, partial CTC scoring is performed.

  • attn (torch.Tensor) – The attention weights, used for windowing when ctc_window_size > 0.

permute_mem(memory, index)[source]

This method permutes the CTC model memory to synchronize the memory index with the current output.

Parameters
  • memory (No limit) – The memory variable to be permuted.

  • index (torch.Tensor) – The index of the previous path.

Returns

The memory variable after permutation.

Return type

No limit
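As a hypothetical illustration of what this permutation step looks like (not the SpeechBrain internals): beam search keeps one row of CTC state per (batch * beam) hypothesis, and after top-k selection the state rows must be reordered to follow the surviving hypotheses:

```python
import torch

# 4 hypotheses with 2 state values each; `index` names, for each surviving
# beam entry, the row of its predecessor hypothesis.
memory = torch.arange(8.0).view(4, 2)
index = torch.tensor([2, 2, 0, 1])

# Reorder the state rows so they line up with the new beam order.
permuted = memory.index_select(0, index)
# permuted is [[4., 5.], [4., 5.], [0., 1.], [2., 3.]]
```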

speechbrain.decoders.ctc.filter_ctc_output(string_pred, blank_id=-1)[source]

Apply CTC output merge and filter rules.

Removes the blank symbol and collapses consecutive repeated outputs.

Parameters
  • string_pred (list) – A list containing the output strings/ints predicted by the CTC system.

  • blank_id (int, string) – The id of the blank.

Returns

The output predicted by CTC without the blank symbol and the repetitions.

Return type

list

Example

>>> string_pred = ['a','a','blank','b','b','blank','c']
>>> string_out = filter_ctc_output(string_pred, blank_id='blank')
>>> print(string_out)
['a', 'b', 'c']
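The merge-then-filter order matters: repetitions are collapsed before blanks are removed, so a blank can separate two genuine repeats. A minimal sketch of this rule (a stand-in with a hypothetical name, not the SpeechBrain implementation):

```python
from itertools import groupby

def filter_ctc_output_sketch(string_pred, blank_id=-1):
    # Collapse consecutive repeats first, then drop blanks; this order keeps
    # ['a', 'blank', 'a'] as ['a', 'a'] instead of collapsing it to ['a'].
    merged = [token for token, _ in groupby(string_pred)]
    return [token for token in merged if token != blank_id]

filter_ctc_output_sketch(['a', 'a', 'blank', 'b', 'b', 'blank', 'c'], blank_id='blank')
# -> ['a', 'b', 'c']
```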
speechbrain.decoders.ctc.ctc_greedy_decode(probabilities, seq_lens, blank_id=-1)[source]

Greedy decode a batch of probabilities and apply CTC rules.

Parameters
  • probabilities (torch.Tensor) – Output probabilities (or log-probabilities) from the network, with shape [batch, time, probabilities].

  • seq_lens (torch.Tensor) – Relative lengths of the true sequences (to handle padded inputs): the longest sequence has length 1.0, the others a value between zero and one; shape [batch].

  • blank_id (int) – The blank symbol/index. Default: -1. A negative value counts down from the maximum possible index, so -1 refers to the maximum possible index.

Returns

Outputs as Python list of lists, with “ragged” dimensions; padding has been removed.

Return type

list

Example

>>> import torch
>>> probs = torch.tensor([[[0.3, 0.7], [0.0, 0.0]],
...                       [[0.2, 0.8], [0.9, 0.1]]])
>>> lens = torch.tensor([0.51, 1.0])
>>> blank_id = 0
>>> ctc_greedy_decode(probs, lens, blank_id)
[[1], [1]]
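Putting the pieces together, greedy decoding reduces to a per-frame argmax, truncation by relative length, then the merge-and-filter rule. A simplified sketch under stated assumptions (non-negative blank_id, rounding for the relative-to-absolute length conversion; not the SpeechBrain implementation):

```python
from itertools import groupby

import torch

def ctc_greedy_decode_sketch(probabilities, seq_lens, blank_id):
    max_len = probabilities.shape[1]
    best = torch.argmax(probabilities, dim=-1)          # per-frame best class
    outputs = []
    for seq, rel_len in zip(best, seq_lens):
        n_frames = int(torch.round(rel_len * max_len))  # relative -> absolute length
        seq = seq[:n_frames].tolist()
        merged = [tok for tok, _ in groupby(seq)]       # collapse repeats
        outputs.append([tok for tok in merged if tok != blank_id])
    return outputs
```

Run on the example above, this sketch reproduces the documented result: the first sequence keeps only its first frame (round(0.51 * 2) = 1), and the second drops its final blank frame.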