speechbrain.nnet.loss.transducer_loss module

Transducer loss implementation (depends on numba)

Authors
  • Abdelwahab Heba 2020

  • Titouan Parcollet 2023

Summary

Classes:

Transducer

This class implements the Transducer loss computation with the forward-backward algorithm. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

TransducerLoss

This class implements the Transducer loss computation with the forward-backward algorithm.

Functions:

cu_kernel_backward

Computes the backward pass of the forward-backward algorithm using a Numba CUDA kernel.

cu_kernel_compute_grad

Computes the gradients for the forward-backward algorithm using a Numba CUDA kernel.

cu_kernel_forward

Computes the forward pass of the forward-backward algorithm using a Numba CUDA kernel.

Reference

speechbrain.nnet.loss.transducer_loss.cu_kernel_forward(log_probs, labels, alpha, log_p, T, U, blank, lock)[source]

Computes the forward pass of the forward-backward algorithm using a Numba CUDA kernel. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

Parameters:
  • log_probs (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) from the Transducer network.

  • labels (tensor) – 2D Tensor of (batch x MaxSeqLabelLength) containing targets of the batch with zero padding.

  • alpha (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) for forward computation.

  • log_p (tensor) – 1D Tensor of (batch) for forward cost computation.

  • T (tensor) – 1D Tensor of (batch) containing TimeLength of each target.

  • U (tensor) – 1D Tensor of (batch) containing LabelLength of each target.

  • blank (int) – Blank index.

  • lock (tensor) – 2D Tensor of (batch x LabelLength) containing bool(1-0) lock for parallel computation.
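
For reference, the recursion the kernel evaluates can be written for a single utterance in plain PyTorch. This is only an illustrative sketch (the helper name naive_alpha is not part of SpeechBrain): the CUDA kernel computes the same alpha lattice in parallel across the batch, with lock coordinating the threads that fill different label positions.

import torch

def naive_alpha(log_probs, labels, T, U, blank):
    # Single-utterance reference: log_probs is (T, U+1, output_dim),
    # labels is a 1D tensor (U,) of target indices without blank.
    # T and U are Python ints. Illustrative helper, not SpeechBrain API.
    neg_inf = torch.tensor(float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    alpha = torch.full((T, U + 1), float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            emit = neg_inf  # arrive at (t, u) by emitting label u-1 at time t
            stay = neg_inf  # arrive at (t, u) by emitting blank at time t-1
            if u > 0:
                emit = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
            if t > 0:
                stay = alpha[t - 1, u] + log_probs[t - 1, u, blank]
            alpha[t, u] = torch.logaddexp(emit, stay)
    # Forward log-probability of the full label sequence: final cell plus a closing blank.
    log_p = alpha[T - 1, U] + log_probs[T - 1, U, blank]
    return alpha, log_p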

speechbrain.nnet.loss.transducer_loss.cu_kernel_backward(log_probs, labels, beta, log_p, T, U, blank, lock)[source]

Computes the backward pass of the forward-backward algorithm using a Numba CUDA kernel. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

Parameters:
  • log_probs (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) from the Transducer network.

  • labels (tensor) – 2D Tensor of (batch x MaxSeqLabelLength) containing targets of the batch with zero padding.

  • beta (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) for backward computation.

  • log_p (tensor) – 1D Tensor of (batch) for backward cost computation.

  • T (tensor) – 1D Tensor of (batch) containing TimeLength of each target.

  • U (tensor) – 1D Tensor of (batch) containing LabelLength of each target.

  • blank (int) – Blank index.

  • lock (tensor) – 2D Tensor of (batch x LabelLength) containing bool(1-0) lock for parallel computation.
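
The backward recursion mirrors the forward one. Again a single-utterance sketch in plain PyTorch, with the same lattice convention and the illustrative helper name naive_beta:

import torch

def naive_beta(log_probs, labels, T, U, blank):
    # Single-utterance reference: log_probs is (T, U+1, output_dim),
    # labels is a 1D tensor (U,) of target indices without blank.
    # Illustrative helper, not SpeechBrain API.
    neg_inf = torch.tensor(float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    beta = torch.full((T, U + 1), float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    beta[T - 1, U] = log_probs[T - 1, U, blank]  # closing blank terminates every path
    for t in range(T - 1, -1, -1):
        for u in range(U, -1, -1):
            if t == T - 1 and u == U:
                continue
            stay = neg_inf  # leave (t, u) by emitting blank, moving to (t+1, u)
            emit = neg_inf  # leave (t, u) by emitting label u, moving to (t, u+1)
            if t < T - 1:
                stay = beta[t + 1, u] + log_probs[t, u, blank]
            if u < U:
                emit = beta[t, u + 1] + log_probs[t, u, labels[u]]
            beta[t, u] = torch.logaddexp(stay, emit)
    # beta[0, 0] equals the total log-probability found by the forward pass.
    return beta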

speechbrain.nnet.loss.transducer_loss.cu_kernel_compute_grad(log_probs, labels, alpha, beta, grads, T, U, blank)[source]

Computes the gradients for the forward-backward algorithm using a Numba CUDA kernel. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

Parameters:
  • log_probs (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) from the Transducer network.

  • labels (tensor) – 2D Tensor of (batch x MaxSeqLabelLength) containing targets of the batch with zero padding.

  • alpha (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) from the forward computation.

  • beta (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) from the backward computation.

  • grads (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) where the computed gradients with respect to log_probs are stored.

  • T (tensor) – 1D Tensor of (batch) containing TimeLength of each target.

  • U (tensor) – 1D Tensor of (batch) containing LabelLength of each target.

  • blank (int) – Blank index.
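
Given alpha and beta, the gradient of -log p(y|x) with respect to each cell of log_probs has a closed form (Graves, 2012). A single-utterance sketch with the illustrative helper name naive_grads, assuming gradients are taken directly with respect to log_probs:

import torch

def naive_grads(log_probs, labels, alpha, beta, T, U, blank):
    # Gradient of -log p(y|x) w.r.t. log_probs for one utterance.
    # log_probs is (T, U+1, output_dim); alpha and beta are (T, U+1).
    # Illustrative helper, not SpeechBrain API.
    log_p = beta[0, 0]  # total log-probability of the target sequence
    grads = torch.zeros_like(log_probs)
    for t in range(T):
        for u in range(U + 1):
            if t < T - 1:  # blank transition (t, u) -> (t+1, u)
                grads[t, u, blank] = -torch.exp(
                    alpha[t, u] + log_probs[t, u, blank] + beta[t + 1, u] - log_p
                )
            if u < U:  # label transition (t, u) -> (t, u+1)
                k = labels[u]
                grads[t, u, k] = -torch.exp(
                    alpha[t, u] + log_probs[t, u, k] + beta[t, u + 1] - log_p
                )
    # Closing blank at the top-right corner of the lattice.
    grads[T - 1, U, blank] = -torch.exp(alpha[T - 1, U] + log_probs[T - 1, U, blank] - log_p)
    return grads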

class speechbrain.nnet.loss.transducer_loss.Transducer(*args, **kwargs)[source]

Bases: Function

This class implements the Transducer loss computation with the forward-backward algorithm. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

This class uses torch.autograd.Function. Because the loss relies on the forward-backward algorithm, the gradient must be computed manually.

This class cannot be instantiated directly; please refer to the TransducerLoss class instead.

It is also possible to use this class directly through Transducer.apply, as sketched below.
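
A minimal sketch of a direct call, assuming log_probs are already log-softmax normalised over the last dimension, all tensors are on a CUDA device, and blank=0 with 'mean' reduction are the desired settings (argument order follows the forward signature below):

>>> import torch
>>> from speechbrain.nnet.loss.transducer_loss import Transducer
>>> log_probs = torch.randn((1, 2, 3, 5)).log_softmax(dim=-1).cuda().requires_grad_()
>>> labels = torch.Tensor([[1, 2]]).cuda().int()
>>> T = torch.Tensor([2]).cuda().int()  # acoustic lengths
>>> U = torch.Tensor([2]).cuda().int()  # label lengths (the LabelLength axis is U+1)
>>> loss = Transducer.apply(log_probs, labels, T, U, 0, "mean")
>>> loss.backward()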

static forward(ctx, log_probs, labels, T, U, blank, reduction)[source]

Computes the transducer loss.

static backward(ctx, grad_output)[source]

Backward computations for the transducer loss.

class speechbrain.nnet.loss.transducer_loss.TransducerLoss(blank=0, reduction='mean')[source]

Bases: Module

This class implements the Transducer loss computation with the forward-backward algorithm. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

TransducerLoss (nn.Module) uses Transducer (autograd.Function) to compute the forward-backward loss and its gradients.

Input tensors must be on a CUDA device.

Example

>>> import torch
>>> loss = TransducerLoss(blank=0)
>>> logits = torch.randn((1,2,3,5)).cuda().requires_grad_()
>>> labels = torch.Tensor([[1,2]]).cuda().int()
>>> act_length = torch.Tensor([2]).cuda().int()
>>> # U = label_length+1
>>> label_length = torch.Tensor([2]).cuda().int()
>>> l = loss(logits, labels, act_length, label_length)
>>> l.backward()
forward(logits, labels, T, U)[source]

Computes the transducer loss.

training: bool