speechbrain.nnet.loss.transducer_loss module

Transducer loss implementation (depends on numba)

Authors
  • Abdelwahab Heba 2020

  • Titouan Parcollet 2023

Summary

Classes:

Transducer

This class implements the Transducer loss computation with the forward-backward algorithm. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

TransducerLoss

This class implements the Transducer loss computation with the forward-backward algorithm.

Functions:

cu_kernel_backward

Computes the backward pass of the forward-backward algorithm using a Numba CUDA kernel.

cu_kernel_compute_grad

Computes the gradients for the forward-backward algorithm using a Numba CUDA kernel.

cu_kernel_forward

Computes the forward pass of the forward-backward algorithm using a Numba CUDA kernel.

Reference

speechbrain.nnet.loss.transducer_loss.cu_kernel_forward(log_probs, labels, alpha, log_p, T, U, blank, lock)[source]

Computes the forward pass of the forward-backward algorithm using a Numba CUDA kernel. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

Parameters:
  • log_probs (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) from the Transducer network.

  • labels (tensor) – 2D Tensor of (batch x MaxSeqLabelLength) containing targets of the batch with zero padding.

  • alpha (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) for forward computation.

  • log_p (tensor) – 1D Tensor of (batch) for forward cost computation.

  • T (tensor) – 1D Tensor of (batch) containing TimeLength of each target.

  • U (tensor) – 1D Tensor of (batch) containing LabelLength of each target.

  • blank (int) – Blank index.

  • lock (tensor) – 2D Tensor of (batch x LabelLength) containing bool(1-0) lock for parallel computation.
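
For reference, the recursion the kernel evaluates can be written for a single utterance in plain PyTorch. This is only an illustrative sketch (the helper name naive_alpha is not part of SpeechBrain): the CUDA kernel computes the same alpha lattice in parallel across the batch, with lock coordinating the threads that fill different label positions.

import torch

def naive_alpha(log_probs, labels, T, U, blank):
    # Single-utterance reference: log_probs is (T, U+1, output_dim),
    # labels is a 1D tensor (U,) of target indices without blank.
    # T and U are Python ints. Illustrative helper, not SpeechBrain API.
    neg_inf = torch.tensor(float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    alpha = torch.full((T, U + 1), float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            emit = neg_inf  # arrive at (t, u) by emitting label u-1 at time t
            stay = neg_inf  # arrive at (t, u) by emitting blank at time t-1
            if u > 0:
                emit = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
            if t > 0:
                stay = alpha[t - 1, u] + log_probs[t - 1, u, blank]
            alpha[t, u] = torch.logaddexp(emit, stay)
    # Forward log-probability of the full label sequence: final cell plus a closing blank.
    log_p = alpha[T - 1, U] + log_probs[T - 1, U, blank]
    return alpha, log_p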

speechbrain.nnet.loss.transducer_loss.cu_kernel_backward(log_probs, labels, beta, log_p, T, U, blank, lock)[source]

Computes the backward pass of the forward-backward algorithm using a Numba CUDA kernel. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

Parameters:
  • log_probs (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) from the Transducer network.

  • labels (tensor) – 2D Tensor of (batch x MaxSeqLabelLength) containing targets of the batch with zero padding.

  • beta (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) for backward computation.

  • log_p (tensor) – 1D Tensor of (batch) for backward cost computation.

  • T (tensor) – 1D Tensor of (batch) containing TimeLength of each target.

  • U (tensor) – 1D Tensor of (batch) containing LabelLength of each target.

  • blank (int) – Blank index.

  • lock (tensor) – 2D Tensor of (batch x LabelLength) containing bool(1-0) lock for parallel computation.
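
The backward recursion mirrors the forward one. Again a single-utterance sketch in plain PyTorch, with the same lattice convention and the illustrative helper name naive_beta:

import torch

def naive_beta(log_probs, labels, T, U, blank):
    # Single-utterance reference: log_probs is (T, U+1, output_dim),
    # labels is a 1D tensor (U,) of target indices without blank.
    # Illustrative helper, not SpeechBrain API.
    neg_inf = torch.tensor(float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    beta = torch.full((T, U + 1), float("-inf"), dtype=log_probs.dtype, device=log_probs.device)
    beta[T - 1, U] = log_probs[T - 1, U, blank]  # closing blank terminates every path
    for t in range(T - 1, -1, -1):
        for u in range(U, -1, -1):
            if t == T - 1 and u == U:
                continue
            stay = neg_inf  # leave (t, u) by emitting blank, moving to (t+1, u)
            emit = neg_inf  # leave (t, u) by emitting label u, moving to (t, u+1)
            if t < T - 1:
                stay = beta[t + 1, u] + log_probs[t, u, blank]
            if u < U:
                emit = beta[t, u + 1] + log_probs[t, u, labels[u]]
            beta[t, u] = torch.logaddexp(stay, emit)
    # beta[0, 0] equals the total log-probability found by the forward pass.
    return beta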

speechbrain.nnet.loss.transducer_loss.cu_kernel_compute_grad(log_probs, labels, alpha, beta, grads, T, U, blank)[source]

Computes the gradients for the forward-backward algorithm using a Numba CUDA kernel. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

Parameters:
  • log_probs (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) from the Transducer network.

  • labels (tensor) – 2D Tensor of (batch x MaxSeqLabelLength) containing targets of the batch with zero padding.

  • alpha (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) from the forward computation.

  • beta (tensor) – 3D Tensor of (batch x TimeLength x LabelLength) from the backward computation.

  • grads (tensor) – 4D Tensor of (batch x TimeLength x LabelLength x outputDim) where the computed gradients with respect to log_probs are stored.

  • T (tensor) – 1D Tensor of (batch) containing TimeLength of each target.

  • U (tensor) – 1D Tensor of (batch) containing LabelLength of each target.

  • blank (int) – Blank index.
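
Given alpha and beta, the gradient of -log p(y|x) with respect to each cell of log_probs has a closed form (Graves, 2012). A single-utterance sketch with the illustrative helper name naive_grads, assuming gradients are taken directly with respect to log_probs:

import torch

def naive_grads(log_probs, labels, alpha, beta, T, U, blank):
    # Gradient of -log p(y|x) w.r.t. log_probs for one utterance.
    # log_probs is (T, U+1, output_dim); alpha and beta are (T, U+1).
    # Illustrative helper, not SpeechBrain API.
    log_p = beta[0, 0]  # total log-probability of the target sequence
    grads = torch.zeros_like(log_probs)
    for t in range(T):
        for u in range(U + 1):
            if t < T - 1:  # blank transition (t, u) -> (t+1, u)
                grads[t, u, blank] = -torch.exp(
                    alpha[t, u] + log_probs[t, u, blank] + beta[t + 1, u] - log_p
                )
            if u < U:  # label transition (t, u) -> (t, u+1)
                k = labels[u]
                grads[t, u, k] = -torch.exp(
                    alpha[t, u] + log_probs[t, u, k] + beta[t, u + 1] - log_p
                )
    # Closing blank at the top-right corner of the lattice.
    grads[T - 1, U, blank] = -torch.exp(alpha[T - 1, U] + log_probs[T - 1, U, blank] - log_p)
    return grads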

class speechbrain.nnet.loss.transducer_loss.Transducer(*args, **kwargs)[source]

Bases: Function

This class implements the Transducer loss computation with the forward-backward algorithm. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

This class uses torch.autograd.Function. Because the loss relies on the forward-backward algorithm, the gradient must be computed manually.

This class cannot be instantiated directly; please refer to the TransducerLoss class instead.

It is also possible to use this class directly through Transducer.apply, as sketched below.
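
A minimal sketch of a direct call, assuming log_probs are already log-softmax normalised over the last dimension, all tensors are on a CUDA device, and blank=0 with 'mean' reduction are the desired settings (argument order follows the forward signature below):

>>> import torch
>>> from speechbrain.nnet.loss.transducer_loss import Transducer
>>> log_probs = torch.randn((1, 2, 3, 5)).log_softmax(dim=-1).cuda().requires_grad_()
>>> labels = torch.Tensor([[1, 2]]).cuda().int()
>>> T = torch.Tensor([2]).cuda().int()  # acoustic lengths
>>> U = torch.Tensor([2]).cuda().int()  # label lengths (the LabelLength axis is U+1)
>>> loss = Transducer.apply(log_probs, labels, T, U, 0, "mean")
>>> loss.backward()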

static forward(ctx, log_probs, labels, T, U, blank, reduction)[source]

Computes the transducer loss.

static backward(ctx, grad_output)[source]

Backward computations for the transducer loss.

class speechbrain.nnet.loss.transducer_loss.TransducerLoss(blank=0, reduction='mean')[source]

Bases: Module

This class implements the Transducer loss computation with the forward-backward algorithm. Naive implementation of Sequence Transduction: https://arxiv.org/pdf/1211.3711.pdf

TransducerLoss (nn.Module) uses Transducer (autograd.Function) to compute the forward-backward loss and its gradients.

Input tensors must be on a CUDA device.

Example

>>> import torch
>>> loss = TransducerLoss(blank=0)
>>> logits = torch.randn((1,2,3,5)).cuda().requires_grad_()
>>> labels = torch.Tensor([[1,2]]).cuda().int()
>>> act_length = torch.Tensor([2]).cuda().int()
>>> # U = label_length+1
>>> label_length = torch.Tensor([2]).cuda().int()
>>> l = loss(logits, labels, act_length, label_length)
>>> l.backward()
forward(logits, labels, T, U)[source]

Computes the transducer loss.

training: bool