speechbrain.nnet.loss.guidedattn_loss module
The Guided Attention Loss implementation
This loss can be used to speed up the training of models in which the correspondence between inputs and outputs is roughly linear, and the attention alignments are expected to be approximately diagonal, such as Grapheme-to-Phoneme and Text-to-Speech.
Authors: Artem Ploujnikov, 2021
Summary
Classes:
GuidedAttentionLoss – A loss implementation that forces attention matrices to be near-diagonal, imposing progressively larger penalties for paying attention to regions far away from the diagonal.
Reference
- class speechbrain.nnet.loss.guidedattn_loss.GuidedAttentionLoss(sigma=0.2)[source]
Bases: Module
A loss implementation that forces attention matrices to be near-diagonal, imposing progressively larger penalties for paying attention to regions far away from the diagonal. It is useful for sequence-to-sequence models in which the sequence of outputs is expected to correspond closely to the sequence of inputs, such as TTS or G2P.
https://arxiv.org/abs/1710.08969
The implementation is inspired by the R9Y9 DeepVoice3 model (https://github.com/r9y9/deepvoice3_pytorch) and should be roughly equivalent to it; however, it has been fully vectorized.
- Parameters:
sigma (float) – The guided attention weight (see the sketch below for how it shapes the penalty)
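The sigma parameter controls how wide a band around the diagonal is tolerated. Following the formulation in the paper linked above, the penalty at a normalized (target, input) position pair is 1 - exp(-((t/T - n/N)^2) / (2 * sigma^2)). A minimal illustrative sketch of that penalty grid; the function name and tensor layout are assumptions for illustration, not the module's internals:

import torch

def guided_attention_weight(input_len, target_len, sigma=0.2):
    # Normalized positions along the input (n/N) and target (t/T) axes
    input_pos = torch.arange(input_len).float() / input_len
    target_pos = torch.arange(target_len).float() / target_len
    # Signed distance from the diagonal for every (target, input) pair
    dist = target_pos.unsqueeze(1) - input_pos.unsqueeze(0)
    # Penalty approaches 1 far from the diagonal; a larger sigma widens the tolerated band
    return 1.0 - torch.exp(-(dist ** 2) / (2 * sigma ** 2))

In the referenced paper, the loss is the mean of this grid multiplied element-wise with the attention matrix, which is what penalizes off-diagonal attention mass.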
Example
NOTE: In a real scenario, the input_lengths and target_lengths would come from a data batch, whereas alignments would come from a model.

>>> import torch
>>> from speechbrain.nnet.loss.guidedattn_loss import GuidedAttentionLoss
>>> loss = GuidedAttentionLoss(sigma=0.2)
>>> input_lengths = torch.tensor([2, 3])
>>> target_lengths = torch.tensor([3, 4])
>>> alignments = torch.tensor(
...     [
...         [
...             [0.8, 0.2, 0.0],
...             [0.4, 0.6, 0.0],
...             [0.2, 0.8, 0.0],
...             [0.0, 0.0, 0.0],
...         ],
...         [
...             [0.6, 0.2, 0.2],
...             [0.1, 0.7, 0.2],
...             [0.3, 0.4, 0.3],
...             [0.2, 0.3, 0.5],
...         ],
...     ]
... )
>>> loss(alignments, input_lengths, target_lengths)
tensor(0.1142)
- forward(attention, input_lengths, target_lengths, max_input_len=None, max_target_len=None)[source]
Computes the guided attention loss for a single batch.
- Parameters:
attention (torch.Tensor) – A padded attention/alignments matrix of shape (batch, targets, inputs)
input_lengths (torch.Tensor) – A tensor of input lengths, one per batch element
target_lengths (torch.Tensor) – A tensor of target lengths, one per batch element
max_input_len (int) – The maximum input length. Optional; if not provided, it is set to the maximum of input_lengths. Setting it explicitly might be necessary when using data parallelism
max_target_len (int) – The maximum target length. Optional; if not provided, it is set to the maximum of target_lengths. Setting it explicitly might be necessary when using data parallelism
- Returns:
loss – A single-element tensor with the loss value
- Return type:
torch.Tensor
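When a batch is split across devices (e.g. with data parallelism), each replica may see tensors padded to a shorter length than the global maximum, so passing the maxima explicitly keeps the penalty grid aligned with the padded attention matrix. A hedged sketch reusing the tensors from the class-level example above; the explicit maxima are simply the padded sizes of that example:

>>> loss_value = loss(
...     alignments,
...     input_lengths,
...     target_lengths,
...     max_input_len=3,   # padded width of the attention matrix
...     max_target_len=4,  # padded number of target steps
... )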
- guided_attentions(input_lengths, target_lengths, max_input_len=None, max_target_len=None)[source]
Computes guided attention matrices.
- Parameters:
input_lengths (torch.Tensor) – A tensor of input lengths, one per batch element
target_lengths (torch.Tensor) – A tensor of target lengths, one per batch element
max_input_len (int) – The maximum input length. Optional; if not provided, it is set to the maximum of input_lengths. Setting it explicitly might be necessary when using data parallelism
max_target_len (int) – The maximum target length. Optional; if not provided, it is set to the maximum of target_lengths. Setting it explicitly might be necessary when using data parallelism
- Returns:
soft_mask – The guided attention tensor of shape (batch, max_input_len, max_target_len)
- Return type:
torch.Tensor
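The soft mask can also be computed on its own, for instance to inspect or visualize the penalty applied to each position. A minimal sketch reusing the lengths from the class-level example; the variable name is illustrative:

>>> soft_mask = loss.guided_attentions(input_lengths, target_lengths)

When max_input_len and max_target_len are not given, the mask is padded to the maxima of the provided length tensors.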