speechbrain.lobes.models.g2p.homograph module

Tools for homograph disambiguation

Authors

  • Artem Ploujnikov 2021

Summary

Classes:

SubsequenceExtractor

A utility class to help extract subsequences out of a batch of sequences

SubsequenceLoss

A loss function for a specific word in the output, used in the homograph disambiguation task

Reference

class speechbrain.lobes.models.g2p.homograph.SubsequenceLoss(seq_cost, word_separator=0, word_separator_base=0)[source]

Bases: Module

A loss function for a specific word in the output, used in the homograph disambiguation task.

The approach is as follows:

  1. Arrange only the target words from the original batch into a single tensor.

  2. Find the word index of each target word.

  3. Compute the beginnings and endings of words in the predicted sequences.

The assumption is that the model has been trained well enough to identify word boundaries with a simple argmax, without having to perform a beam search.

Important! This loss can be used for fine-tuning only; the model is expected to already be able to correctly predict word boundaries.

Parameters:
  • seq_cost (callable) – the loss to be used on the extracted subsequences

  • word_separator (int) – the index of the “space” character (in phonemes)

  • word_separator_base (int) – the index of word separators used in unprocessed targets (if different; used with tokenizations)

Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceLoss
>>> from speechbrain.nnet.losses import nll_loss
>>> loss = SubsequenceLoss(
...     seq_cost=nll_loss
... )
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> loss_value = loss(
...    phns,
...    phn_lens,
...    p_seq,
...    subsequence_phn_start,
...    subsequence_phn_end
... )
>>> loss_value
tensor(-0.8000)
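
Because the loss is computed only on the extracted homograph span, gradients during fine-tuning should flow only through that span. A minimal backward-pass sketch, reusing the tensors from the example above (the requires_grad copy is purely illustrative; in practice p_seq would come from the model being fine-tuned):

>>> p_seq_grad = p_seq.clone().requires_grad_(True)
>>> loss_value = loss(
...    phns,
...    phn_lens,
...    p_seq_grad,
...    subsequence_phn_start,
...    subsequence_phn_end
... )
>>> loss_value.backward()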
property word_separator

The word separator being used

property word_separator_base

The word separator used in unprocessed (base) targets

forward(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_lens_base=None)[source]

Evaluates the subsequence loss

Parameters:
  • phns (torch.Tensor) – the phoneme tensor (batch x length)

  • phn_lens (torch.Tensor) – the phoneme length tensor

  • p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)

  • subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)

  • subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)

  • phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)

  • phn_lens_base (torch.Tensor) – the phoneme lengths (not preprocessed)

Returns:

loss – the loss tensor

Return type:

torch.Tensor

training: bool
class speechbrain.lobes.models.g2p.homograph.SubsequenceExtractor(word_separator=0, word_separator_base=None)[source]

Bases: object

A utility class to help extract subsequences out of a batch of sequences

Parameters:
  • word_separator (int) – the index of the word separator (used in p_seq)

  • word_separator_base (int) – the index of word separators used in unprocessed targets (if different)

Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceExtractor
>>> extractor = SubsequenceExtractor()
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> extractor.extract_seq(
...    phns,
...    phn_lens,
...    p_seq,
...    subsequence_phn_start,
...    subsequence_phn_end
... )
(tensor([[[0., 1., 0., 0.],
         [0., 0., 0., 1.],
         [0., 0., 0., 0.]],

        [[0., 1., 0., 0.],
         [0., 0., 1., 0.],
         [0., 0., 0., 0.]]]), tensor([[1., 3., 0.],
        [1., 2., 0.]]), tensor([0.6667, 1.0000]))
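
In this output, the first homograph spans positions 3–5 of the target (the two phonemes [1, 3]) and the second spans positions 4–7 (the three phonemes [1, 2, 0]). Padded to the common length of 3, their relative lengths are 2/3 ≈ 0.6667 and 3/3 = 1.0000, which is what the last returned tensor reports.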
extract_seq(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_base_lens=None)[source]

Extracts the subsequence from the complete sequence

Parameters:
  • phns (torch.Tensor) – the phoneme tensor (batch x length)

  • phn_lens (torch.Tensor) – the phoneme length tensor

  • p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)

  • subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)

  • subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)

  • phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)

  • phn_base_lens (torch.Tensor) – the phoneme lengths (not preprocessed)

Returns:

  • p_seq_subsequence (torch.Tensor) – the output subsequence (of probabilities)

  • phns_subsequence (torch.Tensor) – the target subsequence

  • subsequence_lengths (torch.Tensor) – subsequence lengths, expressed as a fraction of the tensor’s last dimension

extract_hyps(ref_seq, hyps, subsequence_phn_start, use_base=False)[source]

Extracts a subsequence from hypotheses (e.g. the result of a beam search) based on a reference sequence, such as a sequence of phoneme targets used during training.

Parameters:
  • ref_seq (torch.Tensor) – a reference sequence (e.g. phoneme targets)

  • hyps (list) – a batch of hypotheses, a list of lists of integer indices (usually of phonemes)

  • subsequence_phn_start (torch.Tensor) – the index of the beginning of the subsequence to extract

  • use_base (bool) – whether to use the raw (token) space for word separators
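
A usage sketch, reusing the extractor, phns and subsequence_phn_start objects from the example above; the hyps list here is an illustrative stand-in for real beam-search output, not part of this module:

>>> hyps = [
...     [1, 2, 0, 1, 3, 0, 2, 1],
...     [2, 1, 3, 0, 1, 2, 0, 3, 2]
... ]
>>> extracted = extractor.extract_hyps(
...    phns, hyps, subsequence_phn_start
... )

Each entry of the result should correspond to the hypothesis word aligned with the homograph in the reference (here the second word of each hypothesis: [1, 3] and [1, 2]).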