speechbrain.lobes.models.g2p.homograph module

Tools for homograph disambiguation

Authors

  • Artem Ploujnikov 2021

Summary

Classes:

SubsequenceExtractor

A utility class to help extract subsequences out of a batch of sequences

SubsequenceLoss

A loss function for a specific word in the output, used in the homograph disambiguation task

Reference

class speechbrain.lobes.models.g2p.homograph.SubsequenceLoss(seq_cost, word_separator=0, word_separator_base=0)[source]

Bases: Module

A loss function for a specific word in the output, used in the homograph disambiguation task.

The approach is as follows:

  1. Arrange only the target words from the original batch into a single tensor.

  2. Find the word index of each target word.

  3. Compute the beginnings and endings of words in the predicted sequences.

The assumption is that the model has been trained well enough to identify word boundaries with a simple argmax, without having to perform a beam search.

Important! This loss can be used for fine-tuning only; the model is expected to already be able to correctly predict word boundaries.

Parameters:
  • seq_cost (callable) – the loss to be used on the extracted subsequences

  • word_separator (int) – the index of the “space” character (in phonemes)

  • word_separator_base (int) – the index of word separators used in unprocessed targets (if different; used with tokenizations)

Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceLoss
>>> from speechbrain.nnet.losses import nll_loss
>>> loss = SubsequenceLoss(
...     seq_cost=nll_loss
... )
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> loss_value = loss(
...    phns,
...    phn_lens,
...    p_seq,
...    subsequence_phn_start,
...    subsequence_phn_end
... )
>>> loss_value
tensor(-0.8000)
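
Because the loss is computed only on the extracted homograph span, gradients during fine-tuning should flow only through that span. A minimal backward-pass sketch, reusing the tensors from the example above (the requires_grad copy is purely illustrative; in practice p_seq would come from the model being fine-tuned):

>>> p_seq_grad = p_seq.clone().requires_grad_(True)
>>> loss_value = loss(
...    phns,
...    phn_lens,
...    p_seq_grad,
...    subsequence_phn_start,
...    subsequence_phn_end
... )
>>> loss_value.backward()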
property word_separator

The word separator being used

property word_separator_base

The word separator used in unprocessed (base) targets

forward(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_lens_base=None)[source]

Evaluates the subsequence loss

Parameters:
  • phns (torch.Tensor) – the phoneme tensor (batch x length)

  • phn_lens (torch.Tensor) – the phoneme length tensor

  • p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)

  • subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)

  • subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)

  • phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)

  • phn_lens_base (torch.Tensor) – the phoneme lengths (not preprocessed)

Returns:

loss – the loss tensor

Return type:

torch.Tensor

training: bool
class speechbrain.lobes.models.g2p.homograph.SubsequenceExtractor(word_separator=0, word_separator_base=None)[source]

Bases: object

A utility class to help extract subsequences out of a batch of sequences

Parameters:
  • word_separator (int) – the index of the word separator (used in p_seq)

  • word_separator_base (int) – the index of word separators used in unprocessed targets (if different)

Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceExtractor
>>> extractor = SubsequenceExtractor()
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> extractor.extract_seq(
...    phns,
...    phn_lens,
...    p_seq,
...    subsequence_phn_start,
...    subsequence_phn_end
... )
(tensor([[[0., 1., 0., 0.],
         [0., 0., 0., 1.],
         [0., 0., 0., 0.]],

        [[0., 1., 0., 0.],
         [0., 0., 1., 0.],
         [0., 0., 0., 0.]]]), tensor([[1., 3., 0.],
        [1., 2., 0.]]), tensor([0.6667, 1.0000]))
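
In this output, the first homograph spans positions 3–5 of the target (the two phonemes [1, 3]) and the second spans positions 4–7 (the three phonemes [1, 2, 0]). Padded to the common length of 3, their relative lengths are 2/3 ≈ 0.6667 and 3/3 = 1.0000, which is what the last returned tensor reports.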
extract_seq(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_base_lens=None)[source]

Extracts the subsequence from the complete sequence

Parameters:
  • phns (torch.Tensor) – the phoneme tensor (batch x length)

  • phn_lens (torch.Tensor) – the phoneme length tensor

  • p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)

  • subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)

  • subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)

  • phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)

  • phn_base_lens (torch.Tensor) – the phoneme lengths (not preprocessed)

Returns:

  • p_seq_subsequence (torch.Tensor) – the output subsequence (of probabilities)

  • phns_subsequence (torch.Tensor) – the target subsequence

  • subsequence_lengths (torch.Tensor) – subsequence lengths, expressed as a fraction of the tensor’s last dimension

extract_hyps(ref_seq, hyps, subsequence_phn_start, use_base=False)[source]

Extracts a subsequence from hypotheses (e.g. the result of a beam search) based on a reference sequence, such as a sequence of phoneme targets used during training.

Parameters:
  • ref_seq (torch.Tensor) – a reference sequence (e.g. phoneme targets)

  • hyps (list) – a batch of hypotheses, a list of lists of integer indices (usually of phonemes)

  • subsequence_phn_start (torch.Tensor) – the index of the beginning of the subsequence to extract

  • use_base (bool) – whether to use the raw (token) space for word separators
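
A usage sketch, reusing the extractor, phns and subsequence_phn_start objects from the example above; the hyps list here is an illustrative stand-in for real beam-search output, not part of this module:

>>> hyps = [
...     [1, 2, 0, 1, 3, 0, 2, 1],
...     [2, 1, 3, 0, 1, 2, 0, 3, 2]
... ]
>>> extracted = extractor.extract_hyps(
...    phns, hyps, subsequence_phn_start
... )

Each entry of the result should correspond to the hypothesis word aligned with the homograph in the reference (here the second word of each hypothesis: [1, 3] and [1, 2]).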