speechbrain.lobes.models.g2p.homograph module
Tools for homograph disambiguation
Authors
Artem Ploujnikov 2021
Summary
Classes:
| SubsequenceExtractor | A utility class to help extract subsequences out of a batch of sequences |
| SubsequenceLoss | A loss function for a specific word in the output, used in the homograph disambiguation task |
Reference
- class speechbrain.lobes.models.g2p.homograph.SubsequenceLoss(seq_cost, word_separator=0, word_separator_base=0)
Bases: Module
A loss function for a specific word in the output, used in the homograph disambiguation task.
The approach is as follows:
1. Arrange only the target words from the original batch into a single tensor.
2. Find the word index of each target word.
3. Compute the beginnings and endings of words in the predicted sequences. The assumption is that the model has been trained well enough to identify word boundaries with a simple argmax, without having to perform a beam search.
Important! This loss can be used for fine-tuning only. The model is expected to already be able to predict word boundaries correctly.
- Parameters:
seq_cost (callable) – the loss to be used on the extracted subsequences
word_separator (int) – the index of the “space” character (in phonemes)
word_separator_base (int) – the index of word separators used in unprocessed targets (if different, used with tokenizations)
Example
>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceLoss
>>> from speechbrain.nnet.losses import nll_loss
>>> loss = SubsequenceLoss(
...     seq_cost=nll_loss
... )
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> loss_value = loss(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
>>> loss_value
tensor(-0.8000)
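Steps 2 and 3 of the approach described above can be illustrated with a minimal sketch in plain PyTorch. This is not the SpeechBrain implementation: the helper names target_word_index and predicted_word_spans are hypothetical, padding and phn_lens are ignored, and the predictions are assumed to contain at least as many words as the targets.

```python
import torch
import torch.nn.functional as F

WORD_SEPARATOR = 0  # index of the "space" phoneme, as in the example above


def target_word_index(phns, start, sep=WORD_SEPARATOR):
    """Step 2 (hypothetical helper): counts the separators before `start`."""
    positions = torch.arange(phns.size(1)).expand_as(phns)
    before_start = positions < start.unsqueeze(-1)
    return ((phns == sep) & before_start).sum(dim=-1)


def predicted_word_spans(p_seq, word_index, sep=WORD_SEPARATOR):
    """Step 3 (hypothetical helper): [start, end) of each word in argmax(p_seq)."""
    predictions = p_seq.argmax(dim=-1)  # batch x length
    spans = []
    for row, idx in zip(predictions.tolist(), word_index.tolist()):
        # Separator positions, with sentinels before the first and after
        # the last position so that every word has two boundaries
        bounds = [-1] + [i for i, p in enumerate(row) if p == sep] + [len(row)]
        spans.append((bounds[idx] + 1, bounds[idx + 1]))
    return spans


phns = torch.tensor(
    [[1, 2, 0, 1, 3, 0, 2, 1, 0],
     [2, 1, 3, 0, 1, 2, 0, 3, 2]]
)
subsequence_phn_start = torch.tensor([3, 4])
# One-hot "predictions" that reproduce the targets exactly
p_seq = F.one_hot(phns, num_classes=4).float()

word_index = target_word_index(phns, subsequence_phn_start)
spans = predicted_word_spans(p_seq, word_index)
print(word_index.tolist(), spans)
```

With these inputs, both target words sit at word index 1, and the predicted spans are (3, 5) and (4, 6). Because the boundaries come from a plain argmax, a model that misplaces separators would yield misaligned spans, which is why the loss is described as suitable for fine-tuning only.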
- property word_separator
The word separator being used
- property word_separator_base
The word separator used in the unprocessed (base) targets
- forward(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_lens_base=None)
Evaluates the subsequence loss
- Parameters:
phns (torch.Tensor) – the phoneme tensor (batch x length)
phn_lens (torch.Tensor) – the phoneme length tensor
p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)
subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)
subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)
phn_lens_base (torch.Tensor) – the phoneme lengths (not preprocessed)
- Returns:
loss (torch.Tensor) – the loss tensor
- class speechbrain.lobes.models.g2p.homograph.SubsequenceExtractor(word_separator=0, word_separator_base=None)
Bases: object
A utility class to help extract subsequences out of a batch of sequences
- Parameters:
word_separator (int) – the index of the word separator (used in p_seq)
word_separator_base (int) – the index of word separators used in unprocessed targets (if different)
Example
>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceExtractor
>>> extractor = SubsequenceExtractor()
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> extractor.extract_seq(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
(tensor([[[0., 1., 0., 0.],
          [0., 0., 0., 1.],
          [0., 0., 0., 0.]],
         [[0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 0.]]]), tensor([[1., 3., 0.],
        [1., 2., 0.]]), tensor([0.6667, 1.0000]))
- extract_seq(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_base_lens=None)
Extracts the subsequence from the complete sequence
- Parameters:
phns (torch.Tensor) – the phoneme tensor (batch x length)
phn_lens (torch.Tensor) – the phoneme length tensor
p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)
subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)
subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)
phn_base_lens (torch.Tensor) – the phoneme lengths (not preprocessed)
- Returns:
p_seq_subsequence (torch.Tensor) – the output subsequence (of probabilities)
phns_subsequence (torch.Tensor) – the target subsequence
subsequence_lengths (torch.Tensor) – subsequence lengths, expressed as a fraction of the tensor’s last dimension
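Since subsequence_lengths is relative, downstream code that needs absolute lengths or a padding mask has to rescale it. The following is a minimal sketch of one way to do that in plain PyTorch, using the values from the SubsequenceExtractor example; the variable names abs_lengths and mask are illustrative only and are not part of the library.

```python
import torch

# Values copied from the output of the SubsequenceExtractor example
phns_subsequence = torch.tensor([[1., 3., 0.], [1., 2., 0.]])
subsequence_lengths = torch.tensor([0.6667, 1.0000])

max_len = phns_subsequence.size(-1)
# Relative -> absolute lengths; rounding absorbs floating-point error
abs_lengths = (subsequence_lengths * max_len).round().long()
# True for positions that belong to the subsequence, False for padding
mask = torch.arange(max_len).unsqueeze(0) < abs_lengths.unsqueeze(1)

print(abs_lengths.tolist())  # [2, 3]
print(mask.tolist())         # [[True, True, False], [True, True, True]]
```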
- extract_hyps(ref_seq, hyps, subsequence_phn_start, use_base=False)
Extracts a subsequence from hypotheses (e.g. the result of a beam search) based on a reference sequence, which can be a sequence of phonemes (the target during training)
- Parameters:
ref_seq (torch.Tensor) – a reference sequence (e.g. phoneme targets)
hyps (list) – a batch of hypotheses, a list of lists of integer indices (usually of phonemes)
subsequence_phn_start (torch.Tensor) – the index of the beginning of the subsequence to extract
use_base (bool) – whether to use the raw (token) space for word separators
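The idea of extracting a word from a hypothesis based on a reference sequence can be sketched in pure Python. This is an illustration under stated assumptions, not the library's code: the helpers word_index_of and extract_word are hypothetical, a single sequence is handled instead of a batch, and 0 is assumed to be the word separator.

```python
def word_index_of(ref, start, separator=0):
    """Counts the separators that precede position `start` in the reference."""
    return sum(1 for tok in ref[:start] if tok == separator)


def extract_word(hyp, word_index, separator=0):
    """Returns the word_index-th separator-delimited word of a hypothesis."""
    # Sentinel boundaries before the first and after the last token
    bounds = [-1] + [i for i, tok in enumerate(hyp) if tok == separator] + [len(hyp)]
    if word_index + 1 >= len(bounds):
        return []  # the hypothesis has fewer words than the reference
    return hyp[bounds[word_index] + 1 : bounds[word_index + 1]]


ref_seq = [1, 2, 0, 1, 3, 0, 2, 1, 0]  # reference phonemes
hyp = [1, 2, 0, 3, 3, 1, 0, 2, 1]      # one hypothesis (e.g. from a beam search)

idx = word_index_of(ref_seq, start=3)  # the target word is the second word
word = extract_word(hyp, idx)
print(idx, word)  # 1 [3, 3, 1]
```

Matching by word index rather than by position lets the hypothesis word differ in length from the reference word, as it does here (three phonemes instead of two).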