speechbrain.lobes.models.g2p.homograph module
Tools for homograph disambiguation

Authors: Artem Ploujnikov 2021
Summary
Classes:

SubsequenceExtractor – A utility class to help extract subsequences out of a batch of sequences.

SubsequenceLoss – A loss function for a specific word in the output, used in the homograph disambiguation task.
Reference
- class speechbrain.lobes.models.g2p.homograph.SubsequenceLoss(seq_cost, word_separator=0, word_separator_base=0)[source]
Bases: Module
A loss function for a specific word in the output, used in the homograph disambiguation task. The approach is as follows:

1. Arrange only the target words from the original batch into a single tensor.
2. Find the word index of each target word.
3. Compute the beginnings and endings of words in the predicted sequences. The assumption is that the model has been trained well enough to identify word boundaries with a simple argmax, without having to perform a beam search (see the boundary-detection sketch after the example below).

Important! This loss can be used for fine-tuning only. The model is expected to already be able to correctly predict word boundaries.
- Parameters:
seq_cost (callable) – the underlying sequence loss applied to the extracted subsequences (e.g. speechbrain.nnet.losses.nll_loss)
word_separator (int) – the index of the word separator token in the (preprocessed) phoneme space
word_separator_base (int) – the index of the word separator token in the raw (unpreprocessed) token space
Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceLoss
>>> from speechbrain.nnet.losses import nll_loss
>>> loss = SubsequenceLoss(
...     seq_cost=nll_loss
... )
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> loss_value = loss(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
>>> loss_value
tensor(-0.8000)
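Step 3 of the approach above assumes word boundaries can be read off the model output with a plain argmax. The following is a minimal sketch of that idea (an illustration, not the library's actual implementation), using the default word_separator index of 0 and a hypothetical p_seq_small tensor:

>>> p_seq_small = torch.Tensor(
...     [[[0., 1., 0., 0.],
...       [0., 0., 1., 0.],
...       [1., 0., 0., 0.],
...       [0., 1., 0., 0.],
...       [0., 0., 0., 1.]]]
... )
>>> tokens = p_seq_small.argmax(dim=-1)  # (batch x length) predicted phonemes
>>> tokens
tensor([[1, 2, 0, 1, 3]])
>>> (tokens == 0).nonzero()  # positions of separator tokens, i.e. word boundaries
tensor([[0, 2]])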
- property word_separator
The word separator being used
- property word_separator_base
The word separator used in the base (unpreprocessed) sequence space
- forward(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_lens_base=None)[source]
Evaluates the subsequence loss
- Parameters:
phns (torch.Tensor) – the phoneme tensor (batch x length)
phn_lens (torch.Tensor) – the phoneme length tensor
p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)
subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)
subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)
phn_lens_base (torch.Tensor) – the phoneme lengths (not preprocessed); a call sketch using these base tensors follows this entry
- Returns:
loss – the loss tensor
- Return type:
torch.Tensor
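For illustration only (this call pattern is an assumption, not taken from the library's documentation): when the targets have been preprocessed, the raw sequences can be supplied through the optional base arguments. Here the Example tensors stand in for both spaces:

>>> loss_value = loss(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end,
...     phns_base=phns,          # stand-in: in real use, the raw targets
...     phn_lens_base=phn_lens,  # stand-in: in real use, their lengths
... )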
- class speechbrain.lobes.models.g2p.homograph.SubsequenceExtractor(word_separator=0, word_separator_base=None)[source]
Bases: object
A utility class to help extract subsequences out of a batch of sequences
- Parameters:
word_separator (int) – the index of the word separator token in the (preprocessed) phoneme space
word_separator_base (int) – the index of the word separator token in the raw (unpreprocessed) token space (None by default)
Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceExtractor
>>> extractor = SubsequenceExtractor()
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> extractor.extract_seq(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
(tensor([[[0., 1., 0., 0.],
         [0., 0., 0., 1.],
         [0., 0., 0., 0.]],

        [[0., 1., 0., 0.],
         [0., 0., 1., 0.],
         [0., 0., 0., 0.]]]), tensor([[1., 3., 0.],
        [1., 2., 0.]]), tensor([0.6667, 1.0000]))
- extract_seq(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_base_lens=None)[source]
Extracts the subsequence from the complete sequence
- Parameters:
phns (torch.Tensor) – the phoneme tensor (batch x length)
phn_lens (torch.Tensor) – the phoneme length tensor
p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)
subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)
subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)
phn_base_lens (torch.Tensor) – the phoneme lengths (not preprocessed)
- Returns:
p_seq_subsequence (torch.Tensor) – the output subsequence (of probabilities)
phns_subsequence (torch.Tensor) – the target subsequence
subsequence_lengths (torch.Tensor) – subsequence lengths, expressed as a fraction of the tensor's last dimension (see the conversion sketch below)
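Since the lengths are returned as fractions of the padded last dimension, converting them back to absolute token counts is a one-liner. A small illustrative sketch (not part of the API), binding the return values of extract_seq from the Example above to the names used in the Returns section:

>>> p_seq_sub, phns_sub, sub_lens = extractor.extract_seq(
...     phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end
... )
>>> (sub_lens * phns_sub.size(-1)).round().long()  # fractions -> token counts
tensor([2, 3])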
- extract_hyps(ref_seq, hyps, subsequence_phn_start, use_base=False)[source]
Extracts a subsequence from hypotheses (e.g. the result of a beam search) based on a reference sequence, such as a sequence of phoneme targets used during training (a hedged usage sketch follows this entry)
- Parameters:
ref_seq (torch.Tensor) – a reference sequence (e.g. phoneme targets)
hyps (list) – a batch of hypotheses, a list of lists of integer indices (usually of phonemes)
subsequence_phn_start (torch.Tensor) – the index of the beginning of the subsequence to extract
use_base (bool) – whether to use the raw (token) space for word separators
- Returns:
result – The extracted subsequence.
- Return type:
torch.Tensor
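extract_hyps does not have a doctest in this reference. The following hedged sketch (values made up for illustration) shows the expected call shape, reusing the extractor from the Example above:

>>> ref_seq = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0]]
... )
>>> hyps = [[1, 2, 0, 1, 3, 0, 2, 1]]  # one beam-search hypothesis per batch item
>>> subseq_start = torch.IntTensor([3])
>>> result = extractor.extract_hyps(ref_seq, hyps, subseq_start)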