speechbrain.nnet.loss.stoi_loss module
Library for computing the STOI loss. Reference: “End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks”, TASLP, 2018
- Authors:
Szu-Wei Fu 2020
Summary
Functions:
removeSilentFrames: Removes silent frames from the STOI computation.
stoi_loss: Compute the STOI score and return -1 * that score.
thirdoct: Returns the 1/3 octave band matrix.
Reference
- speechbrain.nnet.loss.stoi_loss.thirdoct(fs, nfft, num_bands, min_freq)[source]
Returns the 1/3 octave band matrix.
- Parameters
fs (int) – The sampling rate.
nfft (int) – The FFT size.
num_bands (int) – The number of one-third octave bands.
min_freq (int) – The center frequency of the lowest one-third octave band.
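The band matrix groups FFT bins into one-third octave bands whose center frequencies are spaced by a factor of 2^(1/3). A minimal sketch of the center-frequency computation only (the function name and defaults below are illustrative, not SpeechBrain's API; standard STOI uses 15 bands with the lowest centered at 150 Hz):

```python
import numpy as np

def third_octave_centers(min_freq=150.0, num_bands=15):
    # Band k is centered at min_freq * 2**(k/3), i.e. one third of an
    # octave above band k - 1; three bands up doubles the frequency.
    k = np.arange(num_bands)
    return min_freq * 2.0 ** (k / 3.0)

centers = third_octave_centers()
# centers[0] is 150 Hz; centers[3] is one octave up at 300 Hz
```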
- speechbrain.nnet.loss.stoi_loss.removeSilentFrames(x, y, dyn_range=40, N=256, K=128)[source]
Removes silent frames from the STOI computation.
Frames whose energy is more than dyn_range dB below that of the most energetic frame are discarded from both waveforms.
- Parameters
x (torch.Tensor) – The clean (reference) waveforms.
y (torch.Tensor) – The degraded (enhanced) waveforms.
dyn_range (int) – Dynamic range used for mask computation.
N (int) – Window length.
K (int) – Step size.
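The removal can be pictured as framing the clean signal, measuring per-frame energy in dB, and masking frames that fall more than dyn_range dB below the loudest one. A minimal single-signal sketch under those assumptions (the helper name is hypothetical, and the actual implementation applies the mask derived from x to both x and y):

```python
import torch

def drop_silent_frames(x, dyn_range=40, N=256, K=128):
    # Frame the 1-D waveform with a Hann window of length N and hop K.
    w = torch.hann_window(N)
    frames = x.unfold(0, N, K) * w                        # (num_frames, N)
    # Per-frame energy in dB (epsilon guards against log of zero).
    energy = 20 * torch.log10(frames.norm(dim=1) + 1e-20)
    # Keep frames within dyn_range dB of the most energetic frame.
    return frames[energy > energy.max() - dyn_range]

x = torch.sin(torch.arange(4096, dtype=torch.float32))
active = drop_silent_frames(x)  # every frame of a pure tone is kept
```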
- speechbrain.nnet.loss.stoi_loss.stoi_loss(y_pred_batch, y_true_batch, lens, reduction='mean')[source]
Compute the STOI score and return -1 * that score.
This function can be used as a loss function for training with SGD-based updates.
- Parameters
y_pred_batch (torch.Tensor) – The degraded (enhanced) waveforms.
y_true_batch (torch.Tensor) – The clean (reference) waveforms.
lens (torch.Tensor) – The relative lengths of the waveforms within the batch.
reduction (str) – The type of reduction (“mean” or “batch”) to use.
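The reduction argument controls whether a single scalar or one value per utterance comes back. A small illustration of the two shapes with made-up per-utterance scores (the tensor values are invented for the example):

```python
import torch

# Hypothetical negative STOI scores, one per utterance in a batch of 3.
scores = torch.tensor([-0.91, -0.85, -0.78])

loss_mean = scores.mean()  # reduction="mean": scalar, ready for backward()
loss_batch = scores        # reduction="batch": one loss per utterance
```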
Example
>>> import torch
>>> a = torch.sin(torch.arange(16000, dtype=torch.float32)).unsqueeze(0)
>>> b = a + 0.001
>>> -stoi_loss(b, a, torch.ones(1))
tensor(0.7...)