speechbrain.nnet.loss.stoi_loss module

Library for computing the short-time objective intelligibility (STOI) measure. Reference: “End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks”, TASLP, 2018

Authors: Szu-Wei Fu 2020

Summary

Functions:

`removeSilentFrames`	Removes silent frames from the STOI computation.
`stoi_loss`	Compute the STOI score and return -1 * that score.
`thirdoct`	Returns the 1/3 octave band matrix.

Reference

speechbrain.nnet.loss.stoi_loss.thirdoct(fs, nfft, num_bands, min_freq)[source]

Returns the 1/3 octave band matrix.

Parameters

fs (int) – Sampling rate.
nfft (int) – FFT size.
num_bands (int) – Number of 1/3 octave bands.
min_freq (int) – Center frequency of the lowest 1/3 octave band.

Returns

obm – Octave Band Matrix.

Return type

tensor
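As a rough illustration of what a 1/3 octave band matrix contains, the sketch below builds one in pure Python. It is not the SpeechBrain implementation: the function name `third_octave_band_matrix` is hypothetical, it returns nested lists rather than a tensor, and the argument values in the call (10 kHz sampling rate, 512-point FFT, 15 bands starting at 150 Hz) are assumed for the example. Each row marks the FFT bins that fall between the lower and upper edge of one band, where the edges sit a sixth of an octave below and above the band's center frequency.

```python
def third_octave_band_matrix(fs, nfft, num_bands, min_freq):
    """Hypothetical sketch of a 1/3 octave band matrix (not the library code)."""
    # Frequencies of the FFT bins from 0 Hz up to the Nyquist frequency.
    n_bins = nfft // 2 + 1
    bin_freqs = [i * fs / nfft for i in range(n_bins)]

    def nearest_bin(freq):
        # Index of the FFT bin whose frequency is closest to `freq`.
        return min(range(n_bins), key=lambda i: abs(bin_freqs[i] - freq))

    obm = []
    for k in range(num_bands):
        # Center frequencies are spaced a third of an octave apart.
        center = min_freq * 2.0 ** (k / 3.0)
        # Band edges lie a sixth of an octave on either side of the center.
        low = center * 2.0 ** (-1.0 / 6.0)
        high = center * 2.0 ** (1.0 / 6.0)
        lo_bin, hi_bin = nearest_bin(low), nearest_bin(high)
        # 1.0 for bins inside the band (upper edge excluded), 0.0 elsewhere.
        obm.append([1.0 if lo_bin <= i < hi_bin else 0.0 for i in range(n_bins)])
    return obm


obm = third_octave_band_matrix(fs=10000, nfft=512, num_bands=15, min_freq=150)
print(len(obm), len(obm[0]))  # 15 257
```

Multiplying such a matrix with a power spectrogram sums the bin energies inside each band, which is how a matrix of this kind is typically applied.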

speechbrain.nnet.loss.stoi_loss.removeSilentFrames(x, y, dyn_range=40, N=256, K=128)[source]

Removes silent frames from the STOI computation.

This function is a preprocessing step for the STOI computation rather than a loss function itself.

Parameters

x (torch.Tensor) – The clean (reference) waveforms.
y (torch.Tensor) – The degraded (enhanced) waveforms.
dyn_range (int) – Dynamic range used for mask computation.
N (int) – Window length.
K (int) – Step size.
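The role of `dyn_range`, `N`, and `K` can be illustrated with a simplified energy-based frame mask. This is a sketch under stated assumptions, not the library's code: the helper `silent_frame_mask` is hypothetical, it works on a plain Python list, it skips the analysis window, and it only returns the keep/drop decision per frame instead of rebuilding the trimmed waveforms. A frame is kept when its energy lies within `dyn_range` dB of the most energetic frame.

```python
import math


def silent_frame_mask(x, dyn_range=40, N=256, K=128):
    """Hypothetical sketch: return one bool per frame, True if the frame is kept."""
    eps = 1e-12  # avoids log10(0) on all-zero frames
    energies_db = []
    # Slide a window of N samples over the signal in steps of K samples.
    for start in range(0, len(x) - N + 1, K):
        frame = x[start:start + N]
        rms = math.sqrt(sum(s * s for s in frame) / N)
        energies_db.append(20.0 * math.log10(rms + eps))
    if not energies_db:
        return []
    # Keep frames whose energy is less than dyn_range dB below the maximum.
    max_db = max(energies_db)
    return [e > max_db - dyn_range for e in energies_db]


# 1024 samples of a sine wave followed by 1024 near-silent samples.
signal = [math.sin(0.1 * n) for n in range(1024)] + [1e-6] * 1024
mask = silent_frame_mask(signal)
print(mask[0], mask[-1])  # True False
```

In a STOI setting the mask would be derived from the clean signal and applied to both the clean and the degraded signal, so that the two stay aligned.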

speechbrain.nnet.loss.stoi_loss.stoi_loss(y_pred_batch, y_true_batch, lens, reduction='mean')[source]

Compute the STOI score and return -1 * that score.

This function can be used as a loss function for training with SGD-based updates.

Parameters

y_pred_batch (torch.Tensor) – The degraded (enhanced) waveforms.
y_true_batch (torch.Tensor) – The clean (reference) waveforms.
lens (torch.Tensor) – The relative lengths of the waveforms within the batch.
reduction (str) – The type of reduction (“mean” or “batch”) to use.
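The `lens` and `reduction` arguments can be read as follows. The sketch below is illustrative only and uses hypothetical helpers (`relative_to_absolute`, `reduce_losses`) that are not part of SpeechBrain. It assumes that a relative length is a fraction of the longest waveform in the batch, that "mean" averages the per-utterance values into one number, and that "batch" keeps one value per utterance.

```python
def relative_to_absolute(lens, max_len):
    """Hypothetical helper: turn relative lengths (0..1) into sample counts."""
    return [int(r * max_len) for r in lens]


def reduce_losses(per_utterance, reduction="mean"):
    """Hypothetical helper: combine per-utterance loss values."""
    if reduction == "mean":
        # One scalar for the whole batch.
        return sum(per_utterance) / len(per_utterance)
    if reduction == "batch":
        # One value per utterance, left unreduced.
        return list(per_utterance)
    raise ValueError(f"unsupported reduction: {reduction!r}")


# Two utterances padded to 16000 samples; the second fills half of that.
sample_counts = relative_to_absolute([1.0, 0.5], max_len=16000)
print(sample_counts)  # [16000, 8000]

# Made-up negative STOI values for the two utterances.
losses = [-0.75, -0.25]
print(reduce_losses(losses, "mean"))   # -0.5
print(reduce_losses(losses, "batch"))  # [-0.75, -0.25]
```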

Example

>>> import torch
>>> from speechbrain.nnet.loss.stoi_loss import stoi_loss
>>> a = torch.sin(torch.arange(16000, dtype=torch.float32)).unsqueeze(0)
>>> b = a + 0.001
>>> -stoi_loss(b, a, torch.ones(1))
tensor(0.7...)