speechbrain.nnet.loss.stoi_loss module

Library for computing STOI (Short-Time Objective Intelligibility). Reference: “End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks”, TASLP, 2018

Authors:

Szu-Wei Fu 2020

Summary

Functions:

removeSilentFrames

Removes silent frames from the STOI computation.

stoi_loss

Computes the STOI score and returns its negative (-1 * score).

thirdoct

Returns the 1/3 octave band matrix.

Reference

speechbrain.nnet.loss.stoi_loss.thirdoct(fs, nfft, num_bands, min_freq)[source]

Returns the 1/3 octave band matrix.

Parameters
  • fs (int) – Sampling rate.

  • nfft (int) – FFT size.

  • num_bands (int) – Number of 1/3 octave bands.

  • min_freq (int) – Center frequency of the lowest 1/3 octave band.

Returns

obm – Octave Band Matrix.

Return type

tensor
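As an illustration of what a 1/3 octave band matrix contains, here is a minimal, dependency-free sketch. It is not the library's implementation: the function name third_octave_matrix, the list-of-lists return type, and the nearest-bin rounding of band edges are assumptions made for this example. Each row is a 0/1 mask over the one-sided FFT bins that fall inside one band.

```python
def third_octave_matrix(fs, nfft, num_bands, min_freq):
    """Hypothetical helper: return (obm, centers).

    obm is a num_bands x (nfft // 2 + 1) matrix of 0.0/1.0 values,
    centers holds the center frequency of each band in Hz.
    """
    n_bins = nfft // 2 + 1
    # Frequencies of the one-sided FFT bins, from 0 up to fs / 2.
    freqs = [i * fs / nfft for i in range(n_bins)]
    obm = [[0.0] * n_bins for _ in range(num_bands)]
    centers = []
    for k in range(num_bands):
        # Band centers are spaced by a factor of 2^(1/3).
        center = min_freq * 2.0 ** (k / 3.0)
        low = center * 2.0 ** (-1.0 / 6.0)
        high = center * 2.0 ** (1.0 / 6.0)
        # Snap each band edge to the nearest FFT bin (assumption of this sketch).
        lo_bin = min(range(n_bins), key=lambda i: abs(freqs[i] - low))
        hi_bin = min(range(n_bins), key=lambda i: abs(freqs[i] - high))
        for i in range(lo_bin, hi_bin):
            obm[k][i] = 1.0
        centers.append(center)
    return obm, centers


# Settings commonly used for STOI: 10 kHz audio, 512-point FFT,
# 15 bands, lowest band centered at 150 Hz.
obm, centers = third_octave_matrix(fs=10000, nfft=512, num_bands=15, min_freq=150)
print(len(obm), len(obm[0]))
```

Multiplying this matrix by a power spectrum sums the energy in each band, which is how a band matrix of this kind is typically applied.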

speechbrain.nnet.loss.stoi_loss.removeSilentFrames(x, y, dyn_range=40, N=256, K=128)[source]

Removes silent frames from the STOI computation.

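Both signals are split into frames of length N with step K. Frames in which the energy of the clean signal x lies more than dyn_range dB below that of its loudest frame are treated as silent and removed from both x and y.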

Parameters
  • x (torch.Tensor) – The clean (reference) waveforms.

  • y (torch.Tensor) – The degraded (enhanced) waveforms.

  • dyn_range (int) – Dynamic range (in dB) used for mask computation.

  • N (int) – Window length.

  • K (int) – Step size.
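The role of dyn_range, N, and K can be sketched with a small energy-threshold mask. This is a simplified, pure-Python illustration and not the library code: the helper name silent_frame_mask, the symmetric Hann window, and the eps constant are assumptions, and the sketch only computes which frames would be kept instead of rebuilding the waveforms.

```python
import math


def silent_frame_mask(x, dyn_range=40, N=256, K=128, eps=1e-12):
    """Hypothetical helper: return one boolean per frame (True = keep)."""
    # Symmetric Hann window of length N (an assumption of this sketch).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    energies = []
    for start in range(0, len(x) - N + 1, K):
        frame = [x[start + n] * window[n] for n in range(N)]
        norm = math.sqrt(sum(v * v for v in frame))
        # Frame energy in dB; eps avoids log10(0) on all-zero frames.
        energies.append(20 * math.log10(norm + eps))
    if not energies:
        return []
    # Keep frames within dyn_range dB of the loudest frame.
    threshold = max(energies) - dyn_range
    return [e > threshold for e in energies]


# A sine burst followed by digital silence.
signal = [math.sin(0.1 * n) for n in range(1024)] + [0.0] * 1024
mask = silent_frame_mask(signal)
print(sum(mask), "of", len(mask), "frames kept")
```

In a full implementation the mask would be computed on the clean signal and then applied to the frames of both signals, so that the two stay time-aligned.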

speechbrain.nnet.loss.stoi_loss.stoi_loss(y_pred_batch, y_true_batch, lens, reduction='mean')[source]

Computes the STOI score and returns its negative (-1 * score).

This function can be used as a loss function for training with SGD-based updates.

Parameters
  • y_pred_batch (torch.Tensor) – The degraded (enhanced) waveforms.

  • y_true_batch (torch.Tensor) – The clean (reference) waveforms.

  • lens (torch.Tensor) – The relative lengths of the waveforms within the batch.

  • reduction (str) – The type of reduction to use: “mean” averages the loss over the batch; “batch” returns one loss value per utterance.

Example

>>> import torch
>>> from speechbrain.nnet.loss.stoi_loss import stoi_loss
>>> a = torch.sin(torch.arange(16000, dtype=torch.float32)).unsqueeze(0)
>>> b = a + 0.001
>>> -stoi_loss(b, a, torch.ones(1))
tensor(0.7...)
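The lens and reduction parameters can be illustrated without any audio. The sketch below uses plain Python lists and two hypothetical helpers, trim_batch and reduce_scores, which are not part of the library. It assumes that a relative length is a fraction of the padded length and is truncated to a whole number of samples, and that the loss is the negated score.

```python
def trim_batch(batch, lens):
    """Hypothetical helper: cut each padded waveform to its true length.

    lens holds relative lengths, i.e. fractions of the padded length.
    """
    max_len = len(batch[0])
    return [wav[: int(rel * max_len)] for wav, rel in zip(batch, lens)]


def reduce_scores(scores, reduction="mean"):
    """Hypothetical helper: turn per-utterance scores into a loss."""
    losses = [-s for s in scores]  # higher score means lower loss
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "batch":
        return losses
    raise ValueError(f"unknown reduction: {reduction}")


# Two utterances padded to 4 samples; the second is only half as long.
padded = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.0, 0.0]]
print(trim_batch(padded, [1.0, 0.5]))
print(reduce_scores([0.8, 0.6], reduction="batch"))
```

Trimming before scoring keeps zero padding from being counted as part of the signal, which matters for a measure that is computed frame by frame.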