speechbrain.nnet.loss.stoi_loss module

Summary

Functions:

removeSilentFrames

stoi_loss

Compute the STOI score and return -1 * that score.

thirdoct

Returns the 1/3 octave band matrix.

Reference

speechbrain.nnet.loss.stoi_loss.thirdoct(fs, nfft, num_bands, min_freq)

Returns the 1/3 octave band matrix.

Parameters
  • fs (int) – Sampling rate, in Hz.

  • nfft (int) – FFT size.

  • num_bands (int) – Number of 1/3 octave bands.

  • min_freq (int) – Center frequency of the lowest 1/3 octave band, in Hz.

Returns

obm – Octave Band Matrix.

Return type

tensor
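As an illustration of what a 1/3 octave band matrix contains, the following is a minimal pure-Python sketch, not the SpeechBrain implementation. The helper name third_octave_matrix, the nearest-bin matching of band edges, and the output shape of num_bands rows by nfft // 2 + 1 columns are assumptions made for this example. Each row marks with 1.0 the FFT bins that fall inside one band.

```python
def third_octave_matrix(fs, nfft, num_bands, min_freq):
    """Hypothetical sketch of a 1/3 octave band matrix (list of lists)."""
    n_bins = nfft // 2 + 1
    # Frequency of each one-sided FFT bin, in Hz.
    freqs = [i * fs / nfft for i in range(n_bins)]

    def nearest_bin(target_hz):
        # Index of the FFT bin closest to the target frequency.
        return min(range(n_bins), key=lambda i: abs(freqs[i] - target_hz))

    obm = [[0.0] * n_bins for _ in range(num_bands)]
    for k in range(num_bands):
        # Band k is centered at min_freq * 2 ** (k / 3); its edges lie
        # 1/6 octave below and above that center frequency.
        low = min_freq * 2.0 ** ((2 * k - 1) / 6)
        high = min_freq * 2.0 ** ((2 * k + 1) / 6)
        lo_bin, hi_bin = nearest_bin(low), nearest_bin(high)
        # Assumption: the lower edge bin is included, the upper one is not.
        for i in range(lo_bin, hi_bin):
            obm[k][i] = 1.0
    return obm


# Illustrative values: 15 bands starting at 150 Hz for a 10 kHz signal.
obm = third_octave_matrix(fs=10000, nfft=512, num_bands=15, min_freq=150)
print(len(obm), len(obm[0]))
```

Multiplying such a matrix with a power spectrum sums the energy of the FFT bins in each band, which is how a band matrix of this kind is typically applied.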

speechbrain.nnet.loss.stoi_loss.removeSilentFrames(x, y, dyn_range=40, N=256, K=128)

speechbrain.nnet.loss.stoi_loss.stoi_loss(y_pred_batch, y_true_batch, lens, reduction='mean')

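Compute the short-time objective intelligibility (STOI) score and return -1 * that score.

Because the score is negated, minimizing the returned value maximizes STOI, so this function can be used as a loss function for training with SGD-based updates.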

Parameters
  • y_pred_batch (torch.Tensor) – The degraded (enhanced) waveforms.

  • y_true_batch (torch.Tensor) – The clean (reference) waveforms.

  • lens (torch.Tensor) – The relative lengths of the waveforms within the batch.

  • reduction (str) – The type of reduction (“mean” or “batch”) to use.
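The lens and reduction parameters can be illustrated with a small pure-Python sketch. It does not call SpeechBrain; the helper names relative_lengths and reduce_scores are hypothetical. It also assumes that a relative length is an utterance's sample count divided by the longest sample count in the batch, and that "batch" keeps one value per utterance while "mean" averages them.

```python
def relative_lengths(num_samples):
    """Hypothetical helper: scale each length by the longest in the batch."""
    longest = max(num_samples)
    return [n / longest for n in num_samples]


def reduce_scores(stoi_scores, reduction="mean"):
    """Hypothetical helper: negate per-utterance scores, then reduce."""
    negated = [-s for s in stoi_scores]
    if reduction == "mean":
        # Assumption: "mean" averages the negated scores over the batch.
        return sum(negated) / len(negated)
    # Assumption: "batch" returns one negated score per utterance.
    return negated


# Three utterances padded to the length of the longest one.
lens = relative_lengths([16000, 12000, 8000])
print(lens)  # [1.0, 0.75, 0.5]

# Made-up per-utterance scores, used only to show the two reductions.
print(reduce_scores([0.5, 0.75, 1.0], reduction="mean"))   # -0.75
print(reduce_scores([0.5, 0.75, 1.0], reduction="batch"))  # [-0.5, -0.75, -1.0]
```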

Example

>>> import torch
>>> from speechbrain.nnet.loss.stoi_loss import stoi_loss
>>> a = torch.sin(torch.arange(16000, dtype=torch.float32)).unsqueeze(0)
>>> b = a + 0.001
>>> -stoi_loss(b, a, torch.ones(1))
tensor(0.7...)