speechbrain.processing.signal_processing module

Low-level signal processing utilities

Authors
  • Peter Plantinga 2020

  • Francois Grondin 2020

  • William Aris 2020

  • Samuele Cornell 2020

  • Sarthak Yadav 2022

Summary

Functions:

compute_amplitude

Compute amplitude of a batch of waveforms.

convolve1d

Use torch.nn.functional to perform 1-D padding and convolution.

dB_to_amplitude

Returns the amplitude ratio, converted from decibels.

gabor_impulse_response

Function for generating Gabor impulse responses, as used by the GaborConv1d proposed in LEAF (Zeghidour et al., ICLR 2021).

gabor_impulse_response_legacy_complex

Function for generating Gabor impulse responses without using the complex64 dtype, as used by the GaborConv1d proposed in LEAF (Zeghidour et al., ICLR 2021).

normalize

This function normalizes a signal to unitary average or peak amplitude.

notch_filter

Returns a notch filter constructed from a high-pass and low-pass filter.

overlap_and_add

Reconstructs a signal from a framed representation (taken from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/utils.py).

rescale

This function performs signal rescaling to a target level.

resynthesize

Function for resynthesizing waveforms from enhanced magnitudes.

reverberate

General function to contaminate a given signal with reverberation given a Room Impulse Response (RIR).

Reference

speechbrain.processing.signal_processing.compute_amplitude(waveforms, lengths=None, amp_type='avg', scale='linear')[source]

Compute amplitude of a batch of waveforms.

Parameters
  • waveforms (tensor) – The waveforms used for computing amplitude. Shape should be [time] or [batch, time] or [batch, time, channels].

  • lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].

  • amp_type (str) – Whether to compute the “avg” (average) or “peak” amplitude. Choose between [“avg”, “peak”].

  • scale (str) – Whether to compute amplitude in “dB” or “linear” scale. Choose between [“linear”, “dB”].

Returns

The average amplitude of the waveforms.

Return type

tensor

Example

>>> signal = torch.sin(torch.arange(16000.0)).unsqueeze(0)
>>> compute_amplitude(signal, signal.size(1))
tensor([[0.6366]])
speechbrain.processing.signal_processing.normalize(waveforms, lengths=None, amp_type='avg', eps=1e-14)[source]

This function normalizes a signal to unitary average or peak amplitude.

Parameters
  • waveforms (tensor) – The waveforms to normalize. Shape should be [batch, time] or [batch, time, channels].

  • lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].

  • amp_type (str) – Whether to normalize with respect to “avg” or “peak” amplitude. Choose between [“avg”, “peak”]. Note: with “avg”, clipping is not prevented and can occur.

  • eps (float) – A small number to add to the denominator to prevent NaN.

Returns

waveforms – Normalized level waveform.

Return type

tensor
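Example

A minimal sketch of peak normalization; the values are arbitrary, and torch is assumed to be imported as in the other examples on this page.

>>> signal = 0.1 * torch.sin(torch.arange(16000.0)).unsqueeze(0)
>>> normalized = normalize(signal, amp_type='peak')  # peak amplitude is ~1.0 afterwards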

speechbrain.processing.signal_processing.rescale(waveforms, lengths, target_lvl, amp_type='avg', scale='linear')[source]

This function performs signal rescaling to a target level.

Parameters
  • waveforms (tensor) – The waveforms to rescale. Shape should be [batch, time] or [batch, time, channels].

  • lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].

  • target_lvl (float) – Target level, in dB or linear scale.

  • amp_type (str) – Whether to rescale with respect to “avg” or “peak” amplitude. Choose between [“avg”, “peak”].

  • scale (str) – Whether target_lvl is on a linear or dB scale. Choose between [“linear”, “dB”].

Returns

waveforms – Rescaled waveforms.

Return type

tensor
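Example

A hedged sketch of rescaling to a target peak level of 0.5 on the linear scale; the signal and lengths argument mirror the compute_amplitude example above.

>>> signal = torch.sin(torch.arange(16000.0)).unsqueeze(0)
>>> rescaled = rescale(signal, signal.size(1), target_lvl=0.5, amp_type='peak')  # peak amplitude is ~0.5 afterwards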

speechbrain.processing.signal_processing.convolve1d(waveform, kernel, padding=0, pad_type='constant', stride=1, groups=1, use_fft=False, rotation_index=0)[source]

Use torch.nn.functional to perform 1-D padding and convolution.

Parameters
  • waveform (tensor) – The tensor to perform operations on.

  • kernel (tensor) – The filter to apply during convolution.

  • padding (int or tuple) – The padding (pad_left, pad_right) to apply. If an integer is passed instead, this is passed to the conv1d function and pad_type is ignored.

  • pad_type (str) – The type of padding to use. Passed directly to torch.nn.functional.pad, see PyTorch documentation for available options.

  • stride (int) – The number of units to move each time convolution is applied. Passed to conv1d. Has no effect if use_fft is True.

  • groups (int) – This option is passed to conv1d to split the input into groups for convolution. Input channels should be divisible by the number of groups.

  • use_fft (bool) – When use_fft is passed True, then compute the convolution in the spectral domain using complex multiply. This is more efficient on CPU when the size of the kernel is large (e.g. reverberation). WARNING: Without padding, circular convolution occurs. This makes little difference in the case of reverberation, but may make more difference with different kernels.

  • rotation_index (int) – This option only applies if use_fft is true. If so, the kernel is rolled by this amount before convolution to shift the output location.

Returns

The convolved waveform.

Return type

tensor

Example

>>> from speechbrain.dataio.dataio import read_audio
>>> signal = read_audio('tests/samples/single-mic/example1.wav')
>>> signal = signal.unsqueeze(0).unsqueeze(2)
>>> kernel = torch.rand(1, 10, 1)
>>> signal = convolve1d(signal, kernel, padding=(9, 0))
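For long kernels such as room impulse responses, the FFT path can be faster on CPU. A short sketch reusing the signal from the example above; the kernel values are arbitrary placeholders, and without padding the result is a circular convolution, as warned above.

>>> long_kernel = torch.rand(1, 256, 1)
>>> fft_convolved = convolve1d(signal, long_kernel, use_fft=True)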
speechbrain.processing.signal_processing.reverberate(waveforms, rir_waveform, rescale_amp='avg')[source]

General function to contaminate a given signal with reverberation given a Room Impulse Response (RIR). It performs convolution between RIR and signal, but without changing the original amplitude of the signal.

Parameters
  • waveforms (tensor) – The waveforms to reverberate. Shape should be [batch, time] or [batch, time, channels].

  • rir_waveform (tensor) – RIR tensor, shape should be [time, channels].

  • rescale_amp (str) – Whether and how to rescale the reverberated signal: None for no rescaling, “peak” to match the original signal’s peak amplitude, or “avg” to match its average amplitude. Choose between [None, “avg”, “peak”].

Returns

waveforms – Reverberated signal.

Return type

tensor
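Example

A sanity-check sketch, assuming torch is imported: using a unit impulse at time zero as the RIR should return (approximately) the original signal, since there is nothing to reverberate.

>>> clean = torch.randn(1, 16000)
>>> rir = torch.zeros(4000, 1)  # [time, channels]
>>> rir[0, 0] = 1.0             # unit impulse at the direct-sound position
>>> reverbed = reverberate(clean, rir)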

speechbrain.processing.signal_processing.dB_to_amplitude(SNR)[source]

Returns the amplitude ratio 10 ** (SNR / 20), converted from decibels.

Parameters

SNR (float) – The ratio in decibels to convert.

Example

>>> round(dB_to_amplitude(SNR=10), 3)
3.162
>>> dB_to_amplitude(SNR=0)
1.0
speechbrain.processing.signal_processing.notch_filter(notch_freq, filter_width=101, notch_width=0.05)[source]

Returns a notch filter constructed from a high-pass and low-pass filter.

(from https://tomroelandts.com/articles/how-to-create-simple-band-pass-and-band-reject-filters)

Parameters
  • notch_freq (float) – Frequency to place the notch, as a fraction of sampling_rate / 2. The range of possible inputs is 0 to 1.

  • filter_width (int) – Filter width in samples. Longer filters have smaller transition bands but are less efficient.

  • notch_width (float) – Width of the notch, as a fraction of the sampling_rate / 2.

Example

>>> from speechbrain.dataio.dataio import read_audio
>>> signal = read_audio('tests/samples/single-mic/example1.wav')
>>> signal = signal.unsqueeze(0).unsqueeze(2)
>>> kernel = notch_filter(0.25)
>>> notched_signal = convolve1d(signal, kernel)
speechbrain.processing.signal_processing.overlap_and_add(signal, frame_step)[source]

Taken from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/utils.py

Reconstructs a signal from a framed representation. Adds potentially overlapping frames of a signal with shape […, frames, frame_length], offsetting subsequent frames by frame_step. The resulting tensor has shape […, output_size] where

output_size = (frames - 1) * frame_step + frame_length

Parameters
  • signal (tensor) – A […, frames, frame_length] tensor. All dimensions may be unknown, and the rank must be at least 2.

  • frame_step (int) – An integer denoting overlap offsets. Must be less than or equal to frame_length.

Returns

A tensor with shape […, output_size] containing the overlap-added frames of the signal’s inner-most two dimensions, where output_size = (frames - 1) * frame_step + frame_length.

Based on https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/contrib/signal/python/ops/reconstruction_ops.py

Example

>>> signal = torch.randn(5, 20)
>>> overlapped = overlap_and_add(signal, 20)
>>> overlapped.shape
torch.Size([100])
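When frame_step is smaller than frame_length, consecutive frames genuinely overlap; for 5 frames of length 20 and a hop of 10, output_size = (5 - 1) * 10 + 20 = 60.

>>> frames = torch.randn(5, 20)
>>> overlap_and_add(frames, 10).shape
torch.Size([60])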
speechbrain.processing.signal_processing.resynthesize(enhanced_mag, noisy_inputs, stft, istft, normalize_wavs=True)[source]

Function for resynthesizing waveforms from enhanced magnitudes.

Parameters
  • enhanced_mag (torch.Tensor) – Predicted spectral magnitude, should be three dimensional.

  • noisy_inputs (torch.Tensor) – The noisy waveforms before any processing, to extract phase.

  • stft (torch.nn.Module) – Module for computing the STFT for extracting phase.

  • istft (torch.nn.Module) – Module for computing the iSTFT for resynthesis.

  • normalize_wavs (bool) – Whether to normalize the output wavs before returning them.

Returns

enhanced_wav – The resynthesized waveforms of the enhanced magnitudes with noisy phase.

Return type

torch.Tensor
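Example

A hedged sketch of a typical call. It assumes the STFT and ISTFT modules and the spectral_magnitude helper from speechbrain.processing.features; here spectral_magnitude merely stands in for a model’s predicted magnitudes.

>>> import torch
>>> from speechbrain.processing.features import STFT, ISTFT, spectral_magnitude
>>> stft = STFT(sample_rate=16000)
>>> istft = ISTFT(sample_rate=16000)
>>> noisy = torch.randn(1, 16000)
>>> predicted_mag = spectral_magnitude(stft(noisy), power=0.5)  # three-dimensional, as required
>>> enhanced_wav = resynthesize(predicted_mag, noisy, stft, istft)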

speechbrain.processing.signal_processing.gabor_impulse_response(t, center, fwhm)[source]

Function for generating Gabor impulse responses, as used by the GaborConv1d proposed in:

Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, “LEAF: A Learnable Frontend for Audio Classification”, in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596).
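A minimal shape-oriented sketch; the parameter semantics (time support t, per-filter center frequencies, and Gaussian widths derived from the FWHM) follow the LEAF paper, and the values below are arbitrary placeholders.

>>> t = torch.arange(-200.0, 201.0)      # filter support in samples
>>> center = torch.tensor([0.5, 1.0])    # per-filter center frequencies
>>> fwhm = torch.tensor([100.0, 50.0])   # per-filter widths
>>> impulse_responses = gabor_impulse_response(t, center, fwhm)  # one complex IR per filter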

speechbrain.processing.signal_processing.gabor_impulse_response_legacy_complex(t, center, fwhm)[source]

Function for generating Gabor impulse responses without using the complex64 dtype, as used by the GaborConv1d proposed in:

Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, “LEAF: A Learnable Frontend for Audio Classification”, in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596).