speechbrain.processing.signal_processing module

Low-level signal processing utilities

Authors
  • Peter Plantinga 2020

  • Francois Grondin 2020

  • William Aris 2020

  • Samuele Cornell 2020

  • Sarthak Yadav 2022

Summary

Functions:

compute_amplitude

Compute amplitude of a batch of waveforms.

convolve1d

Use torch.nn.functional to perform 1-D padding and convolution.

dB_to_amplitude

Returns the amplitude ratio, converted from decibels.

gabor_impulse_response

Function for generating Gabor impulse responses, as used by the GaborConv1d proposed in LEAF (Zeghidour et al., ICLR 2021).

gabor_impulse_response_legacy_complex

Function for generating Gabor impulse responses without using the complex64 dtype, as used by the GaborConv1d proposed in LEAF (Zeghidour et al., ICLR 2021).

normalize

This function normalizes a signal to unitary average or peak amplitude.

notch_filter

Returns a notch filter constructed from a high-pass and low-pass filter.

overlap_and_add

Reconstructs a signal from a framed representation (taken from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/utils.py).

rescale

This function performs signal rescaling to a target level.

resynthesize

Function for resynthesizing waveforms from enhanced magnitudes.

reverberate

General function to contaminate a given signal with reverberation given a Room Impulse Response (RIR).

Reference

speechbrain.processing.signal_processing.compute_amplitude(waveforms, lengths=None, amp_type='avg', scale='linear')[source]

Compute amplitude of a batch of waveforms.

Parameters
  • waveforms (tensor) – The waveforms used for computing amplitude. Shape should be [time] or [batch, time] or [batch, time, channels].

  • lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].

  • amp_type (str) – Whether to compute the “avg” (average) or “peak” amplitude. Choose between [“avg”, “peak”].

  • scale (str) – Whether to compute amplitude in “dB” or “linear” scale. Choose between [“linear”, “dB”].

Returns

The average amplitude of the waveforms.

Return type

tensor

Example

>>> signal = torch.sin(torch.arange(16000.0)).unsqueeze(0)
>>> compute_amplitude(signal, signal.size(1))
tensor([[0.6366]])
speechbrain.processing.signal_processing.normalize(waveforms, lengths=None, amp_type='avg', eps=1e-14)[source]

This function normalizes a signal to unitary average or peak amplitude.

Parameters
  • waveforms (tensor) – The waveforms to normalize. Shape should be [batch, time] or [batch, time, channels].

  • lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].

  • amp_type (str) – Whether to normalize with respect to “avg” or “peak” amplitude. Choose between [“avg”, “peak”]. Note: with “avg”, clipping is not prevented and can occur.

  • eps (float) – A small number to add to the denominator to prevent NaN.

Returns

waveforms – Normalized level waveform.

Return type

tensor
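Example

A minimal sketch of peak normalization; the values are arbitrary, and torch is assumed to be imported as in the other examples on this page.

>>> signal = 0.1 * torch.sin(torch.arange(16000.0)).unsqueeze(0)
>>> normalized = normalize(signal, amp_type='peak')  # peak amplitude is ~1.0 afterwards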

speechbrain.processing.signal_processing.rescale(waveforms, lengths, target_lvl, amp_type='avg', scale='linear')[source]

This function performs signal rescaling to a target level.

Parameters
  • waveforms (tensor) – The waveforms to rescale. Shape should be [batch, time] or [batch, time, channels].

  • lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].

  • target_lvl (float) – Target level, in dB or linear scale.

  • amp_type (str) – Whether to rescale with respect to “avg” or “peak” amplitude. Choose between [“avg”, “peak”].

  • scale (str) – Whether target_lvl is on a linear or dB scale. Choose between [“linear”, “dB”].

Returns

waveforms – Rescaled waveforms.

Return type

tensor
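Example

A hedged sketch of rescaling to a target peak level of 0.5 on the linear scale; the signal and lengths argument mirror the compute_amplitude example above.

>>> signal = torch.sin(torch.arange(16000.0)).unsqueeze(0)
>>> rescaled = rescale(signal, signal.size(1), target_lvl=0.5, amp_type='peak')  # peak amplitude is ~0.5 afterwards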

speechbrain.processing.signal_processing.convolve1d(waveform, kernel, padding=0, pad_type='constant', stride=1, groups=1, use_fft=False, rotation_index=0)[source]

Use torch.nn.functional to perform 1-D padding and convolution.

Parameters
  • waveform (tensor) – The tensor to perform operations on.

  • kernel (tensor) – The filter to apply during convolution.

  • padding (int or tuple) – The padding (pad_left, pad_right) to apply. If an integer is passed instead, this is passed to the conv1d function and pad_type is ignored.

  • pad_type (str) – The type of padding to use. Passed directly to torch.nn.functional.pad, see PyTorch documentation for available options.

  • stride (int) – The number of units to move each time convolution is applied. Passed to conv1d. Has no effect if use_fft is True.

  • groups (int) – This option is passed to conv1d to split the input into groups for convolution. Input channels should be divisible by the number of groups.

  • use_fft (bool) – When use_fft is passed True, then compute the convolution in the spectral domain using complex multiply. This is more efficient on CPU when the size of the kernel is large (e.g. reverberation). WARNING: Without padding, circular convolution occurs. This makes little difference in the case of reverberation, but may make more difference with different kernels.

  • rotation_index (int) – This option only applies if use_fft is true. If so, the kernel is rolled by this amount before convolution to shift the output location.

Returns

The convolved waveform.

Return type

tensor

Example

>>> from speechbrain.dataio.dataio import read_audio
>>> signal = read_audio('tests/samples/single-mic/example1.wav')
>>> signal = signal.unsqueeze(0).unsqueeze(2)
>>> kernel = torch.rand(1, 10, 1)
>>> signal = convolve1d(signal, kernel, padding=(9, 0))
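For long kernels such as room impulse responses, the FFT path can be faster on CPU. A short sketch reusing the signal from the example above; the kernel values are arbitrary placeholders, and without padding the result is a circular convolution, as warned above.

>>> long_kernel = torch.rand(1, 256, 1)
>>> fft_convolved = convolve1d(signal, long_kernel, use_fft=True)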
speechbrain.processing.signal_processing.reverberate(waveforms, rir_waveform, rescale_amp='avg')[source]

General function to contaminate a given signal with reverberation given a Room Impulse Response (RIR). It performs convolution between RIR and signal, but without changing the original amplitude of the signal.

Parameters
  • waveforms (tensor) – The waveforms to reverberate. Shape should be [batch, time] or [batch, time, channels].

  • rir_waveform (tensor) – RIR tensor, shape should be [time, channels].

  • rescale_amp (str) – Whether and how to rescale the reverberated signal: None for no rescaling, “peak” to match the original signal’s peak amplitude, or “avg” to match its average amplitude. Choose between [None, “avg”, “peak”].

Returns

waveforms – Reverberated signal.

Return type

tensor
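Example

A sanity-check sketch, assuming torch is imported: using a unit impulse at time zero as the RIR should return (approximately) the original signal, since there is nothing to reverberate.

>>> clean = torch.randn(1, 16000)
>>> rir = torch.zeros(4000, 1)  # [time, channels]
>>> rir[0, 0] = 1.0             # unit impulse at the direct-sound position
>>> reverbed = reverberate(clean, rir)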

speechbrain.processing.signal_processing.dB_to_amplitude(SNR)[source]

Returns the amplitude ratio 10 ** (SNR / 20), converted from decibels.

Parameters

SNR (float) – The ratio in decibels to convert.

Example

>>> round(dB_to_amplitude(SNR=10), 3)
3.162
>>> dB_to_amplitude(SNR=0)
1.0
speechbrain.processing.signal_processing.notch_filter(notch_freq, filter_width=101, notch_width=0.05)[source]

Returns a notch filter constructed from a high-pass and low-pass filter.

(from https://tomroelandts.com/articles/how-to-create-simple-band-pass-and-band-reject-filters)

Parameters
  • notch_freq (float) – Frequency to place the notch, as a fraction of sampling_rate / 2. The range of possible inputs is 0 to 1.

  • filter_width (int) – Filter width in samples. Longer filters have smaller transition bands but are less efficient.

  • notch_width (float) – Width of the notch, as a fraction of the sampling_rate / 2.

Example

>>> from speechbrain.dataio.dataio import read_audio
>>> signal = read_audio('tests/samples/single-mic/example1.wav')
>>> signal = signal.unsqueeze(0).unsqueeze(2)
>>> kernel = notch_filter(0.25)
>>> notched_signal = convolve1d(signal, kernel)
speechbrain.processing.signal_processing.overlap_and_add(signal, frame_step)[source]

Taken from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/utils.py

Reconstructs a signal from a framed representation. Adds potentially overlapping frames of a signal with shape […, frames, frame_length], offsetting subsequent frames by frame_step. The resulting tensor has shape […, output_size] where

output_size = (frames - 1) * frame_step + frame_length

Parameters
  • signal (tensor) – A […, frames, frame_length] tensor. All dimensions may be unknown, and the rank must be at least 2.

  • frame_step (int) – An integer denoting overlap offsets. Must be less than or equal to frame_length.

Returns

A tensor with shape […, output_size] containing the overlap-added frames of the signal’s inner-most two dimensions, where output_size = (frames - 1) * frame_step + frame_length.

Based on https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/contrib/signal/python/ops/reconstruction_ops.py

Example

>>> signal = torch.randn(5, 20)
>>> overlapped = overlap_and_add(signal, 20)
>>> overlapped.shape
torch.Size([100])
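When frame_step is smaller than frame_length, consecutive frames genuinely overlap; for 5 frames of length 20 and a hop of 10, output_size = (5 - 1) * 10 + 20 = 60.

>>> frames = torch.randn(5, 20)
>>> overlap_and_add(frames, 10).shape
torch.Size([60])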
speechbrain.processing.signal_processing.resynthesize(enhanced_mag, noisy_inputs, stft, istft, normalize_wavs=True)[source]

Function for resynthesizing waveforms from enhanced magnitudes.

Parameters
  • enhanced_mag (torch.Tensor) – Predicted spectral magnitude, should be three dimensional.

  • noisy_inputs (torch.Tensor) – The noisy waveforms before any processing, to extract phase.

  • stft (torch.nn.Module) – Module for computing the STFT for extracting phase.

  • istft (torch.nn.Module) – Module for computing the iSTFT for resynthesis.

  • normalize_wavs (bool) – Whether to normalize the output wavs before returning them.

Returns

enhanced_wav – The resynthesized waveforms of the enhanced magnitudes with noisy phase.

Return type

torch.Tensor
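Example

A hedged sketch of a typical call. It assumes the STFT and ISTFT modules and the spectral_magnitude helper from speechbrain.processing.features; here spectral_magnitude merely stands in for a model’s predicted magnitudes.

>>> import torch
>>> from speechbrain.processing.features import STFT, ISTFT, spectral_magnitude
>>> stft = STFT(sample_rate=16000)
>>> istft = ISTFT(sample_rate=16000)
>>> noisy = torch.randn(1, 16000)
>>> predicted_mag = spectral_magnitude(stft(noisy), power=0.5)  # three-dimensional, as required
>>> enhanced_wav = resynthesize(predicted_mag, noisy, stft, istft)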

speechbrain.processing.signal_processing.gabor_impulse_response(t, center, fwhm)[source]

Function for generating Gabor impulse responses, as used by the GaborConv1d proposed in:

Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, “LEAF: A Learnable Frontend for Audio Classification”, in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596).
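A minimal shape-oriented sketch; the parameter semantics (time support t, per-filter center frequencies, and Gaussian widths derived from the FWHM) follow the LEAF paper, and the values below are arbitrary placeholders.

>>> t = torch.arange(-200.0, 201.0)      # filter support in samples
>>> center = torch.tensor([0.5, 1.0])    # per-filter center frequencies
>>> fwhm = torch.tensor([100.0, 50.0])   # per-filter widths
>>> impulse_responses = gabor_impulse_response(t, center, fwhm)  # one complex IR per filter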

speechbrain.processing.signal_processing.gabor_impulse_response_legacy_complex(t, center, fwhm)[source]

Function for generating Gabor impulse responses without using the complex64 dtype, as used by the GaborConv1d proposed in:

Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, “LEAF: A Learnable Frontend for Audio Classification”, in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596).