speechbrain.processing.signal_processing module
Low level signal processing utilities
- Authors
Peter Plantinga 2020
Francois Grondin 2020
William Aris 2020
Samuele Cornell 2020
Sarthak Yadav 2022
Summary
Functions:
compute_amplitude: Compute amplitude of a batch of waveforms.
convolve1d: Use torch.nn.functional to perform 1d padding and conv.
dB_to_amplitude: Returns the amplitude ratio, converted from decibels.
gabor_impulse_response: Function for generating gabor impulse responses as used by GaborConv1d proposed in LEAF.
gabor_impulse_response_legacy_complex: Function for generating gabor impulse responses without using the complex64 dtype, as used by GaborConv1d proposed in LEAF.
mean_std_norm: This function normalizes the mean and std of the input waveform.
normalize: This function normalizes a signal to unitary average or peak amplitude.
notch_filter: Returns a notch filter constructed from a high-pass and low-pass filter.
overlap_and_add: Reconstructs a signal from a framed representation (adapted from Conv-TasNet).
rescale: This function performs signal rescaling to a target level.
resynthesize: Function for resynthesizing waveforms from enhanced magnitudes.
reverberate: General function to contaminate a given signal with reverberation given a Room Impulse Response (RIR).
Reference
- speechbrain.processing.signal_processing.compute_amplitude(waveforms, lengths=None, amp_type='avg', scale='linear')[source]
Compute amplitude of a batch of waveforms.
- Parameters:
waveforms (tensor) – The waveforms used for computing amplitude. Shape should be [time], [batch, time], or [batch, time, channels].
lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].
amp_type (str) – Whether to compute the “avg” (average) or “peak” amplitude. Choose between [“avg”, “peak”].
scale (str) – Whether to compute amplitude in “dB” or “linear” scale. Choose between [“linear”, “dB”].
- Returns:
The average amplitude of the waveforms.
Example
>>> signal = torch.sin(torch.arange(16000.0)).unsqueeze(0)
>>> compute_amplitude(signal, signal.size(1))
tensor([[0.6366]])
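The behaviour above can be reproduced with plain torch; the sketch below is a hypothetical re-implementation for illustration (it ignores the `lengths` argument and assumes time is the last dimension), not the SpeechBrain source:

```python
import math

import torch

def compute_amplitude_sketch(waveforms, amp_type="avg", scale="linear"):
    # "avg": mean absolute value over time; "peak": max absolute value.
    if amp_type == "avg":
        amp = torch.mean(torch.abs(waveforms), dim=-1, keepdim=True)
    else:  # "peak"
        amp = torch.max(torch.abs(waveforms), dim=-1, keepdim=True)[0]
    if scale == "dB":
        # 20 * log10 converts a linear amplitude to decibels;
        # clamping avoids log of zero on silent inputs.
        amp = 20 * torch.log10(torch.clamp(amp, min=1e-14))
    return amp

signal = torch.sin(torch.arange(16000.0)).unsqueeze(0)
print(compute_amplitude_sketch(signal))              # ~tensor([[0.6366]])
print(compute_amplitude_sketch(signal, scale="dB"))  # same amplitude in dB
```

The 0.6366 value is simply the mean of |sin| over a uniformly sampled phase, 2/π.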
- speechbrain.processing.signal_processing.normalize(waveforms, lengths=None, amp_type='avg', eps=1e-14)[source]
This function normalizes a signal to unitary average or peak amplitude.
- Parameters:
waveforms (tensor) – The waveforms to normalize. Shape should be [batch, time] or [batch, time, channels].
lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].
amp_type (str) – Whether to normalize with respect to “avg” or “peak” amplitude. Choose between [“avg”, “peak”]. Note: for “avg” clipping is not prevented and can occur.
eps (float) – A small number to add to the denominator to prevent NaN.
- Returns:
waveforms – Normalized level waveform.
- Return type:
tensor
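In essence, normalization divides the waveform by its computed amplitude. A minimal torch-only sketch (illustrative, ignoring `lengths`):

```python
import torch

def normalize_sketch(waveforms, amp_type="avg", eps=1e-14):
    # Divide each waveform by its average (or peak) absolute amplitude;
    # eps guards against division by zero on silent inputs.
    if amp_type == "avg":
        den = torch.mean(torch.abs(waveforms), dim=-1, keepdim=True)
    else:  # "peak"
        den = torch.max(torch.abs(waveforms), dim=-1, keepdim=True)[0]
    return waveforms / (den + eps)

x = 0.1 * torch.randn(2, 16000)
peaked = normalize_sketch(x, amp_type="peak")
print(peaked.abs().max())  # ~1.0: peak normalization bounds the signal at unit amplitude
```

This also shows why “avg” normalization can clip: dividing by the average amplitude leaves individual samples free to exceed 1.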
- speechbrain.processing.signal_processing.mean_std_norm(waveforms, dims=1, eps=1e-06)[source]
- This function normalizes the mean and std of the input waveform (along the specified axis).
- Parameters:
waveforms (tensor) – The waveforms to normalize.
dims (int) – The dimension along which mean and std are computed.
eps (float) – A small number added to the std to avoid division by zero.
- Returns:
waveforms – Normalized level waveform.
- Return type:
tensor
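This is the usual standardization step. A sketch under the assumption that it is a plain mean/std normalization along `dims` (illustrative, not the SpeechBrain source):

```python
import torch

def mean_std_norm_sketch(waveforms, dims=1, eps=1e-6):
    # Subtract the mean and divide by the standard deviation along `dims`;
    # eps keeps the division stable for near-constant inputs.
    mean = waveforms.mean(dim=dims, keepdim=True)
    std = waveforms.std(dim=dims, keepdim=True)
    return (waveforms - mean) / (std + eps)

x = 3.0 * torch.randn(4, 16000) + 5.0   # non-zero mean, non-unit variance
y = mean_std_norm_sketch(x)
print(y.mean(dim=1))  # ~0 for every utterance
print(y.std(dim=1))   # ~1 for every utterance
```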
- speechbrain.processing.signal_processing.rescale(waveforms, lengths, target_lvl, amp_type='avg', scale='linear')[source]
This function performs signal rescaling to a target level.
- Parameters:
waveforms (tensor) – The waveforms to rescale. Shape should be [batch, time] or [batch, time, channels].
lengths (tensor) – The lengths of the waveforms excluding the padding. Shape should be a single dimension, [batch].
target_lvl (float) – Target level in dB or linear scale.
amp_type (str) – Whether one wants to rescale with respect to “avg” or “peak” amplitude. Choose between [“avg”, “peak”].
scale (str) – whether target_lvl belongs to linear or dB scale. Choose between [“linear”, “dB”].
- Returns:
waveforms – Rescaled waveforms.
- Return type:
tensor
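Conceptually, rescaling is normalization followed by multiplication with the target level. An illustrative torch-only sketch (ignoring `lengths`, not the SpeechBrain source):

```python
import torch

def rescale_sketch(waveforms, target_lvl, amp_type="avg", scale="linear"):
    # Normalize to unit amplitude, then scale to the target level,
    # converting a dB target to a linear ratio first.
    if amp_type == "avg":
        amp = torch.mean(torch.abs(waveforms), dim=-1, keepdim=True)
    else:  # "peak"
        amp = torch.max(torch.abs(waveforms), dim=-1, keepdim=True)[0]
    if scale == "dB":
        target_lvl = 10 ** (target_lvl / 20)  # dB -> linear amplitude
    return target_lvl * waveforms / (amp + 1e-14)

x = torch.randn(1, 16000)
y = rescale_sketch(x, target_lvl=0.5, amp_type="peak")
print(y.abs().max())  # ~0.5
```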
- speechbrain.processing.signal_processing.convolve1d(waveform, kernel, padding=0, pad_type='constant', stride=1, groups=1, use_fft=False, rotation_index=0)[source]
Use torch.nn.functional to perform 1d padding and conv.
- Parameters:
waveform (tensor) – The tensor to perform operations on.
kernel (tensor) – The filter to apply during convolution.
padding (int or tuple) – The padding (pad_left, pad_right) to apply. If an integer is passed instead, this is passed to the conv1d function and pad_type is ignored.
pad_type (str) – The type of padding to use. Passed directly to torch.nn.functional.pad, see PyTorch documentation for available options.
stride (int) – The number of units to move each time convolution is applied. Passed to conv1d. Has no effect if use_fft is True.
groups (int) – This option is passed to conv1d to split the input into groups for convolution. Input channels should be divisible by the number of groups.
use_fft (bool) – When use_fft is True, the convolution is computed in the spectral domain using complex multiplication. This is more efficient on CPU when the kernel is large (e.g. reverberation). WARNING: Without padding, circular convolution occurs. This makes little difference in the case of reverberation, but may make more difference with other kernels.
rotation_index (int) – This option only applies if use_fft is True. If so, the kernel is rolled by this amount before convolution to shift the output location.
- Returns:
The convolved waveform.
Example
>>> from speechbrain.dataio.dataio import read_audio
>>> signal = read_audio('tests/samples/single-mic/example1.wav')
>>> signal = signal.unsqueeze(0).unsqueeze(2)
>>> kernel = torch.rand(1, 10, 1)
>>> signal = convolve1d(signal, kernel, padding=(9, 0))
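The trade-off behind use_fft can be seen directly in torch: direct convolution via conv1d and spectral multiplication give the same result, provided both are zero-padded to the full output length to avoid the circular wrap-around the docstring warns about. Illustrative sketch, not the SpeechBrain code:

```python
import torch
import torch.nn.functional as F

signal = torch.randn(1, 1, 64)   # [batch, channel, time]
kernel = torch.randn(1, 1, 16)

# Direct: left-pad by kernel_size - 1 and flip the kernel, so conv1d's
# cross-correlation becomes a causal convolution.
direct = F.conv1d(F.pad(signal, (15, 0)), kernel.flip(-1))

# FFT: linear convolution length is time + kernel - 1; truncate afterwards.
n = signal.shape[-1] + kernel.shape[-1] - 1
spec = torch.fft.rfft(signal, n) * torch.fft.rfft(kernel, n)
fft_conv = torch.fft.irfft(spec, n)[..., :signal.shape[-1]]

print(torch.allclose(direct, fft_conv, atol=1e-4))  # True
```

Skipping the zero-padding to length n would make the FFT path wrap the kernel tail around to the start of the signal — harmless for a decaying RIR, visible for other kernels.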
- speechbrain.processing.signal_processing.reverberate(waveforms, rir_waveform, rescale_amp='avg')[source]
General function to contaminate a given signal with reverberation given a Room Impulse Response (RIR). It performs convolution between RIR and signal, but without changing the original amplitude of the signal.
- Parameters:
waveforms (tensor) – The waveforms to reverberate. Shape should be [batch, time] or [batch, time, channels].
rir_waveform (tensor) – RIR tensor, shape should be [time, channels].
rescale_amp (str) – Whether the reverberated signal is rescaled: None means no rescaling, otherwise rescale with respect to the original signal’s “peak” or “avg” amplitude. Choose between [None, “avg”, “peak”].
- Returns:
waveforms – Reverberated signal.
- Return type:
tensor
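A hypothetical torch-only sketch of what reverberation amounts to: convolve the dry signal with the RIR (here via FFT), then rescale so the dry signal's amplitude is preserved. The RIR values below are made up for illustration:

```python
import torch

dry = torch.randn(1, 16000)
rir = torch.zeros(1, 2000)
rir[0, 0] = 1.0      # direct path
rir[0, 1500] = 0.4   # a single late reflection

# Full linear convolution via the FFT, truncated back to the dry length.
n = dry.shape[-1] + rir.shape[-1] - 1
wet = torch.fft.irfft(torch.fft.rfft(dry, n) * torch.fft.rfft(rir, n), n)
wet = wet[..., :dry.shape[-1]]

# Rescale to the dry signal's peak so loudness is unchanged.
wet = wet * dry.abs().max() / (wet.abs().max() + 1e-14)
print(wet.shape)  # torch.Size([1, 16000])
```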
- speechbrain.processing.signal_processing.dB_to_amplitude(SNR)[source]
Returns the amplitude ratio, converted from decibels.
- Parameters:
SNR (float) – The ratio in decibels to convert.
Example
>>> round(dB_to_amplitude(SNR=10), 3)
3.162
>>> dB_to_amplitude(SNR=0)
1.0
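Both outputs follow from the standard dB-to-linear amplitude conversion, 10 ** (dB / 20); a one-line equivalent for reference:

```python
# Amplitude (not power) conversion: 10 dB -> sqrt(10) ~ 3.162, 0 dB -> 1.0.
def db_to_amp(db):
    return 10 ** (db / 20)

print(round(db_to_amp(10), 3))  # 3.162
print(db_to_amp(0))             # 1.0
```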
- speechbrain.processing.signal_processing.notch_filter(notch_freq, filter_width=101, notch_width=0.05)[source]
Returns a notch filter constructed from a high-pass and low-pass filter.
(from https://tomroelandts.com/articles/how-to-create-simple-band-pass-and-band-reject-filters)
- Parameters:
notch_freq (float) – Frequency to put the notch at, as a fraction of sampling_rate / 2. The range of possible inputs is 0 to 1.
filter_width (int) – Filter width in samples. Longer filters have smaller transition bands, but are less efficient.
notch_width (float) – Width of the notch, as a fraction of sampling_rate / 2.
Example
>>> from speechbrain.dataio.dataio import read_audio
>>> signal = read_audio('tests/samples/single-mic/example1.wav')
>>> signal = signal.unsqueeze(0).unsqueeze(2)
>>> kernel = notch_filter(0.25)
>>> notched_signal = convolve1d(signal, kernel)
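The band-reject construction referenced above can be sketched in plain torch: a windowed-sinc low-pass below the notch plus a spectrally inverted low-pass (i.e. a high-pass) above it. This is a hypothetical re-implementation, not the SpeechBrain source:

```python
import torch

def notch_kernel_sketch(notch_freq, filter_width=101, notch_width=0.05):
    # Frequencies are fractions of sampling_rate / 2 (Nyquist).
    t = torch.arange(filter_width) - filter_width // 2
    window = torch.hann_window(filter_width, periodic=False)

    def lowpass(cutoff):
        # Windowed sinc, normalized to unit gain at DC.
        h = cutoff * torch.sinc(cutoff * t) * window
        return h / h.sum()

    low = lowpass(notch_freq - notch_width)
    high = -lowpass(notch_freq + notch_width)
    high[filter_width // 2] += 1.0  # spectral inversion: delta minus low-pass
    return low + high

kernel = notch_kernel_sketch(0.25)
H = torch.fft.rfft(kernel, 1024).abs()
freqs = torch.fft.rfftfreq(1024) * 2          # as fractions of Nyquist
notch_bin = torch.argmin((freqs - 0.25).abs())
print(H[0].item(), H[notch_bin].item())       # DC gain ~1, notch attenuated
```

Summing a low-pass and a high-pass works here because both kernels are linear-phase with the same center, so their real frequency responses add directly.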
- speechbrain.processing.signal_processing.overlap_and_add(signal, frame_step)[source]
Taken from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/utils.py
Reconstructs a signal from a framed representation. Adds potentially overlapping frames of a signal with shape [..., frames, frame_length], offsetting subsequent frames by frame_step. The resulting tensor has shape [..., output_size] where output_size = (frames - 1) * frame_step + frame_length.
- Args:
signal – A […, frames, frame_length] Tensor. All dimensions may be unknown, and rank must be at least 2.
frame_step – An integer denoting overlap offsets. Must be less than or equal to frame_length.
- Returns:
A Tensor with shape […, output_size] containing the overlap-added frames of signal’s inner-most two dimensions. output_size = (frames - 1) * frame_step + frame_length
Example
>>> signal = torch.randn(5, 20)
>>> overlapped = overlap_and_add(signal, 20)
>>> overlapped.shape
torch.Size([100])
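The example above uses frame_step equal to frame_length, so frames are simply concatenated; with frame_step smaller than frame_length the overlapping regions are summed. A loop-based reference sketch (the SpeechBrain version, adapted from Conv-TasNet, is vectorized but computes the same thing):

```python
import torch

def overlap_and_add_sketch(signal, frame_step):
    *outer, frames, frame_length = signal.shape
    output_size = (frames - 1) * frame_step + frame_length
    out = torch.zeros(*outer, output_size)
    for i in range(frames):
        # Each frame is shifted by frame_step and accumulated.
        out[..., i * frame_step : i * frame_step + frame_length] += signal[..., i, :]
    return out

frames = torch.ones(4, 10)
out = overlap_and_add_sketch(frames, 5)
print(out.shape)   # torch.Size([25])  -> (4 - 1) * 5 + 10
print(out[5:10])   # tensor([2., 2., 2., 2., 2.]) where adjacent frames overlap
```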
- speechbrain.processing.signal_processing.resynthesize(enhanced_mag, noisy_inputs, stft, istft, normalize_wavs=True)[source]
Function for resynthesizing waveforms from enhanced magnitudes.
- Parameters:
enhanced_mag (torch.Tensor) – Predicted spectral magnitude, should be three dimensional.
noisy_inputs (torch.Tensor) – The noisy waveforms before any processing, to extract phase.
stft (torch.nn.Module) – Module for computing the STFT for extracting phase.
istft (torch.nn.Module) – Module for computing the iSTFT for resynthesis.
normalize_wavs (bool) – Whether to normalize the output wavs before returning them.
- Returns:
enhanced_wav – The resynthesized waveforms of the enhanced magnitudes with noisy phase.
- Return type:
torch.Tensor
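A torch-only sketch of the resynthesis idea: keep the noisy signal's phase, swap in the predicted magnitude, and invert. The real function uses SpeechBrain STFT/ISTFT modules rather than torch.stft/torch.istft, and the values below are illustrative:

```python
import torch

n_fft, hop = 512, 128
noisy = torch.randn(1, 16000)
window = torch.hann_window(n_fft)

noisy_spec = torch.stft(noisy, n_fft, hop, window=window, return_complex=True)
phase = noisy_spec / (noisy_spec.abs() + 1e-14)    # unit-magnitude phase term

# Stand-in for a model's output: here just the noisy magnitude itself,
# so resynthesis should reproduce the input.
enhanced_mag = noisy_spec.abs()

enhanced_wav = torch.istft(enhanced_mag * phase, n_fft, hop,
                           window=window, length=noisy.shape[-1])
print(enhanced_wav.shape)  # torch.Size([1, 16000])
```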
- speechbrain.processing.signal_processing.gabor_impulse_response(t, center, fwhm)[source]
Function for generating gabor impulse responses as used by GaborConv1d proposed in
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, “LEAF: A Learnable Frontend for Audio Classification”, in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596)
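A generic Gabor impulse response is a Gaussian envelope modulating a complex sinusoid, which is the building block GaborConv1d learns in LEAF. The exact parameterization of `center` and `fwhm` in SpeechBrain may differ; the values below are illustrative assumptions:

```python
import math

import torch

t = torch.arange(-200.0, 201.0)   # filter support in samples
center = 0.3                      # carrier frequency in radians per sample
fwhm = 50.0                       # full width at half maximum of the envelope

sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> std dev
envelope = torch.exp(-t ** 2 / (2.0 * sigma ** 2))     # Gaussian window
carrier = torch.exp(1j * center * t)                   # complex sinusoid
gabor = envelope * carrier                             # complex64 response
print(gabor.shape, gabor.dtype)   # torch.Size([401]) torch.complex64
```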
- speechbrain.processing.signal_processing.gabor_impulse_response_legacy_complex(t, center, fwhm)[source]
Function for generating gabor impulse responses without using the complex64 dtype, as used by GaborConv1d proposed in
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, “LEAF: A Learnable Frontend for Audio Classification”, in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596)