speechbrain.dataio.preprocess module

Preprocessors for audio

Summary

Classes:

AudioNormalizer

Normalizes audio into a standard format

Reference

class speechbrain.dataio.preprocess.AudioNormalizer(sample_rate=16000, mix='avg-to-mono')

Bases: object

Normalizes audio into a standard format

Parameters:
  • sample_rate (int) – The sampling rate to which the incoming signals should be converted.

  • mix ({"avg-to-mono", "keep"}) – How channels are handled. "avg-to-mono": average all channels (sum them and divide by the number of channels). This also removes the channel dimension, producing a [time] tensor. "keep": leave the channel information unchanged.
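To make the two `mix` modes concrete, here is a minimal, library-free sketch. It is not the SpeechBrain implementation: it models a waveform as a Python list in [time] or [time, channels] layout (an assumption made so the sketch runs without torch), and `mix_channels` is a hypothetical helper name.

```python
def mix_channels(audio, mix="avg-to-mono"):
    """Hypothetical sketch: apply channel mixing to [time] or [time, channels] audio.

    A multi-channel frame is a list of per-channel samples; a mono frame is a float.
    """
    if mix not in ("avg-to-mono", "keep"):
        raise ValueError(f"Unexpected mixing configuration: {mix}")
    if mix == "keep":
        # Leave the channel layout untouched.
        return audio
    # "avg-to-mono": input that is already [time] passes through unchanged.
    if audio and not isinstance(audio[0], (list, tuple)):
        return audio
    # Average across channels, dropping the channel dimension -> [time].
    return [sum(frame) / len(frame) for frame in audio]


stereo = [[0.25, 0.75], [1.0, -1.0], [0.5, 0.5]]  # 3 frames, 2 channels
print(mix_channels(stereo))          # [0.5, 0.0, 0.5]
print(mix_channels(stereo, "keep"))  # unchanged [time, channels] list
```

Averaging (rather than plain summing) keeps the mono signal within the same amplitude range as the individual channels.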

Example

>>> import torchaudio
>>> from speechbrain.dataio.preprocess import AudioNormalizer
>>> example_file = 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
>>> signal, sr = torchaudio.load(example_file, channels_first=False)
>>> normalizer = AudioNormalizer(sample_rate=8000)
>>> normalized = normalizer(signal, sr)
>>> signal.shape
torch.Size([160000, 4])
>>> normalized.shape
torch.Size([80000])
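The shapes in the example follow from the rate ratio: 160000 samples at 16000 Hz is 10 seconds of audio, which at 8000 Hz becomes 80000 samples, and the 4 channels are averaged away. The hypothetical helper below reproduces that arithmetic; it assumes the output length scales exactly by `target_sr / orig_sr` (true for the integer ratio in the example) and that "keep" preserves the [time, channels] layout.

```python
def normalized_shape(num_samples, num_channels, orig_sr, target_sr, mix="avg-to-mono"):
    """Hypothetical helper: predict the output shape of the normalization."""
    # Same duration at a new rate: the length scales by target_sr / orig_sr.
    new_len = num_samples * target_sr // orig_sr
    if mix == "avg-to-mono":
        return (new_len,)  # channel dimension removed
    return (new_len, num_channels)  # "keep": channel dimension preserved


print(160000 / 16000)                                    # 10.0 (seconds of audio)
print(normalized_shape(160000, 4, 16000, 8000))          # (80000,)
print(normalized_shape(160000, 4, 16000, 8000, "keep"))  # (80000, 4)
```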

Note

If the input sampling rate is lower than sample_rate, the audio is upsampled. Upsampling cannot create meaningful content in the frequency band it adds (between the original and the new Nyquist frequencies). Models generally perform poorly on upsampled data unless they were specifically trained on it.
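The band that upsampling adds without real content can be computed directly: a signal sampled at `orig_sr` carries nothing above its Nyquist frequency `orig_sr / 2`, while the upsampled signal can represent up to `target_sr / 2`. The function name below is hypothetical; the sketch only shows the arithmetic, for instance 8 kHz telephone-band audio normalized to the default 16000 Hz.

```python
def empty_band_hz(orig_sr, target_sr):
    """Return the (low, high) band in Hz that upsampling adds with no real
    signal content, or None when the audio is not being upsampled."""
    if target_sr <= orig_sr:
        return None  # downsampling or same rate: no bandwidth is added
    # Content stops at the original Nyquist frequency; the new one is higher.
    return (orig_sr / 2, target_sr / 2)


print(empty_band_hz(8000, 16000))  # (4000.0, 8000.0)
print(empty_band_hz(16000, 8000))  # None
```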

__call__(audio, sample_rate)

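Performs normalization: resamples the audio to the normalizer's target sample_rate and applies the configured channel mixing.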

Parameters:
  • audio (torch.Tensor) – The input waveform, shaped [time, channels] or [time].

  • sample_rate (int) – The sampling rate of the input audio, in Hz.

Returns:

audio – Channel- and sample-rate-normalized audio, shaped [time] when mix="avg-to-mono"; with "keep", the channel layout of the input is preserved.

Return type:

torch.Tensor
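The overall resample-then-mix flow of `__call__` can be sketched without torch as below. This is an illustration under stated assumptions, not SpeechBrain's code: `SketchNormalizer` and `linear_resample` are hypothetical names, audio is a Python list in [time] or [time, channels] layout, and naive linear interpolation is swapped in for a proper band-limited resampler (it does no anti-aliasing filtering).

```python
def linear_resample(signal, orig_sr, target_sr):
    """Naive linear-interpolation resampler for a mono list of floats.

    Stand-in only: unlike a band-limited resampler, it applies no low-pass filter.
    """
    if orig_sr == target_sr or not signal:
        return list(signal)
    new_len = len(signal) * target_sr // orig_sr
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(new_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[right] * frac)
    return out


class SketchNormalizer:
    """Hypothetical re-creation of the normalization flow, for illustration."""

    def __init__(self, sample_rate=16000, mix="avg-to-mono"):
        if mix not in ("avg-to-mono", "keep"):
            raise ValueError(f"Unexpected mixing configuration: {mix}")
        self.sample_rate = sample_rate
        self.mix = mix

    def __call__(self, audio, sample_rate):
        is_mono = bool(audio) and not isinstance(audio[0], (list, tuple))
        if is_mono:
            return linear_resample(audio, sample_rate, self.sample_rate)
        # Resample each channel of a [time, channels] input independently.
        channels = [list(ch) for ch in zip(*audio)]
        resampled = [linear_resample(ch, sample_rate, self.sample_rate) for ch in channels]
        frames = [list(frame) for frame in zip(*resampled)]  # back to [time, channels]
        if self.mix == "keep":
            return frames
        return [sum(f) / len(f) for f in frames]  # "avg-to-mono" -> [time]


print(SketchNormalizer(sample_rate=16000)([0.0, 1.0], 8000))  # [0.0, 0.5, 1.0, 1.0]
stereo_16k = [[float(i), float(-i)] for i in range(8)]  # 8 frames, 2 channels
print(SketchNormalizer(sample_rate=8000)(stereo_16k, 16000))  # [0.0, 0.0, 0.0, 0.0]
```

The first call shows upsampling only interpolating between existing samples, consistent with the note above that no new information is created.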