speechbrain.dataio.preprocess module
Preprocessors for audio
Reference
- class speechbrain.dataio.preprocess.AudioNormalizer(sample_rate=16000, mix='avg-to-mono')
Bases: object
Normalizes audio into a standard format: the signal is resampled to `sample_rate` and its channels are handled according to `mix`.
Parameters
- sample_rate (int) – The sampling rate to which incoming signals are converted.
- mix ({"avg-to-mono", "keep"}) – How multichannel input is handled. "avg-to-mono": average the channels (sum them and divide by the number of channels). This also removes the channel dimension, producing a [time] tensor. "keep": leave the channels unchanged.
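The effect of the two `mix` modes on a [time, channels] signal can be sketched without any audio library. This is a minimal illustration, not SpeechBrain's implementation: `mix_channels` is a hypothetical helper, the signal is a plain list of frames, and the input is assumed to be two-dimensional (one inner list of channel values per time step).

```python
from typing import List, Sequence, Union

Frame = Sequence[float]


def mix_channels(
    signal: Sequence[Frame], mix: str = "avg-to-mono"
) -> Union[List[float], List[List[float]]]:
    """Hypothetical helper illustrating the two ``mix`` modes.

    ``signal`` is laid out as [time][channels], matching the
    channels_first=False layout used in the example below.
    """
    if mix == "avg-to-mono":
        # Sum the channels of each frame and divide by the channel count;
        # the channel dimension disappears, leaving a [time] sequence.
        return [sum(frame) / len(frame) for frame in signal]
    if mix == "keep":
        # Leave the channel layout untouched (copied, not aliased).
        return [list(frame) for frame in signal]
    raise ValueError(f"Unexpected mixing configuration {mix!r}")
```

For a two-channel signal `[[1.0, 3.0], [2.0, 4.0]]`, "avg-to-mono" yields `[2.0, 3.0]`, while "keep" returns the two-channel frames as they were.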
Example
>>> import torchaudio
>>> from speechbrain.dataio.preprocess import AudioNormalizer
>>> example_file = 'samples/audio_samples/example_multichannel.wav'
>>> signal, sr = torchaudio.load(example_file, channels_first=False)
>>> normalizer = AudioNormalizer(sample_rate=8000)
>>> normalized = normalizer(signal, sr)
>>> signal.shape
torch.Size([33882, 2])
>>> normalized.shape
torch.Size([16941])
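The shapes in the example are consistent with a 16 kHz source file: resampling to 8000 Hz halves the number of samples (33882 to 16941), and "avg-to-mono" removes the channel dimension of size 2. The arithmetic can be checked with the sketch below. `resampled_length` is a hypothetical helper, and rounding the output length up is an assumption about the resampler, not something this page states.

```python
def resampled_length(num_samples: int, orig_sr: int, new_sr: int) -> int:
    """Expected sample count after resampling (hypothetical helper).

    Assumes the resampler rounds the target length up, i.e.
    ceil(num_samples * new_sr / orig_sr).
    """
    if orig_sr <= 0 or new_sr <= 0:
        raise ValueError("sample rates must be positive")
    # -((-a) // b) is integer ceiling division, avoiding float rounding error.
    return -((-num_samples * new_sr) // orig_sr)
```

With these inputs, `resampled_length(33882, 16000, 8000)` gives 16941, matching `normalized.shape` above.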
Note
If the incoming sample rate is lower than `sample_rate`, the audio is upsampled. Upsampling cannot create meaningful content in the frequency band it adds, so models that were not specifically trained on upsampled data generally perform poorly on it.
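A signal sampled at `orig_sr` can carry content only up to `orig_sr / 2` (its Nyquist frequency), so upsampling to a higher rate leaves an empty band between the old and new Nyquist frequencies. The toy sketch below makes this concrete. Both helper names are hypothetical, and it uses simple linear interpolation rather than the band-limited resampler a real pipeline would use. Every inserted sample is a fixed function of its neighbours, and dropping the inserted samples recovers the original exactly.

```python
from typing import List, Optional, Sequence, Tuple


def added_band_hz(orig_sr: int, target_sr: int) -> Optional[Tuple[float, float]]:
    """Frequency band (Hz) that upsampling adds but cannot fill, or None."""
    if target_sr <= orig_sr:
        return None  # downsampling or no change: no band is added
    # Content of the original signal stops at its Nyquist frequency.
    return (orig_sr / 2, target_sr / 2)


def upsample_2x_linear(x: Sequence[float]) -> List[float]:
    """Toy upsampler: insert the midpoint between each pair of neighbours."""
    out: List[float] = []
    for a, b in zip(x, x[1:]):
        # The new sample is computed from existing ones: no new information.
        out.extend([a, (a + b) / 2])
    if x:
        out.append(x[-1])
    return out
```

For example, `added_band_hz(8000, 16000)` gives `(4000.0, 8000.0)`, and `upsample_2x_linear([0.0, 2.0, 4.0])` gives `[0.0, 1.0, 2.0, 3.0, 4.0]`, whose even-indexed samples are the original input.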