speechbrain.augment.codec module
Codec Augmentation via torchaudio
This module provides codec-based augmentation for audio data: input waveforms are passed through audio codecs using torchaudio.
For detailed guidance and usage examples, refer to the tutorial at: https://pytorch.org/audio/stable/tutorials/audio_data_augmentation_tutorial.html
Note: This code is compatible with FFmpeg as the torchaudio backend. When using FFmpeg2, the maximum number of samples for processing is limited to 16.
- Authors
Mirco Ravanelli 2023
Summary
Classes:
CodecAugment – Apply random audio codecs to input waveforms using torchaudio.
Reference
- class speechbrain.augment.codec.CodecAugment(sample_rate=16000)[source]
Bases: Module
Apply random audio codecs to input waveforms using torchaudio.
This class provides an interface for applying codec augmentation techniques to audio data.
- Parameters:
sample_rate (int) – The sample rate of the input waveform.
Example
>>> import torch
>>> import torchaudio
>>> from speechbrain.augment.codec import CodecAugment
>>> waveform = torch.rand(4, 16000)
>>> if torchaudio.list_audio_backends()[0] == 'ffmpeg':
...     augmenter = CodecAugment(16000)
...     output_waveform = augmenter(waveform)
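The example guards on the first entry of `torchaudio.list_audio_backends()`, which raises an `IndexError` if no backend is installed. The sketch below shows a more defensive check. `ffmpeg_backend_available` is a hypothetical helper, not part of SpeechBrain, and it accepts FFmpeg anywhere in the backend list, which is looser than the check in the example. It also assumes that `list_audio_backends` may be missing in some torchaudio versions.

```python
def ffmpeg_backend_available(backends=None):
    """Return True if 'ffmpeg' appears in the list of torchaudio backends.

    `backends` may be passed explicitly (useful for testing); otherwise it
    is queried from torchaudio, if torchaudio can be imported.
    """
    if backends is None:
        try:
            import torchaudio

            backends = torchaudio.list_audio_backends()
        except (ImportError, AttributeError):
            # torchaudio is missing, or this version lacks the function
            # (assumption: treat both cases as "no FFmpeg backend").
            return False
    return "ffmpeg" in backends
```

With such a helper, the augmenter would only be constructed when `ffmpeg_backend_available()` returns `True`.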
- apply_codec(waveform, format=None, encoder=None)[source]
Apply the selected audio codec.
- Parameters:
waveform (torch.Tensor) – Input waveform of shape [batch, time].
format (str) – The audio format to use (e.g., “wav”, “mp3”). Default is None.
encoder (str) – The encoder to use for the format (e.g., “opus”, “vorbis”). Default is None.
- Returns:
Coded version of the input waveform of shape [batch, time].
- Return type:
torch.Tensor
- forward(waveform)[source]
Apply a random audio codec from the available list.
- Parameters:
waveform (torch.Tensor) – Input waveform of shape [batch, time].
- Returns:
Coded version of the input waveform of shape [batch, time].
- Return type:
torch.Tensor
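The relationship between forward and apply_codec can be sketched roughly as follows: pick one (format, encoder) pair at random, then delegate to the codec routine. This is a sketch under stated assumptions, not the library's implementation. `CODEC_CONFIGS`, `pick_codec`, and `random_codec_forward` are hypothetical names, and the pairs listed are illustrative only, since this page does not give the list of codecs that CodecAugment actually draws from.

```python
import random

# Hypothetical (format, encoder) pairs, built from the examples named in the
# apply_codec parameters; the real list used by CodecAugment is not shown here.
CODEC_CONFIGS = [
    ("wav", None),
    ("mp3", None),
    ("ogg", "vorbis"),
    ("ogg", "opus"),
]


def pick_codec(rng=random):
    """Choose one (format, encoder) pair uniformly at random."""
    return rng.choice(CODEC_CONFIGS)


def random_codec_forward(waveform, apply_codec, rng=random):
    """Apply one randomly chosen codec through the supplied callable.

    `apply_codec` stands in for a method with the signature
    apply_codec(waveform, format=None, encoder=None).
    """
    fmt, enc = pick_codec(rng)
    return apply_codec(waveform, format=fmt, encoder=enc)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which can help when debugging an augmentation pipeline.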