speechbrain.augment.codec module

Codec Augmentation via torchaudio

This module provides codec-based augmentation of audio data, implemented with torchaudio.

For detailed guidance and usage examples, refer to the tutorial at: https://pytorch.org/audio/stable/tutorials/audio_data_augmentation_tutorial.html

Note: This code is compatible with FFmpeg as the torchaudio backend. When using FFmpeg2, the maximum number of samples for processing is limited to 16.

Authors
  • Mirco Ravanelli 2023

Summary

Classes:

CodecAugment

Apply random audio codecs to input waveforms using torchaudio.

Reference

class speechbrain.augment.codec.CodecAugment(sample_rate=16000)

Bases: Module

Apply random audio codecs to input waveforms using torchaudio.

This class provides an interface for applying codec augmentation techniques to audio data.

Parameters:

sample_rate (int) – The sample rate of the input waveform.

Example

>>> import torch
>>> import torchaudio
>>> from speechbrain.augment.codec import CodecAugment
>>> waveform = torch.rand(4, 16000)
>>> if 'ffmpeg' in torchaudio.list_audio_backends():
...     augmenter = CodecAugment(16000)
...     output_waveform = augmenter(waveform)

apply_codec(waveform, format=None, encoder=None)

Apply the selected audio codec.

Parameters:
  • waveform (torch.Tensor) – Input waveform of shape [batch, time].

  • format (str) – The audio format to use (e.g., “wav”, “mp3”). Default is None.

  • encoder (str) – The encoder to use for the format (e.g., “opus”, “vorbis”). Default is None.

Returns:

Coded version of the input waveform of shape [batch, time].

Return type:

torch.Tensor
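
A minimal sketch of calling apply_codec with an explicit codec, assuming SpeechBrain and a torchaudio build with the FFmpeg backend are installed. The helper name try_apply_codec is hypothetical, and the ("wav", "pcm_mulaw") pair is an assumption chosen for illustration; whether a given format/encoder pair works depends on the local FFmpeg build. The helper returns None rather than failing when the required packages or backend are missing.

```python
def try_apply_codec(format="wav", encoder="pcm_mulaw", sample_rate=16000):
    """Return a codec-processed batch, or None if the FFmpeg path is unavailable."""
    try:
        import torch
        import torchaudio
        from speechbrain.augment.codec import CodecAugment
    except ImportError:
        return None  # torch, torchaudio, or speechbrain is not installed

    # list_audio_backends is looked up defensively because not every
    # torchaudio release exposes it.
    list_backends = getattr(torchaudio, "list_audio_backends", None)
    if list_backends is None or "ffmpeg" not in list_backends():
        return None

    waveform = torch.rand(4, sample_rate)  # [batch, time]: four 1-second clips
    augmenter = CodecAugment(sample_rate)
    # Illustrative format/encoder pair (an assumption, not a library default).
    return augmenter.apply_codec(waveform, format=format, encoder=encoder)


coded = try_apply_codec()
if coded is None:
    print("Codec augmentation is unavailable in this environment.")
else:
    print("Output shape:", tuple(coded.shape))
```

Calling apply_codec directly, instead of forward, can be useful when a specific codec has to be reproduced, for example when comparing model robustness across codecs.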

forward(waveform)

Apply a random audio codec from the available list.

Parameters:

waveform (torch.Tensor) – Input waveform of shape [batch, time].

Returns:

Coded version of the input waveform of shape [batch, time].

Return type:

torch.Tensor
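
The selection step described above can be pictured as a uniform random draw over (format, encoder) pairs. The sketch below is not SpeechBrain's implementation: the names CANDIDATE_CODECS and pick_codec are hypothetical, the list contents are assumptions, and only the standard library is used to illustrate the pattern.

```python
import random

# Hypothetical (format, encoder) pairs; the list SpeechBrain actually
# uses may differ.
CANDIDATE_CODECS = [
    ("wav", "pcm_mulaw"),
    ("mp3", None),
    ("ogg", "vorbis"),
]


def pick_codec(rng=random):
    """Draw one (format, encoder) pair uniformly at random."""
    return rng.choice(CANDIDATE_CODECS)


rng = random.Random(0)  # seeded so repeated runs draw the same sequence
for _ in range(3):
    fmt, enc = pick_codec(rng)
    print(f"format={fmt}, encoder={enc}")
```

A pair drawn this way could then be passed on as apply_codec(waveform, format=fmt, encoder=enc), so that each call to the augmenter may use a different codec.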

training: bool