speechbrain.augment.augmenter module

Classes for implementing data augmentation pipelines.

Authors
  • Mirco Ravanelli 2022

Summary

Classes:

Augmenter

Applies pipelines of data augmentation.

Reference

class speechbrain.augment.augmenter.Augmenter(parallel_augment=False, parallel_augment_fixed_bs=False, concat_original=False, min_augmentations=None, max_augmentations=None, shuffle_augmentations=False, repeat_augment=1, augment_start_index=0, augment_end_index=None, concat_start_index=0, concat_end_index=None, augment_prob=1.0, augmentations=[], enable_augmentations=None)[source]

Bases: Module

Applies pipelines of data augmentation.

Parameters:
  • parallel_augment (bool) – If False, the augmentations are applied sequentially with the order specified in the pipeline argument. When True, all the N augmentations are concatenated in the output on the batch axis.

  • parallel_augment_fixed_bs (bool) – If False, each augmenter (performed in parallel) generates a number of augmented examples equal to the batch size. Thus, overall, N*batch_size artificial examples are generated, where N is the number of augmenters. When True, the total number of augmented examples is kept fixed at the batch size, i.e., each augmenter processes batch_size // N examples. This option is useful for keeping the number of synthetic examples under control with respect to the original data distribution, as it always keeps 50% original data and 50% augmented data.

  • concat_original (bool) – if True, the original input is concatenated with the augmented outputs (on the batch axis).

  • min_augmentations (int) – The number of augmentations applied to the input signal is randomly sampled between min_augmentations and max_augmentations. For instance, if the augmentation dict contains N=6 augmentations and we set min_augmentations=1 and max_augmentations=4, we apply up to M=4 augmentations. The selected augmentations are applied in the order specified in the augmentations dict. If shuffle_augmentations = True, a random set of M augmentations is selected.

  • max_augmentations (int) – Maximum number of augmentations to apply. See min_augmentations for more details.

  • shuffle_augmentations (bool) – If True, it shuffles the entries of the augmentations dictionary. The effect is to randomly select the order of the augmentations to apply.

  • repeat_augment (int) – Applies the augmentation algorithm N times. This can be used to perform more data augmentation.

  • augment_start_index (int) – The index of the first element in the input batch from which data augmentation should begin. This argument allows you to specify the starting point for applying data augmentation.

  • augment_end_index (int) – The index of the last element in the input batch at which data augmentation should stop. You can use this argument to define the endpoint for applying data augmentation within the batch.

  • concat_start_index (int) – If concat_original is set to True, you can specify a subpart of the original batch to concatenate in the output. Use this argument to select the index of the first element from the original input batch to start copying from.

  • concat_end_index (int) – If concat_original is set to True, you can specify a subpart of the original batch to concatenate in the output. Use this argument to select the index of the last element from the original input batch to end the copying process.

  • augment_prob (float) – The probability (0.0 to 1.0) of applying data augmentation. When set to 0.0, the original signal is returned without any augmentation. When set to 1.0, augmentation is always applied. Values in between determine the likelihood of augmentation.

  • augmentations (list) – List of augmenter objects to combine to perform data augmentation.

  • enable_augmentations (list) – A list of booleans used to selectively enable or disable specific augmentation techniques within the ‘augmentations’ list. Each boolean corresponds to an augmentation object in the ‘augmentations’ list and should be of the same length and order. This feature is useful for performing ablations on augmentation techniques to tailor them for a specific task.

Example

>>> import torch
>>> from speechbrain.augment.time_domain import DropFreq, DropChunk
>>> freq_dropper = DropFreq()
>>> chunk_dropper = DropChunk(drop_start=100, drop_end=16000)
>>> augment = Augmenter(parallel_augment=False, concat_original=False, augmentations=[freq_dropper, chunk_dropper])
>>> signal = torch.rand([4, 16000])
>>> output_signal, lengths = augment(signal, lengths=torch.tensor([0.2,0.5,0.7,1.0]))
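The batch-size bookkeeping implied by parallel_augment, concat_original, and parallel_augment_fixed_bs can be sketched with plain torch operations. The two augmenter callables below are hypothetical stand-ins (not speechbrain objects); only the concatenation logic mirrors the documented behavior.

```python
import torch

# Hypothetical stand-ins for augmenters: any callable mapping a batch to a
# same-shaped augmented batch is enough for this bookkeeping sketch.
def aug_a(x):  # e.g., add a small amount of noise
    return x + 0.01 * torch.randn_like(x)

def aug_b(x):  # e.g., scale the amplitude
    return 0.9 * x

signal = torch.rand([4, 16000])
augmentations = [aug_a, aug_b]

# parallel_augment=True: each augmenter processes the full batch, and the
# results are concatenated on the batch axis -> N * batch_size examples.
parallel_out = torch.cat([aug(signal) for aug in augmentations], dim=0)
print(parallel_out.shape)  # torch.Size([8, 16000])

# concat_original=True additionally concatenates the clean input batch.
with_original = torch.cat([signal, parallel_out], dim=0)
print(with_original.shape)  # torch.Size([12, 16000])

# parallel_augment_fixed_bs=True: each augmenter only receives
# batch_size // N examples, so the augmented output keeps the batch size.
chunks = signal.chunk(len(augmentations), dim=0)
fixed_bs_out = torch.cat([aug(c) for aug, c in zip(augmentations, chunks)], dim=0)
print(fixed_bs_out.shape)  # torch.Size([4, 16000])
```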
augment(x, lengths, selected_augmentations)[source]

Applies data augmentation on the selected augmentations.

Parameters:
  • x (torch.Tensor (batch, time, channel)) – input to augment.

  • lengths (torch.Tensor) – The length of each sequence in the batch.

  • selected_augmentations (dict) – Dictionary containing the selected augmentations to apply.

forward(x, lengths)[source]

Applies data augmentation.

Parameters:
  • x (torch.Tensor (batch, time, channel)) – input to augment.

  • lengths (torch.Tensor) – The length of each sequence in the batch.

concatenate_outputs(augment_lst, augment_len_lst)[source]

Concatenate a list of augmented signals, accounting for varying temporal lengths. Padding is applied to ensure all signals can be concatenated.

Parameters:
  • augment_lst (List of torch.Tensor) – List of augmented signals to be concatenated.

  • augment_len_lst (List of torch.Tensor) – List of lengths corresponding to the augmented signals.

Returns:

  • concatenated_signals (torch.Tensor) – A tensor containing the concatenated signals.

  • concatenated_lengths (torch.Tensor) – A tensor containing the concatenated signal lengths.

Notes

This function takes a list of augmented signals, which may have different temporal lengths due to variations such as speed changes. It pads the signals to match the maximum temporal dimension found among the input signals and rescales the lengths accordingly before concatenating them.
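The pad-then-concatenate logic described above can be sketched in plain torch. The tensors below are hypothetical augmented outputs (e.g., one shortened by a speed change); the padding and length rescaling follow the description in the Notes, not speechbrain internals.

```python
import torch
import torch.nn.functional as F

# Hypothetical augmented outputs with different time lengths, plus their
# relative lengths (1.0 = the signal fills its whole time dimension).
augment_lst = [torch.rand(4, 14000), torch.rand(4, 16000)]
augment_len_lst = [torch.ones(4), torch.ones(4)]

# Pad every signal at the end to the longest time dimension in the list.
max_len = max(x.shape[1] for x in augment_lst)
padded = [F.pad(x, (0, max_len - x.shape[1])) for x in augment_lst]

# Rescale the relative lengths so they still mark the valid portion of
# each (now padded) signal.
rescaled = [l * (x.shape[1] / max_len) for x, l in zip(augment_lst, augment_len_lst)]

concatenated_signals = torch.cat(padded, dim=0)
concatenated_lengths = torch.cat(rescaled, dim=0)
print(concatenated_signals.shape)  # torch.Size([8, 16000])
print(concatenated_lengths[0])     # tensor(0.8750), i.e., 14000 / 16000
```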

replicate_multiple_labels(*args)[source]

Replicates the labels along the batch axis a number of times that corresponds to the number of augmentations, since parallel and concatenation augmentations enlarge the batch dimension.

Parameters:

args (torch.Tensor) – One or more label tensors to be replicated.

Returns:

augmented_labels – Labels corresponding to the augmented input. Returns as many tensors as given in input.

Return type:

torch.Tensor
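The replication itself amounts to repeating the label batch along the batch axis. A minimal torch sketch, assuming the augmented batch is laid out as whole copies of the original batch (e.g., [original | aug_1 | aug_2]):

```python
import torch

# Hypothetical setup: a batch of 4 labels and an augmented batch that is
# 3x larger (e.g., the original plus two parallel augmentations).
labels = torch.tensor([0, 1, 2, 3])
num_copies = 3  # how many whole-batch copies the augmentation produced

# Repeating the labels batch-wise keeps them aligned with the
# [original | aug_1 | aug_2] batch layout.
augmented_labels = torch.cat([labels] * num_copies, dim=0)
print(augmented_labels)  # tensor([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
```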

training: bool
replicate_labels(labels)[source]

Replicates the labels along the batch axis a number of times that corresponds to the number of augmentations, since parallel and concatenation augmentations enlarge the batch dimension.

Parameters:

labels (torch.Tensor) – Input label tensors to be replicated.

Returns:

augmented_labels – Labels corresponding to the augmented input.

Return type:

torch.Tensor

check_min_max_augmentations()[source]

Checks the min_augmentations and max_augmentations arguments.