speechbrain.utils.data_utils module

This module gathers utilities for data I/O operations.

Authors

Mirco Ravanelli 2020
Aku Rouhe 2020
Samuele Cornell 2020
Adel Moumen 2024
Pierre Champion 2023

Summary

Functions:

`batch_pad_right`	Given a list of torch tensors, it batches them together by padding to the right on each dimension so that they all have the same length.
`batch_shuffle`	Shuffles batches of fixed size within a sequence
`concat_padded_features`	Concatenates multiple padded feature tensors into a single padded tensor in a vectorized manner without including the padding in the final tensor, adding padding only at the end.
`dict_value_combinations`	Returns all possible key-value combinations from the given dictionary
`dict_value_combinations_gen`	Returns a generator of all key-value combinations of the specified values dictionary
`dist_stats`	Returns standard distribution statistics (mean, std, min, max)
`download_file`	Downloads the file from the given source and saves it in the given destination path.
`get_all_files`	Returns a list of files found within a folder.
`get_list_from_csv`	Gets a list from the selected field of the input csv file.
`length_range`	Creates a tensor matching the shape of its input, with a 0 to max-length range along a single dimension
`masked_max`	A metric function that computes the maximum of each sample
`masked_mean`	A metric function that computes the mean of each sample, excluding padding
`masked_min`	A metric function that computes the minimum of each sample
`masked_std`	A metric function that computes the standard deviation of each sample, excluding padding
`match_shape`	A Swiss Army knife helper function to match the shape of a tensor to that of another tensor - useful for masks, etc.
`mod_default_collate`	Makes a tensor from a list of batch values.
`non_batch_dims`	Returns all dimensions of the specified tensor except the batch dimension
`pad_divisible`	Adds extra padding to the specified dimension of a tensor to make it divisible by the specified factor.
`pad_right_to`	This function takes a torch tensor of arbitrary shape and pads it to target shape by appending values on the right.
`recursive_items`	Yield each (key, value) of a nested dictionary.
`recursive_to`	Moves data to device, or other type, and handles containers.
`recursive_update`	Similar function to `dict.update`, but for a nested `dict`.
`scalarize`	Converts a namedtuple or dictionary containing tensors to their scalar values.
`set_writing_permissions`	This function sets user writing permissions to all the files in the given folder.
`split_by_whitespace`	A very basic functional version of str.split
`split_list`	Splits a sequence into a given number of contiguous chunks of roughly equal size.
`split_path`	Splits a path to source and filename
`trim_as`	Trims the specified tensor to match the shape of another tensor (at most)
`trim_to_shape`	Trims the specified tensor to match the specified shape
`undo_padding`	Produces Python lists given a batch of sentences with their corresponding relative lengths.
`unsqueeze_1d`	Unsqueezes a 1-D tensor to the specified number of dimensions, preserving one dimension and creating "dummy" dimensions elsewhere
`unsqueeze_as`	Reshape the tensor to be of a shape compatible with the target tensor, only valid if x.dim() <= target.dim()

Reference

speechbrain.utils.data_utils.undo_padding(batch, lengths)[source]

Produces Python lists given a batch of sentences with their corresponding relative lengths.

Parameters:

batch (tensor) – Batch of padded sentences.
lengths (tensor) – Relative length of each sentence in the batch.

Example

>>> batch=torch.rand([4,100])
>>> lengths=torch.tensor([0.5,0.6,0.7,1.0])
>>> snt_list=undo_padding(batch, lengths)
>>> len(snt_list)
4
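To make the role of the relative lengths concrete, here is a plain-Python sketch on nested lists instead of tensors. It assumes, as the example above suggests, that each relative length is a fraction of the padded length (the second dimension) and that it is rounded to the nearest integer; undo_padding_lists is a hypothetical name, not part of SpeechBrain.

```python
def undo_padding_lists(batch, lengths):
    """Hypothetical list-based sketch of undo_padding.

    batch: list of equally long (padded) rows
    lengths: relative lengths in [0, 1], one per row
    """
    max_len = len(batch[0])  # padded length shared by all rows
    result = []
    for row, rel_len in zip(batch, lengths):
        # Convert the relative length into a number of valid elements
        actual = round(rel_len * max_len)
        result.append(row[:actual])  # drop the right padding
    return result


batch = [[1, 2, 3, 0], [4, 5, 0, 0]]
print(undo_padding_lists(batch, [0.75, 0.5]))  # [[1, 2, 3], [4, 5]]
```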

speechbrain.utils.data_utils.get_all_files(dirName, match_and=None, match_or=None, exclude_and=None, exclude_or=None)[source]

Returns a list of files found within a folder.

Different options can be used to restrict the search to some specific patterns.

Parameters:

dirName (str) – The directory to search.
match_and (list) – A list that contains patterns to match. The file is returned if it matches all the entries in match_and.
match_or (list) – A list that contains patterns to match. The file is returned if it matches one or more of the entries in match_or.
exclude_and (list) – A list that contains patterns to match. The file is returned if it matches none of the entries in exclude_and.
exclude_or (list) – A list that contains patterns to match. The file is returned if it fails to match one of the entries in exclude_or.

Example

>>> get_all_files('tests/samples/RIRs', match_and=['3.wav'])
['tests/samples/RIRs/rir3.wav']
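The match_and and match_or rules can be sketched with os.walk, assuming (consistent with the example above, where '3.wav' matches rir3.wav) that a pattern matches when it occurs as a substring of the file path. find_files is a hypothetical helper that leaves out the exclude options.

```python
import os
import tempfile


def find_files(dir_name, match_and=None, match_or=None):
    """Hypothetical sketch of substring-based file filtering.

    A path is kept if it contains every pattern in match_and and at
    least one pattern in match_or (each check applies only when given).
    """
    found = []
    for root, _dirs, files in os.walk(dir_name):
        for name in files:
            path = os.path.join(root, name)
            if match_and and not all(p in path for p in match_and):
                continue  # misses at least one required pattern
            if match_or and not any(p in path for p in match_or):
                continue  # matches none of the alternative patterns
            found.append(path)
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    for name in ["rir1.wav", "rir3.wav", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    wav3 = [os.path.basename(p) for p in find_files(tmp, match_and=["3.wav"])]
    wavs = [os.path.basename(p) for p in find_files(tmp, match_or=[".wav", ".flac"])]

print(wav3)  # ['rir3.wav']
print(wavs)  # ['rir1.wav', 'rir3.wav']
```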

speechbrain.utils.data_utils.get_list_from_csv(csvfile, field, delimiter=',', skipinitialspace=True)[source]

Gets a list from the selected field of the input csv file.

Parameters:

csvfile (path) – Path to the csv file.
field (str) – Field of the csv file used to create the list.
delimiter (str) – Delimiter of the csv file.
skipinitialspace (bool) – Set it to True to skip initial spaces in the entries.
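A minimal sketch of the same idea with the standard csv module. It assumes the first row of the file is a header naming the fields, and it reads from a file-like object rather than a path so that it runs without a file on disk; list_from_csv is a hypothetical name.

```python
import csv
import io


def list_from_csv(csv_stream, field, delimiter=",", skipinitialspace=True):
    """Hypothetical sketch: collect one column of a CSV file into a list."""
    reader = csv.DictReader(
        csv_stream, delimiter=delimiter, skipinitialspace=skipinitialspace
    )
    # Each row is a dict keyed by the names in the header line
    return [row[field] for row in reader]


# An in-memory CSV stands in for a file opened with open(path, newline="")
data = io.StringIO("ID, wav, duration\nspk1, a.wav, 1.5\nspk2, b.wav, 2.0\n")
print(list_from_csv(data, "wav"))  # ['a.wav', 'b.wav']
```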

speechbrain.utils.data_utils.split_list(seq, num)[source]

Splits a sequence into num contiguous chunks of roughly equal size.

Parameters:

seq (iterable) – The input list, to be split.
num (int) – The number of chunks to produce.

Example

>>> split_list([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)
[[1, 2], [3, 4], [5, 6], [7, 8, 9]]
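One way to obtain such chunks is to compute the boundaries with integer arithmetic, as in the sketch below. It reproduces the example above, but it is only an illustration: split_list_sketch is a hypothetical name, and the library may place the boundaries differently for other lengths.

```python
def split_list_sketch(seq, num):
    """Hypothetical sketch: split seq into num contiguous chunks of near-equal size."""
    n = len(seq)
    # Chunk i spans [i * n // num, (i + 1) * n // num); integer division
    # keeps the boundaries exact and pushes leftover items to later chunks
    bounds = [i * n // num for i in range(num + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(num)]


print(split_list_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))
# [[1, 2], [3, 4], [5, 6], [7, 8, 9]]
```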

speechbrain.utils.data_utils.recursive_items(dictionary)[source]

Yields each leaf (key, value) pair of a nested dictionary.

Parameters: dictionary (dict) – The nested dictionary to list.
Yields: (key, value) tuples for the leaves of the dictionary.

Example

>>> rec_dict={'lev1': {'lev2': {'lev3': 'current_val'}}}
>>> [item for item in recursive_items(rec_dict)]
[('lev3', 'current_val')]
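As the example shows, only the innermost (leaf) pairs are yielded, not the intermediate dictionaries. A minimal recursive-generator sketch of that behavior (recursive_items_sketch is a hypothetical name):

```python
def recursive_items_sketch(dictionary):
    """Hypothetical sketch: yield the (key, value) pairs at the leaves of a nested dict."""
    for key, value in dictionary.items():
        if isinstance(value, dict):
            # Descend into nested dictionaries instead of yielding them
            yield from recursive_items_sketch(value)
        else:
            yield key, value


nested = {"a": 1, "b": {"c": 2, "d": {"e": 3}}}
print(list(recursive_items_sketch(nested)))  # [('a', 1), ('c', 2), ('e', 3)]
```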

speechbrain.utils.data_utils.recursive_update(d, u, must_match=False)[source]

Similar function to dict.update, but for a nested dict.

From: https://stackoverflow.com/a/3233356

If you have a nested mapping structure, for example:

{"a": 1, "b": {"c": 2}}

Say you want to update the above structure with:

{"b": {"d": 3}}

This function will produce:

{"a": 1, "b": {"c": 2, "d": 3}}

Instead of what dict.update would produce:

{"a": 1, "b": {"d": 3}}

Parameters:

d (dict) – Mapping to be updated.
u (dict) – Mapping to update with.
must_match (bool) – Whether to throw an error if the key in u does not exist in d.

Example

>>> d = {'a': 1, 'b': {'c': 2}}
>>> recursive_update(d, {'b': {'d': 3}})
>>> d
{'a': 1, 'b': {'c': 2, 'd': 3}}
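The merge can be sketched as a recursive loop over u: when both sides hold a mapping for the same key it descends, otherwise it overwrites. recursive_update_sketch is hypothetical, and raising KeyError when must_match is set is an assumption about the error type.

```python
from collections.abc import Mapping


def recursive_update_sketch(d, u, must_match=False):
    """Hypothetical sketch: update the nested mapping d in place with u."""
    for key, value in u.items():
        if isinstance(value, Mapping) and isinstance(d.get(key), Mapping):
            # Both sides are mappings: merge one level deeper
            recursive_update_sketch(d[key], value, must_match)
        elif must_match and key not in d:
            # Assumption: a KeyError signals an unknown key
            raise KeyError(f"Key {key!r} not found in {list(d)}")
        else:
            d[key] = value  # plain overwrite (or insert)


d = {"a": 1, "b": {"c": 2}}
recursive_update_sketch(d, {"b": {"d": 3}})
print(d)  # {'a': 1, 'b': {'c': 2, 'd': 3}}
```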

speechbrain.utils.data_utils.download_file(source, dest, unpack=False, dest_unpack=None, replace_existing=False, write_permissions=False)[source]

Downloads the file from the given source and saves it in the given destination path.

Parameters:

source (path or URL) – Path of the source file. If the source is a URL, the file is downloaded from the web.
dest (path) – Destination path.
unpack (bool) – If True, it unpacks the data in the dest folder.
dest_unpack (path) – Path where to store the unpacked dataset.
replace_existing (bool) – If True, replaces the existing files.
write_permissions (bool) – When set to True, all the files in the dest_unpack directory will be granted write permissions. This option is active only when unpack=True.
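A rough standard-library sketch of the download-then-unpack flow these parameters describe. download_sketch is hypothetical: it uses urllib.request and shutil.unpack_archive, which may differ from what download_file uses internally, it assumes that an existing dest is kept unless replace_existing is True, and it omits write_permissions. A file:// URL is used so that the demo runs offline.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request


def download_sketch(source, dest, unpack=False, dest_unpack=None, replace_existing=False):
    """Hypothetical sketch of download-then-unpack, not SpeechBrain's implementation."""
    # Assumption: an existing file is kept unless replace_existing is True
    if replace_existing or not os.path.isfile(dest):
        # urlopen handles http(s):// as well as file:// URLs
        with urllib.request.urlopen(source) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    if unpack:
        if dest_unpack is None:
            # Assumption: unpack next to the downloaded file
            dest_unpack = os.path.dirname(os.path.abspath(dest))
        # The archive format (zip, tar, tar.gz, ...) is inferred from the extension
        shutil.unpack_archive(dest, dest_unpack)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    # Build a small zip archive to "download" through a file:// URL
    src_dir = os.path.join(tmp, "src")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "hello.txt"), "w") as f:
        f.write("hi")
    archive = shutil.make_archive(os.path.join(tmp, "data"), "zip", src_dir)
    download_sketch(
        pathlib.Path(archive).as_uri(),
        os.path.join(tmp, "downloaded.zip"),
        unpack=True,
        dest_unpack=os.path.join(tmp, "unpacked"),
    )
    unpacked = os.listdir(os.path.join(tmp, "unpacked"))

print(unpacked)  # ['hello.txt']
```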

speechbrain.utils.data_utils.set_writing_permissions(folder_path)[source]

This function sets user writing permissions to all the files in the given folder.

Parameters: folder_path (folder) – Folder whose files will be granted write permissions.
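A minimal sketch using os.walk and os.chmod. add_user_write_permission is a hypothetical name; it only adds the owner-write bit to each file, which may differ from the exact permission bits the library sets.

```python
import os
import stat
import tempfile


def add_user_write_permission(folder_path):
    """Hypothetical sketch: make every file under folder_path writable by its owner."""
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            path = os.path.join(root, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            # Keep the existing permission bits and add owner-write
            os.chmod(path, mode | stat.S_IWUSR)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.ckpt")
    open(path, "w").close()
    os.chmod(path, 0o444)  # start from a read-only file
    add_user_write_permission(tmp)
    writable = bool(os.stat(path).st_mode & stat.S_IWUSR)

print(writable)  # True
```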

speechbrain.utils.data_utils.pad_right_to(tensor: Tensor, target_shape, mode='constant', value=0)[source]

This function takes a torch tensor of arbitrary shape and pads it to target shape by appending values on the right.

Parameters:

tensor (torch.Tensor) – Input tensor whose dimensions are to be padded.
target_shape (list or tuple) – Target shape of the padded tensor; its length must be equal to tensor.ndim.
mode (str) – Pad mode, please refer to torch.nn.functional.pad documentation.
value (float) – Pad value, please refer to torch.nn.functional.pad documentation.

Returns:

tensor (torch.Tensor) – Padded tensor.
valid_vals (list) – For each dimension, the proportion of original (non-padded) values.
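The less obvious part of right-padding with torch.nn.functional.pad is that its pad argument lists (left, right) amounts starting from the last dimension. The plain-Python sketch below (right_pad_spec is a hypothetical helper) computes that argument together with the valid-value proportions described under Returns; it illustrates the bookkeeping and is not the library's code.

```python
def right_pad_spec(shape, target_shape):
    """Hypothetical sketch: build the `pad` argument of torch.nn.functional.pad
    for right-padding `shape` up to `target_shape`, plus the valid proportions."""
    if len(shape) != len(target_shape):
        raise ValueError("target_shape must have one entry per dimension")
    pads = []
    # F.pad expects (left, right) pairs starting from the LAST dimension
    for size, target in zip(reversed(shape), reversed(target_shape)):
        if target < size:
            raise ValueError("target_shape must be >= shape in every dimension")
        pads.extend([0, target - size])  # nothing on the left, the gap on the right
    # Proportions are reported in the natural (first-to-last) dimension order
    valid_vals = [size / target for size, target in zip(shape, target_shape)]
    return pads, valid_vals


print(right_pad_spec((2, 3), (4, 5)))  # ([0, 2, 0, 2], [0.5, 0.6])
```

With PyTorch installed, the padding itself would then be torch.nn.functional.pad(tensor, pads, mode=mode, value=value).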

speechbrain.utils.data_utils.batch_pad_right(tensors: list, mode='constant', value=0)[source]

Given a list of torch tensors, it batches them together by padding to the right on each dimension so that they all have the same length.

Parameters:

tensors (list) – List of tensors to pad together.
mode (str) – Padding mode; see the torch.nn.functional.pad documentation.
value (float) – Padding value; see the torch.nn.functional.pad documentation.

Returns:

tensor (torch.Tensor) – Padded tensor.
valid_vals (list) – For each dimension, the proportion of original (non-padded) values.

speechbrain.utils.data_utils.split_by_whitespace(text)[source]

A very basic functional version of str.split

speechbrain.utils.data_utils.recursive_to(data, *args, **kwargs)[source]

Moves data to device, or other type, and handles containers.

Very similar to torch.utils.data._utils.pin_memory.pin_memory, but applies .to() instead.
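The container handling can be sketched as a recursive function. recursive_to_sketch is hypothetical, the FakeTensor class only stands in for torch.Tensor so that the sketch runs without PyTorch, and the set of containers handled here (mappings, namedtuples, lists, tuples) is an assumption.

```python
from collections.abc import Mapping


def recursive_to_sketch(data, *args, **kwargs):
    """Hypothetical sketch: call .to(*args, **kwargs) on every leaf that has it."""
    if hasattr(data, "to"):
        # Tensor-like leaf: move or convert it
        return data.to(*args, **kwargs)
    if isinstance(data, Mapping):
        return {k: recursive_to_sketch(v, *args, **kwargs) for k, v in data.items()}
    if isinstance(data, tuple) and hasattr(data, "_fields"):
        # namedtuple: rebuild it from positional fields
        return type(data)(*(recursive_to_sketch(v, *args, **kwargs) for v in data))
    if isinstance(data, (list, tuple)):
        return type(data)(recursive_to_sketch(v, *args, **kwargs) for v in data)
    return data  # strings, numbers, None, ... pass through unchanged


class FakeTensor:
    """Stand-in for torch.Tensor that records the device it was moved to."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


batch = {"wav": FakeTensor(), "meta": ["id1", (FakeTensor(), 3)]}
moved = recursive_to_sketch(batch, "cuda")
print(moved["wav"].device, moved["meta"][1][0].device)  # cuda cuda
```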

speechbrain.utils.data_utils.mod_default_collate(batch)[source]

Makes a tensor from a list of batch values.

Note that this doesn't need to zip(*) values together as PaddedBatch connects them already (by key).

The idea here is not to error out: values that cannot be combined into a tensor are returned unchanged.

This is modified from: https://github.com/pytorch/pytorch/blob/c0deb231db76dbea8a9d326401417f7d1ce96ed5/torch/utils/data/_utils/collate.py#L42

speechbrain.utils.data_utils.split_path(path)[source]

Splits a path into source and filename

This also handles URLs and Huggingface hub paths, in addition to regular paths.

Parameters:

path (str or FetchSource) – The path to split.

Returns:

str – Source
str – Filename
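For plain string paths, the split can be sketched as a split at the last slash, which works the same way for local paths, URLs, and hub identifiers. split_path_sketch is hypothetical, does not handle FetchSource, and its treatment of a bare filename (returned with './' as the source) is an assumption; "some-org/some-model" is a placeholder identifier.

```python
def split_path_sketch(path):
    """Hypothetical sketch: split a plain string path into (source, filename)."""
    if "/" in path:
        # Split at the last slash only, so URL schemes and hub ids stay intact
        source, filename = path.rsplit("/", maxsplit=1)
        return source, filename
    # Assumption: a bare filename refers to the current directory
    return "./", path


print(split_path_sketch("some-org/some-model/hyperparams.yaml"))
# ('some-org/some-model', 'hyperparams.yaml')
print(split_path_sketch("https://example.com/models/model.ckpt"))
# ('https://example.com/models', 'model.ckpt')
```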

speechbrain.utils.data_utils.scalarize(value)[source]

Converts a namedtuple or dictionary containing tensors to their scalar values.

Parameters: value (dict or namedtuple) – a dictionary or named tuple of tensors

Returns: result – a dictionary of scalar values
Return type: dict
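A minimal sketch of the conversion. scalarize_sketch is hypothetical; it relies on _asdict() for namedtuples and on the .item() method that one-element torch tensors provide, which FakeScalarTensor imitates here so that the example runs without PyTorch.

```python
from collections import namedtuple


def scalarize_sketch(value):
    """Hypothetical sketch: dict or namedtuple of one-element tensors -> dict of scalars."""
    # namedtuples expose their fields through _asdict()
    value_dict = value._asdict() if hasattr(value, "_asdict") else value
    # .item() returns the Python number held by a one-element tensor
    return {key: item.item() for key, item in value_dict.items()}


class FakeScalarTensor:
    """Stand-in for a one-element torch.Tensor."""

    def __init__(self, x):
        self.x = x

    def item(self):
        return self.x


Stats = namedtuple("Stats", ["loss", "acc"])
print(scalarize_sketch(Stats(FakeScalarTensor(0.5), FakeScalarTensor(0.9))))
# {'loss': 0.5, 'acc': 0.9}
```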

speechbrain.utils.data_utils.unsqueeze_as(x, target)[source]

Reshape the tensor to be of a shape compatible with the target tensor, only valid if x.dim() <= target.dim()

Parameters:

x (torch.Tensor) – the original tensor
target (torch.Tensor) – the tensor whose shape x should be made compatible with

Returns:

result – a view of tensor x reshaped to a shape compatible with target

Return type:

torch.Tensor

speechbrain.utils.data_utils.pad_divisible(tensor, length=None, factor=2, len_dim=1, pad_value=0)[source]

Adds extra padding to the specified dimension of a tensor to make it divisible by the specified factor. This is useful when passing variable-length sequences to downsampling UNets or other similar architectures in which inputs are expected to be divisible by the downsampling factor.

Parameters:

tensor (torch.Tensor) – the tensor to be padded, of arbitrary dimension
length (torch.Tensor) – a 1-D tensor of relative lengths
factor (int) – the divisibility factor
len_dim (int) – the index of the dimension used as the length
pad_value (int) – the value with which outputs will be padded

Returns:

tensor_padded (torch.Tensor) – the tensor, with additional padding if required
length (torch.Tensor) – the adjusted length tensor, if provided

Example

>>> x = torch.tensor([[1, 2, 3, 4],
...                   [5, 6, 0, 0]])
>>> lens = torch.tensor([1., .5])
>>> x_pad, lens_pad = pad_divisible(x, length=lens, factor=5)
>>> x_pad
tensor([[1, 2, 3, 4, 0],
        [5, 6, 0, 0, 0]])
>>> lens_pad
tensor([0.8000, 0.4000])
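The arithmetic behind this example: the length dimension (4) is rounded up to the next multiple of factor (5), and each relative length is multiplied by old_length / new_length, so 1.0 becomes 0.8 and 0.5 becomes 0.4. A plain-Python sketch with hypothetical helper names:

```python
def divisible_length(time_dim, factor):
    """Hypothetical sketch: smallest length >= time_dim that is divisible by factor."""
    gap = time_dim % factor
    return time_dim if gap == 0 else time_dim + factor - gap


def rescale_lengths(lengths, time_dim, new_time_dim):
    """Hypothetical sketch: relative lengths after padding time_dim to new_time_dim."""
    # The same valid steps now sit inside a longer padded dimension,
    # so each relative length shrinks by time_dim / new_time_dim
    return [length * time_dim / new_time_dim for length in lengths]


new_len = divisible_length(4, 5)
print(new_len, rescale_lengths([1.0, 0.5], 4, new_len))  # 5 [0.8, 0.4]
```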

speechbrain.utils.data_utils.trim_to_shape(tensor, shape)[source]

Trims the specified tensor to match the specified shape

Parameters:

tensor (torch.Tensor) – a tensor
shape (enumerable) – the desired shape

Returns:

tensor – the trimmed tensor

Return type:

torch.Tensor

speechbrain.utils.data_utils.trim_as(tensor, other)[source]

Trims the specified tensor to match the shape of another tensor (at most)

Parameters:

tensor (torch.Tensor) – a tensor
other (torch.Tensor) – the tensor whose shape to match

Returns:

tensor – the trimmed tensor

Return type:

torch.Tensor

speechbrain.utils.data_utils.match_shape(tensor, other)[source]

A Swiss Army knife helper function to match the shape of a tensor to that of another tensor - useful for masks, etc.

Parameters:

tensor (torch.Tensor) – a tensor
other (torch.Tensor) – the tensor whose shape to match

Returns:

tensor – the tensor with matching shape

Return type:

torch.Tensor

speechbrain.utils.data_utils.batch_shuffle(items, batch_size)[source]

Shuffles batches of fixed size within a sequence

Parameters:

items (sequence) – a tensor or an indexable sequence, such as a list
batch_size (int) – the batch size

Returns:

items – the original items, with whole batches shuffled. If a tensor was passed, a tensor will be returned; otherwise, a list is returned.

Return type:

sequence
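A list-based sketch of batch-level shuffling: the sequence is cut into consecutive batches of batch_size items, the order of the batches is permuted, and each batch keeps its internal order. batch_shuffle_sketch is hypothetical, and keeping an incomplete trailing batch at the end is an assumption.

```python
import random


def batch_shuffle_sketch(items, batch_size, rng=random):
    """Hypothetical sketch: shuffle the order of fixed-size batches, keeping
    the items inside each batch together and in their original order."""
    batch_count = len(items) // batch_size
    order = list(range(batch_count))
    rng.shuffle(order)  # permute the batches, not the individual items
    result = []
    for b in order:
        result.extend(items[b * batch_size:(b + 1) * batch_size])
    # Assumption: an incomplete trailing batch stays at the end
    result.extend(items[batch_count * batch_size:])
    return result


print(batch_shuffle_sketch(list(range(10)), 3, random.Random(0)))
```

Passing a seeded random.Random makes the permutation reproducible, which is convenient when comparing runs.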

speechbrain.utils.data_utils.concat_padded_features(feats, lens, dim=1, feats_slice_start=None, feats_slice_end=None)[source]

Concatenates multiple padded feature tensors into a single padded tensor in a vectorized manner without including the padding in the final tensor, adding padding only at the end. The function supports optional relative slicing of the tensors.

One possible use case is to concatenate batches of spectrograms or audio.

Parameters:

feats (list) – a list of padded tensors
lens (list) – a list of length tensors
dim (int) – the dimension along which the tensors are concatenated
feats_slice_start (list) – offsets, relative to the beginning of the sequence, for each of the tensors being concatenated. This is useful if only a subsequence of some slices is included
feats_slice_end (list) – offsets, relative to the end of the sequence, for each of the tensors being concatenated. This is useful if only a subsequence of some slices is included

Returns:

out – a concatenated tensor

Return type:

torch.Tensor
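A list-based sketch of the idea, ignoring the slicing options and the dim argument: for each example, the valid parts of all inputs are joined, and padding is added only once, at the end. It assumes relative lengths (fractions of each input's padded length); concat_padded_sketch is a hypothetical name.

```python
def concat_padded_sketch(feats, lens, pad_value=0):
    """Hypothetical list-based sketch: concatenate the valid (non-padded)
    parts of several padded batches, then pad once at the end."""
    batch_size = len(feats[0])
    rows = []
    for i in range(batch_size):
        row = []
        for batch, rel_lens in zip(feats, lens):
            max_len = len(batch[i])
            valid = round(rel_lens[i] * max_len)  # drop this input's padding
            row.extend(batch[i][:valid])
        rows.append(row)
    # Pad every row on the right to the longest concatenated row
    out_len = max(len(row) for row in rows)
    return [row + [pad_value] * (out_len - len(row)) for row in rows]


a = [[1, 2, 0], [3, 0, 0]]  # relative lengths 2/3 and 1/3
b = [[7, 8], [9, 0]]        # relative lengths 1 and 1/2
print(concat_padded_sketch([a, b], [[2 / 3, 1 / 3], [1.0, 0.5]]))
# [[1, 2, 7, 8], [3, 9, 0, 0]]
```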

speechbrain.utils.data_utils.unsqueeze_1d(value, dim, value_dim)[source]

Unsqueezes a 1-D tensor to the specified number of dimensions, preserving one dimension and creating “dummy” dimensions elsewhere

Parameters:

value (torch.Tensor) – A 1-D tensor
dim (int) – the number of dimensions of the result
value_dim (int) – the dimension that the value tensor represents

Returns:

result – a dim-dimensional tensor

Return type:

torch.Tensor
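The resulting shape can be sketched in plain Python: every axis is a singleton except value_dim, which holds the values. unsqueeze_1d_shape is a hypothetical helper that only computes the shape.

```python
def unsqueeze_1d_shape(length, dim, value_dim):
    """Hypothetical sketch: shape that a 1-D tensor of `length` elements takes
    when viewed as a `dim`-dimensional tensor along axis `value_dim`."""
    if not 0 <= value_dim < dim:
        raise ValueError("value_dim must be a valid axis of the result")
    # Singleton ("dummy") axes everywhere except value_dim
    return (1,) * value_dim + (length,) + (1,) * (dim - value_dim - 1)


print(unsqueeze_1d_shape(5, dim=3, value_dim=1))  # (1, 5, 1)
```

With PyTorch, value.view(*shape) would apply it, and the result broadcasts against any dim-dimensional tensor whose value_dim axis has len(value) entries.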

speechbrain.utils.data_utils.length_range(feats, len_dim)[source]

Creates a tensor matching the shape of feats, containing a 0 to max-length range along a single dimension (len_dim)

Parameters:

feats (torch.Tensor) – a features tensor of arbitrary shape
len_dim (int) – the dimension used as length

Returns:

result – a tensor matching the shape of feats with a 0 to max-length range along the length dimension, repeated across the other dimensions

Return type:

torch.Tensor

speechbrain.utils.data_utils.non_batch_dims(sample)[source]

Returns all dimensions of the specified tensor except the batch dimension

Parameters: sample (torch.Tensor) – an arbitrary tensor
Returns: dims – a list of dimensions
Return type: list

speechbrain.utils.data_utils.masked_mean(sample, mask=None)[source]

A metric function that computes the mean of each sample, excluding padding

Parameters:

sample (torch.Tensor) – a tensor of spectrograms
mask (torch.Tensor) – a length mask

Returns:

result – a tensor of means

Return type:

torch.Tensor
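Padding affects both the sum and the count, so dividing by the full length would bias the mean toward the padding value. A list-based sketch, assuming one flattened sample per batch item and a boolean mask where True marks valid entries (masked_mean_sketch is a hypothetical name):

```python
def masked_mean_sketch(samples, masks):
    """Hypothetical sketch: mean of each sample, ignoring masked-out (padded) entries.

    samples: list of flattened samples (one list of numbers per batch item)
    masks: matching lists of booleans, True for valid entries
    """
    means = []
    for values, mask in zip(samples, masks):
        valid = [v for v, keep in zip(values, mask) if keep]
        means.append(sum(valid) / len(valid))  # divide by the valid count only
    return means


print(masked_mean_sketch([[1.0, 3.0, 0.0], [2.0, 4.0, 6.0]],
                         [[True, True, False], [True, True, True]]))  # [2.0, 4.0]
```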

speechbrain.utils.data_utils.masked_std(sample, mask=None)[source]

A metric function that computes the standard deviation of each sample, excluding padding

Parameters:

sample (torch.Tensor) – a tensor of spectrograms
mask (torch.Tensor) – a length mask

Returns:

result – a tensor of standard deviations

Return type:

torch.Tensor

speechbrain.utils.data_utils.masked_min(sample, mask=None)[source]

A metric function that computes the minimum of each sample

Parameters:

sample (torch.Tensor) – a tensor of spectrograms
mask (torch.Tensor) – a length mask

Returns:

result – a tensor of minimum values

Return type:

torch.Tensor

speechbrain.utils.data_utils.masked_max(sample, mask=None)[source]

A metric function that computes the maximum of each sample

Parameters:

sample (torch.Tensor) – a tensor of spectrograms
mask (torch.Tensor) – a length mask

Returns:

result – a tensor of maximum values

Return type:

torch.Tensor

speechbrain.utils.data_utils.dist_stats(sample, mask=None)[source]

Returns standard distribution statistics (mean, std, min, max)

Parameters:

sample (torch.Tensor) – a tensor of spectrograms
mask (torch.Tensor) – a length mask

Returns:

result – a dictionary containing the mean, std, min and max of each sample

Return type:

dict
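A sketch of the four statistics for a single sample, computed only over the unmasked entries with the statistics module. dist_stats_sketch is hypothetical; returning a dictionary keyed by statistic name and using the sample (n - 1) standard deviation are assumptions.

```python
import statistics


def dist_stats_sketch(values, mask):
    """Hypothetical sketch: mean, std, min and max of one sample's valid entries."""
    valid = [v for v, keep in zip(values, mask) if keep]
    return {
        "mean": statistics.mean(valid),
        # Assumption: sample (n - 1) standard deviation
        "std": statistics.stdev(valid),
        "min": min(valid),
        "max": max(valid),
    }


print(dist_stats_sketch([2.0, 4.0, 6.0, 0.0], [True, True, True, False]))
# {'mean': 4.0, 'std': 2.0, 'min': 2.0, 'max': 6.0}
```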

speechbrain.utils.data_utils.dict_value_combinations(values)[source]

Returns all possible key-value combinations from the given dictionary

Parameters:

values (dict) – A dictionary with lists of values as values, for example {"digit": [1, 2, 3], "speaker": [10, 20]}

Returns:

result – a list of dictionaries, each of which is one possible combination

Return type:

list
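For the example values dictionary above, the result contains 3 × 2 = 6 dictionaries. A sketch using itertools.product (dict_value_combinations_sketch is hypothetical, and the library may produce the combinations in a different order):

```python
import itertools


def dict_value_combinations_sketch(values):
    """Hypothetical sketch: every combination of one value per key."""
    keys = list(values)
    # itertools.product walks the value lists like nested for-loops
    return [dict(zip(keys, combo)) for combo in itertools.product(*values.values())]


combos = dict_value_combinations_sketch({"digit": [1, 2, 3], "speaker": [10, 20]})
print(len(combos))  # 6
print(combos[:2])   # [{'digit': 1, 'speaker': 10}, {'digit': 1, 'speaker': 20}]
```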

speechbrain.utils.data_utils.dict_value_combinations_gen(values, keys)[source]

Returns a generator of all key-value combinations of the specified values dictionary

Parameters:

values (dict) – A dictionary with lists of values as values, for example {"digit": [1, 2, 3], "speaker": [10, 20]}
keys (list) – the keys to consider

Returns:

result – a generator of dictionaries, each of which is one possible combination

Return type:

generator
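Unlike the list version, this function yields the combinations lazily and only over the requested keys. A recursive-generator sketch (combinations_gen_sketch is a hypothetical name; for an empty key list it yields a single empty dictionary, which may differ from the library):

```python
def combinations_gen_sketch(values, keys):
    """Hypothetical sketch: lazily yield combinations over the given keys only."""
    if not keys:
        yield {}  # one empty combination ends the recursion
        return
    first, *rest = keys
    for value in values[first]:
        # Extend every combination of the remaining keys with this value
        for tail in combinations_gen_sketch(values, rest):
            yield {first: value, **tail}


values = {"digit": [1, 2, 3], "speaker": [10, 20], "noise": [0.0, 0.5]}
gen = combinations_gen_sketch(values, ["digit", "speaker"])
print(next(gen))       # {'digit': 1, 'speaker': 10}
print(len(list(gen)))  # 5 remaining combinations; "noise" is ignored
```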