speechbrain.utils.data_utils module

This module gathers utilities for data I/O operations.

Authors
  • Mirco Ravanelli 2020

  • Aku Rouhe 2020

  • Samuele Cornell 2020

  • Adel Moumen 2024

  • Pierre Champion 2023

Summary

Functions:

batch_pad_right

Given a list of torch tensors it batches them together by padding to the right on each dimension in order to get same length for all.

batch_shuffle

Shuffles batches of fixed size within a sequence

concat_padded_features

Concatenates multiple padded feature tensors into a single padded tensor in a vectorized manner without including the padding in the final tensor, adding padding only at the end.

dict_value_combinations

Returns all possible key-value combinations from the given dictionary

dict_value_combinations_gen

Returns a generator of permutations of the specified values dictionary

dist_stats

Returns standard distribution statistics (mean, std, min, max)

download_file

Downloads the file from the given source and saves it in the given destination path.

get_all_files

Returns a list of files found within a folder.

get_list_from_csv

Gets a list from the selected field of the input csv file.

length_range

Creates a tensor with a range along a single dimension, expanded to match the shape of the feats tensor

masked_max

A metric function that computes the maximum of each sample

masked_mean

A metric function that computes the mean of each sample, excluding padding

masked_min

A metric function that computes the minimum of each sample

masked_std

A metric function that computes the standard deviation of each sample, excluding padding

match_shape

A swiss-army-knife helper function to match the shape of a tensor to match that of another tensor - useful for masks, etc.

mod_default_collate

Makes a tensor from list of batch values.

non_batch_dims

Returns all dimensions of the specified tensor except the batch dimension

pad_divisible

Adds extra padding to the specified dimension of a tensor to make it divisible by the specified factor.

pad_right_to

This function takes a torch tensor of arbitrary shape and pads it to target shape by appending values on the right.

recursive_items

Yield each (key, value) of a nested dictionary.

recursive_to

Moves data to device, or other type, and handles containers.

recursive_update

Similar function to dict.update, but for a nested dict.

scalarize

Converts a namedtuple or dictionary containing tensors to their scalar value

set_writing_permissions

This function sets user writing permissions to all the files in the given folder.

split_by_whitespace

A very basic functional version of str.split

split_list

Returns a list of splits in the sequence.

split_path

Splits a path to source and filename

trim_as

Trims the specified tensor to match the shape of another tensor (at most)

trim_to_shape

Trims the specified tensor to match the specified shape

undo_padding

Produces Python lists given a batch of sentences with their corresponding relative lengths.

unsqueeze_1d

Unsqueezes a 1-D tensor to the specified number of dimensions, preserving one dimension and creating "dummy" dimensions elsewhere

unsqueeze_as

Reshape the tensor to be of a shape compatible with the target tensor, only valid if x.dim() <= target.dim()

Reference

speechbrain.utils.data_utils.undo_padding(batch, lengths)[source]

Produces Python lists given a batch of sentences with their corresponding relative lengths.

Parameters:
  • batch (tensor) – Batch of sentences gathered in a batch.

  • lengths (tensor) – Relative length of each sentence in the batch.

Example

>>> batch=torch.rand([4,100])
>>> lengths=torch.tensor([0.5,0.6,0.7,1.0])
>>> snt_list=undo_padding(batch, lengths)
>>> len(snt_list)
4
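
The relative lengths are fractions of the padded (longest) length in the batch. The following is a minimal pure-Python sketch of that conversion, using nested lists in place of tensors. The name undo_padding_sketch is hypothetical and the rounding rule is an assumption, so this is an illustration rather than the library's implementation.

```python
def undo_padding_sketch(batch, lengths):
    """Cut each padded row back to its valid part (illustrative only)."""
    max_len = len(batch[0])  # every row is padded to the same length
    result = []
    for row, rel_len in zip(batch, lengths):
        # Assumption: absolute length = relative length * padded length, rounded.
        valid = int(round(rel_len * max_len))
        result.append(row[:valid])
    return result


batch = [[1, 2, 3, 4], [5, 6, 0, 0]]
lengths = [1.0, 0.5]
print(undo_padding_sketch(batch, lengths))  # [[1, 2, 3, 4], [5, 6]]
```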
speechbrain.utils.data_utils.get_all_files(dirName, match_and=None, match_or=None, exclude_and=None, exclude_or=None)[source]

Returns a list of files found within a folder.

Different options can be used to restrict the search to some specific patterns.

Parameters:
  • dirName (str) – The directory to search.

  • match_and (list) – A list that contains patterns to match. The file is returned if it matches all the entries in match_and.

  • match_or (list) – A list that contains patterns to match. The file is returned if it matches one or more of the entries in match_or.

  • exclude_and (list) – A list that contains patterns to match. The file is returned if it matches none of the entries in exclude_and.

  • exclude_or (list) – A list that contains patterns to match. The file is returned if it fails to match one of the entries in exclude_or.

Example

>>> get_all_files('tests/samples/RIRs', match_and=['3.wav'])
['tests/samples/RIRs/rir3.wav']
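
To illustrate how match_and and match_or combine, here is a small sketch that filters an in-memory list of paths instead of walking a folder. It assumes that a pattern "matches" when it occurs as a substring of the path, which is consistent with the example above but is not confirmed by this page. The function name filter_paths_sketch and the sample paths are hypothetical.

```python
def filter_paths_sketch(paths, match_and=None, match_or=None):
    """Keep paths containing every match_and pattern and at least one match_or pattern."""
    kept = []
    for path in paths:
        # Assumption: a pattern matches when it is a substring of the path.
        if match_and is not None and not all(p in path for p in match_and):
            continue
        if match_or is not None and not any(p in path for p in match_or):
            continue
        kept.append(path)
    return kept


paths = ["rirs/rir1.wav", "rirs/rir3.wav", "rirs/notes.txt"]
print(filter_paths_sketch(paths, match_and=["3.wav"]))        # ['rirs/rir3.wav']
print(filter_paths_sketch(paths, match_or=[".txt", "rir1"]))  # ['rirs/rir1.wav', 'rirs/notes.txt']
```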
speechbrain.utils.data_utils.get_list_from_csv(csvfile, field, delimiter=',', skipinitialspace=True)[source]

Gets a list from the selected field of the input csv file.

Parameters:
  • csvfile (path) – Path to the csv file.

  • field (str) – Field of the csv file used to create the list.

  • delimiter (str) – Delimiter of the csv file.

  • skipinitialspace (bool) – Set it to true to skip initial spaces in the entries.

speechbrain.utils.data_utils.split_list(seq, num)[source]

Returns a list of splits in the sequence.

Parameters:
  • seq (iterable) – The input list, to be split.

  • num (int) – The number of chunks to produce.

Example

>>> split_list([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)
[[1, 2], [3, 4], [5, 6], [7, 8, 9]]
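
The example output shows that the remainder ends up in the last chunk. One way to reproduce that behaviour is to step through the sequence with a fractional average chunk size. The sketch below (hypothetical name split_list_sketch) is a possible approach under that assumption, not necessarily the library's code.

```python
def split_list_sketch(seq, num):
    """Split seq into roughly num chunks using a fractional step (illustrative only)."""
    avg = len(seq) / num  # average chunk size, may be fractional
    out = []
    last = 0.0
    while last < len(seq):
        # Truncating the fractional bounds yields the integer slice limits.
        out.append(seq[int(last):int(last + avg)])
        last += avg
    return out


print(split_list_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))
# [[1, 2], [3, 4], [5, 6], [7, 8, 9]]
```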
speechbrain.utils.data_utils.recursive_items(dictionary)[source]

Yield each (key, value) of a nested dictionary.

Parameters:

dictionary (dict) – The nested dictionary to list.

Yields:

(key, value) tuples from the dictionary.

Example

>>> rec_dict={'lev1': {'lev2': {'lev3': 'current_val'}}}
>>> [item for item in recursive_items(rec_dict)]
[('lev3', 'current_val')]
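
As the example shows, only the innermost (leaf) pairs are yielded, not the intermediate dictionaries. A minimal recursive generator with that behaviour could look as follows. The name recursive_items_sketch is hypothetical and the code is a sketch, not the library source.

```python
def recursive_items_sketch(dictionary):
    """Yield (key, value) pairs for every non-dict value in a nested dict."""
    for key, value in dictionary.items():
        if isinstance(value, dict):
            # Descend into nested dictionaries instead of yielding them.
            yield from recursive_items_sketch(value)
        else:
            yield (key, value)


nested = {"a": 1, "lev1": {"lev2": {"lev3": "current_val"}}}
print(list(recursive_items_sketch(nested)))
# [('a', 1), ('lev3', 'current_val')]
```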
speechbrain.utils.data_utils.recursive_update(d, u, must_match=False)[source]

Similar function to dict.update, but for a nested dict.

From: https://stackoverflow.com/a/3233356

If you have a nested mapping structure, for example:

{"a": 1, "b": {"c": 2}}

Say you want to update the above structure with:

{"b": {"d": 3}}

This function will produce:

{"a": 1, "b": {"c": 2, "d": 3}}

Instead of:

{"a": 1, "b": {"d": 3}}

Parameters:
  • d (dict) – Mapping to be updated.

  • u (dict) – Mapping to update with.

  • must_match (bool) – Whether to throw an error if the key in u does not exist in d.

Example

>>> d = {'a': 1, 'b': {'c': 2}}
>>> recursive_update(d, {'b': {'d': 3}})
>>> d
{'a': 1, 'b': {'c': 2, 'd': 3}}
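
The merge described above can be sketched in a few lines of plain Python. The function below (hypothetical name recursive_update_sketch) follows the general idea of the referenced Stack Overflow answer. How must_match is handled here, including passing it down to nested levels, is an assumption made for illustration.

```python
from collections.abc import Mapping


def recursive_update_sketch(d, u, must_match=False):
    """Update d in place with u, merging nested mappings instead of replacing them."""
    for key, value in u.items():
        if isinstance(value, Mapping) and key in d:
            # Both sides have this key and the update is a mapping: merge recursively.
            recursive_update_sketch(d[key], value, must_match=must_match)
        elif must_match and key not in d:
            raise KeyError(f"Key '{key}' not found in: {list(d.keys())}")
        else:
            d[key] = value


d = {"a": 1, "b": {"c": 2}}
recursive_update_sketch(d, {"b": {"d": 3}})
print(d)  # {'a': 1, 'b': {'c': 2, 'd': 3}}
```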
speechbrain.utils.data_utils.download_file(source, dest, unpack=False, dest_unpack=None, replace_existing=False, write_permissions=False)[source]

Downloads the file from the given source and saves it in the given destination path.

Parameters:
  • source (path or url) – Path of the source file. If the source is a URL, the file is downloaded from the web.

  • dest (path) – Destination path.

  • unpack (bool) – If True, it unpacks the data in the dest folder.

  • dest_unpack (path) – Path where to store the unpacked dataset.

  • replace_existing (bool) – If True, replaces the existing files.

  • write_permissions (bool) – When set to True, all the files in the dest_unpack directory will be granted write permissions. This option is active only when unpack=True.

speechbrain.utils.data_utils.set_writing_permissions(folder_path)[source]

This function sets user writing permissions to all the files in the given folder.

Parameters:

folder_path (folder) – Folder whose files will be granted write permissions.

speechbrain.utils.data_utils.pad_right_to(tensor: Tensor, target_shape, mode='constant', value=0)[source]

This function takes a torch tensor of arbitrary shape and pads it to target shape by appending values on the right.

Parameters:
  • tensor (input torch tensor) – Input tensor whose dimension we need to pad.

  • target_shape ((list, tuple)) – Target shape we want for the padded tensor; its length must be equal to tensor.ndim.

  • mode (str) – Pad mode, please refer to torch.nn.functional.pad documentation.

  • value (float) – Pad value, please refer to torch.nn.functional.pad documentation.

Returns:

  • tensor (torch.Tensor) – Padded tensor.

  • valid_vals (list) – List containing proportion for each dimension of original, non-padded values.

speechbrain.utils.data_utils.batch_pad_right(tensors: list, mode='constant', value=0)[source]

Given a list of torch tensors it batches them together by padding to the right on each dimension in order to get same length for all.

Parameters:
  • tensors (list) – List of tensors we wish to pad together.

  • mode (str) – Padding mode; see torch.nn.functional.pad documentation.

  • value (float) – Padding value; see torch.nn.functional.pad documentation.

Returns:

  • tensor (torch.Tensor) – Padded tensor.

  • valid_vals (list) – List containing proportion for each dimension of original, non-padded values.
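
To make the relationship between the padded output and the proportions of valid values concrete, here is a pure-Python sketch restricted to 1-D sequences (plain lists rather than tensors). The name batch_pad_right_sketch is hypothetical. The real function operates on tensors of arbitrary dimension, which this sketch does not attempt to reproduce.

```python
def batch_pad_right_sketch(sequences, value=0):
    """Right-pad 1-D lists to a common length and report the valid proportion of each."""
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [value] * (max_len - len(seq)) for seq in sequences]
    # Proportion of original (non-padded) values in each padded row.
    valid = [len(seq) / max_len for seq in sequences]
    return padded, valid


padded, valid = batch_pad_right_sketch([[1, 2, 3, 4], [5, 6]])
print(padded)  # [[1, 2, 3, 4], [5, 6, 0, 0]]
print(valid)   # [1.0, 0.5]
```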

speechbrain.utils.data_utils.split_by_whitespace(text)[source]

A very basic functional version of str.split

speechbrain.utils.data_utils.recursive_to(data, *args, **kwargs)[source]

Moves data to device, or other type, and handles containers.

Very similar to torch.utils.data._utils.pin_memory.pin_memory, but applies .to() instead.

speechbrain.utils.data_utils.mod_default_collate(batch)[source]

Makes a tensor from list of batch values.

Note that this doesn’t need to zip(*) values together as PaddedBatch connects them already (by key).

Here the idea is not to error out.

This is modified from: https://github.com/pytorch/pytorch/blob/c0deb231db76dbea8a9d326401417f7d1ce96ed5/torch/utils/data/_utils/collate.py#L42

speechbrain.utils.data_utils.split_path(path)[source]

Splits a path to source and filename

This also handles URLs and Huggingface hub paths, in addition to regular paths.

Parameters:

path (str or FetchSource) –

Returns:

  • str – Source

  • str – Filename

speechbrain.utils.data_utils.scalarize(value)[source]

Converts a namedtuple or dictionary containing tensors to their scalar value

Parameters:

value (dict or namedtuple) – a dictionary or named tuple of tensors

Returns:

result – a result dictionary

Return type:

dict

speechbrain.utils.data_utils.unsqueeze_as(x, target)[source]

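Reshape the tensor to be of a shape compatible with the target tensor, only valid if x.dim() <= target.dim()

Parameters:
  • x (torch.Tensor) – the tensor to reshape

  • target (torch.Tensor) – the tensor whose shape x should be made compatible with

Returns:

result – a view of tensor x reshaped to a shape compatible with target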

Return type:

torch.Tensor

speechbrain.utils.data_utils.pad_divisible(tensor, length=None, factor=2, len_dim=1, pad_value=0)[source]

Adds extra padding to the specified dimension of a tensor to make it divisible by the specified factor. This is useful when passing variable-length sequences to downsampling UNets or other similar architectures in which inputs are expected to be divisible by the downsampling factor.

Parameters:
  • tensor (torch.Tensor) – the tensor to be padded, of arbitrary dimension

  • length (torch.Tensor) – a 1-D tensor of relative lengths

  • factor (int) – the divisibility factor

  • len_dim (int) – the index of the dimension used as the length

  • pad_value (int) – the value with which outputs will be padded

Returns:

  • tensor_padded (torch.Tensor) – the tensor, with additional padding if required

  • length (torch.Tensor) – the adjusted length tensor, if provided

Example

>>> x = torch.tensor([[1, 2, 3, 4],
...                   [5, 6, 0, 0]])
>>> lens = torch.tensor([1., .5])
>>> x_pad, lens_pad = pad_divisible(x, length=lens, factor=5)
>>> x_pad
tensor([[1, 2, 3, 4, 0],
        [5, 6, 0, 0, 0]])
>>> lens_pad
tensor([0.8000, 0.4000])
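
The numbers in the example follow from simple arithmetic: the length of 4 is rounded up to the next multiple of 5, and the relative lengths are rescaled so that they still refer to the same absolute positions. The helper names below are hypothetical and the sketch only reproduces that arithmetic, not the tensor padding itself.

```python
import math


def padded_length_sketch(length, factor):
    """Smallest multiple of factor that is >= length."""
    return math.ceil(length / factor) * factor


def rescale_lengths_sketch(rel_lengths, old_len, new_len):
    """Rescale relative lengths so they refer to the same absolute positions."""
    return [rel * old_len / new_len for rel in rel_lengths]


new_len = padded_length_sketch(4, factor=5)
print(new_len)                                          # 5
print(rescale_lengths_sketch([1.0, 0.5], 4, new_len))   # approximately [0.8, 0.4]
```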
speechbrain.utils.data_utils.trim_to_shape(tensor, shape)[source]

Trims the specified tensor to match the specified shape

Parameters:
  • tensor (torch.Tensor) – a tensor

  • shape (enumerable) – the desired shape

Returns:

tensor – the trimmed tensor

Return type:

torch.Tensor

speechbrain.utils.data_utils.trim_as(tensor, other)[source]

Trims the specified tensor to match the shape of another tensor (at most)

Parameters:
  • tensor (torch.Tensor) – a tensor

  • other (torch.Tensor) – the tensor whose shape to match

Returns:

tensor – the trimmed tensor

Return type:

torch.Tensor

speechbrain.utils.data_utils.match_shape(tensor, other)[source]

A swiss-army-knife helper function to match the shape of a tensor to match that of another tensor - useful for masks, etc.

Parameters:
  • tensor (torch.Tensor) – a tensor

  • other (torch.Tensor) – the tensor whose shape to match

Returns:

tensor – the tensor with matching shape

Return type:

torch.Tensor

speechbrain.utils.data_utils.batch_shuffle(items, batch_size)[source]

Shuffles batches of fixed size within a sequence

Parameters:
  • items (sequence) – a tensor or an indexable sequence, such as a list

  • batch_size (int) – the batch size

Returns:

items – the original items. If a tensor was passed, a tensor will be returned. Otherwise, it will return a list

Return type:

sequence
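
A possible reading of "shuffling batches within a sequence" is that the sequence is cut into consecutive batches of batch_size items, the batches are reordered at random, and the order inside each batch is kept. The sketch below implements that reading for plain lists. The function name, the seed parameter, and the interpretation itself are assumptions, not taken from the library.

```python
import random


def batch_shuffle_sketch(items, batch_size, seed=None):
    """Reorder whole batches at random while keeping the order inside each batch."""
    rng = random.Random(seed)  # hypothetical seed parameter, for reproducibility
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    rng.shuffle(batches)
    # Flatten the reordered batches back into a single list.
    return [item for batch in batches for item in batch]


shuffled = batch_shuffle_sketch(list(range(10)), batch_size=3, seed=0)
print(shuffled)
```

With this approach, items 0, 1 and 2 always stay adjacent and in order, wherever their batch ends up.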

speechbrain.utils.data_utils.concat_padded_features(feats, lens, dim=1, feats_slice_start=None, feats_slice_end=None)[source]

Concatenates multiple padded feature tensors into a single padded tensor in a vectorized manner without including the padding in the final tensor, adding padding only at the end. The function supports optional relative slicing of the tensors.

One possible use case is to concatenate batches of spectrograms or audio.

Parameters:
  • feats (list) – a list of padded tensors

  • lens (list) – a list of length tensors

  • feats_slice_start (list) – offsets, relative to the beginning of the sequence, for each of the tensors being concatenated. This is useful if only a subsequence of some slices is included

  • feats_slice_end (list) – offsets, relative to the end of the sequence, for each of the tensors being concatenated. This is useful if only a subsequence of some slices is included

Returns:

out – a concatenated tensor

Return type:

torch.Tensor
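
The effect of the concatenation can be illustrated with a non-vectorized, pure-Python sketch over nested lists: for every item in the batch, the valid part of each input is kept, the parts are joined, and padding is added only at the end. The name concat_padded_sketch, the pad_value parameter, and the rounding of relative lengths are assumptions. Slicing offsets are not covered.

```python
def concat_padded_sketch(feats, lens, pad_value=0):
    """Join the valid parts of several padded batches, padding only at the end."""
    batch_size = len(feats[0])
    rows = []
    for i in range(batch_size):
        row = []
        for feat, rel_lens in zip(feats, lens):
            max_len = len(feat[i])
            # Assumption: valid length = relative length * padded length, rounded.
            valid = int(round(rel_lens[i] * max_len))
            row.extend(feat[i][:valid])
        rows.append(row)
    out_len = max(len(row) for row in rows)
    return [row + [pad_value] * (out_len - len(row)) for row in rows]


a = [[1, 2, 3, 4], [5, 6, 0, 0]]   # second row has two padded positions
b = [[7, 8], [9, 0]]               # second row has one padded position
print(concat_padded_sketch([a, b], [[1.0, 0.5], [1.0, 0.5]]))
# [[1, 2, 3, 4, 7, 8], [5, 6, 9, 0, 0, 0]]
```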

speechbrain.utils.data_utils.unsqueeze_1d(value, dim, value_dim)[source]

Unsqueezes a 1-D tensor to the specified number of dimensions, preserving one dimension and creating “dummy” dimensions elsewhere

Parameters:
  • value (torch.Tensor) – A 1-D tensor

  • dim (int) – the number of dimensions

  • value_dim (int) – the dimension that the value tensor represents

Returns:

result – a dim-dimensional tensor

Return type:

torch.Tensor

speechbrain.utils.data_utils.length_range(feats, len_dim)[source]

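Creates a tensor with a range along a single dimension, expanded to match the shape of the feats tensor

Parameters:
  • feats (torch.Tensor) – a features tensor of arbitrary shape

  • len_dim (int) – the index of the dimension used as the length

Returns:

result – a tensor matching the shape of feats, with a 0 to max-length range along the length dimension repeated across the other dimensions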

Return type:

torch.Tensor

speechbrain.utils.data_utils.non_batch_dims(sample)[source]

Returns all dimensions of the specified tensor except the batch dimension

Parameters:

sample (torch.Tensor) – an arbitrary tensor

Returns:

dims – a list of dimensions

Return type:

list

speechbrain.utils.data_utils.masked_mean(sample, mask=None)[source]

A metric function that computes the mean of each sample, excluding padding

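Parameters:
  • sample (torch.Tensor) – a tensor of samples

  • mask (torch.Tensor) – an optional mask used to exclude padding

Returns:

result – a tensor of means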

Return type:

torch.Tensor
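
The idea of a per-sample statistic that ignores padding can be shown with nested lists: only positions where the mask is true contribute to the sum and to the count. The sketch below is illustrative. The name masked_mean_sketch and the convention that True marks a valid (non-padding) position are assumptions.

```python
def masked_mean_sketch(sample, mask):
    """Per-row mean over the positions where mask is True (illustrative only)."""
    means = []
    for row, mask_row in zip(sample, mask):
        # Assumption: True marks a valid (non-padding) position.
        valid = [value for value, keep in zip(row, mask_row) if keep]
        means.append(sum(valid) / len(valid))
    return means


sample = [[1.0, 2.0, 3.0, 0.0], [4.0, 6.0, 0.0, 0.0]]
mask = [[True, True, True, False], [True, True, False, False]]
print(masked_mean_sketch(sample, mask))  # [2.0, 5.0]
```

A plain mean over the padded rows would give 1.5 and 2.5 instead, which is why the mask matters. A row whose mask is entirely False would need separate handling, since the sketch would divide by zero.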

speechbrain.utils.data_utils.masked_std(sample, mask=None)[source]

A metric function that computes the standard deviation of each sample, excluding padding

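Parameters:
  • sample (torch.Tensor) – a tensor of samples

  • mask (torch.Tensor) – an optional mask used to exclude padding

Returns:

result – a tensor of standard deviations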

Return type:

torch.Tensor

speechbrain.utils.data_utils.masked_min(sample, mask=None)[source]

A metric function that computes the minimum of each sample

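Parameters:
  • sample (torch.Tensor) – a tensor of samples

  • mask (torch.Tensor) – an optional mask used to exclude padding

Returns:

result – a tensor of minimums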

Return type:

torch.Tensor

speechbrain.utils.data_utils.masked_max(sample, mask=None)[source]

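A metric function that computes the maximum of each sample

Parameters:
  • sample (torch.Tensor) – a tensor of samples

  • mask (torch.Tensor) – an optional mask used to exclude padding

Returns:

result – a tensor of maximums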

Return type:

torch.Tensor

speechbrain.utils.data_utils.dist_stats(sample, mask=None)[source]

Returns standard distribution statistics (mean, std, min, max)

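Parameters:
  • sample (torch.Tensor) – a tensor of samples

  • mask (torch.Tensor) – an optional mask used to exclude padding

Returns:

result – the distribution statistics (mean, std, min, max)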

Return type:

torch.Tensor

speechbrain.utils.data_utils.dict_value_combinations(values)[source]

Returns all possible key-value combinations from the given dictionary

Parameters:
  • values (dict) – A dictionary with lists of values as values. Example: {"digit": [1, 2, 3], "speaker": [10, 20]}

  • keys (list) – the keys to consider

Returns:

result – a list of dictionaries in which each dictionary is one possible combination

Return type:

list
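
For the example dictionary above, "all possible key-value combinations" means one dictionary per element of the Cartesian product of the value lists. A short sketch using itertools.product from the standard library illustrates this. The name combinations_sketch is hypothetical and the order of the results is not claimed to match the library's.

```python
import itertools


def combinations_sketch(values):
    """Return one dict per element of the Cartesian product of the value lists."""
    keys = list(values)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*(values[key] for key in keys))
    ]


result = combinations_sketch({"digit": [1, 2, 3], "speaker": [10, 20]})
print(len(result))  # 6
print(result[0])    # {'digit': 1, 'speaker': 10}
```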

speechbrain.utils.data_utils.dict_value_combinations_gen(values, keys)[source]

Returns a generator of permutations of the specified values dictionary

Parameters:
  • values (dict) – A dictionary with lists of values as values. Example: {"digit": [1, 2, 3], "speaker": [10, 20]}

  • keys (list) – the keys to consider

Returns:

result – a generator of dictionaries in which each dictionary is one possible permutation

Return type:

generator