speechbrain.dataio.legacy module
SpeechBrain Extended CSV Compatibility.
Summary
Classes:
CSVItem | The Legacy Extended CSV Data item triplet.
ExtendedCSVDataset | Extended CSV compatibility for DynamicItemDataset.
Functions:
load_sb_extended_csv | Loads SB Extended CSV and formats string values.
read_pkl | This function reads tensors stored in pkl format.
Reference
- class speechbrain.dataio.legacy.CSVItem(data, format, opts)
Bases:
tuple
The Legacy Extended CSV Data item triplet
- data
Alias for field number 0
- format
Alias for field number 1
- opts
Alias for field number 2
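The triplet behaves like any named tuple. The sketch below uses a stand-in built with `collections.namedtuple` rather than importing SpeechBrain, so the field values (`$data_folder/utt1.wav`, `wav`, an empty options string) are illustrative assumptions only.

```python
from collections import namedtuple

# Stand-in for speechbrain.dataio.legacy.CSVItem: same field names, same order.
CSVItem = namedtuple("CSVItem", ["data", "format", "opts"])

# Hypothetical values for one CSV data-triplet.
item = CSVItem(data="$data_folder/utt1.wav", format="wav", opts="")

# Fields are reachable by name or by position (0, 1, 2).
assert item.data == item[0]
assert item.format == item[1]
assert item.opts == item[2]

# Being a tuple, the item also unpacks directly.
data, fmt, opts = item
print(data, fmt, repr(opts))
```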
- class speechbrain.dataio.legacy.ExtendedCSVDataset(csvpath, replacements={}, sorting='original', min_duration=0, max_duration=36000, dynamic_items=[], output_keys=[])
Bases:
Generic[torch.utils.data.dataset.T_co]
Extended CSV compatibility for DynamicItemDataset.
Uses the SpeechBrain Extended CSV data format, where the CSV must have ‘ID’ and ‘duration’ fields.
The rest of the fields come in triplets:
<name>, <name>_format, <name>_opts
These add a <name>_sb_data item in the dict. Additionally, a basic DynamicItem (see DynamicItemDataset) is created, which loads the _sb_data item.
Bash-like string replacements with $to_replace are supported.
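The triplet layout described above can be illustrated with a small standard-library sketch. It is not SpeechBrain's implementation: the CSV contents, the `parse_extended_csv` helper, and the choice to key the result by ID are assumptions made for illustration.

```python
import csv
import io

# Hypothetical two-utterance file in the Extended CSV layout.
CSV_TEXT = """ID,duration,wav,wav_format,wav_opts,spk_id,spk_id_format,spk_id_opts
utt1,1.5,$data_folder/utt1.wav,wav,,spk1,string,
utt2,3.2,$data_folder/utt2.wav,wav,,spk2,string,
"""


def parse_extended_csv(text):
    """Group <name>, <name>_format, <name>_opts columns into <name>_sb_data."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames
    # A column starts a triplet if its _format and _opts companions exist.
    names = [f for f in fields if f + "_format" in fields and f + "_opts" in fields]
    data = {}
    for row in reader:
        entry = {"duration": float(row["duration"])}
        for name in names:
            entry[name + "_sb_data"] = (
                row[name],
                row[name + "_format"],
                row[name + "_opts"],
            )
        data[row["ID"]] = entry
    return data


data = parse_extended_csv(CSV_TEXT)
print(data["utt1"]["wav_sb_data"])
```

Here the `$data_folder` placeholder is left untouched; substitution is a separate step.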
Note
Mapping from legacy interface:
csv_file -> csvpath
sentence_sorting -> sorting; “random” is not supported, use e.g. make_dataloader(..., shuffle=(sorting == "random")) instead
avoid_if_shorter_than -> min_duration
avoid_if_longer_than -> max_duration
csv_read -> output_keys, and if you want IDs add “id” as key
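The renames in the note above could be applied mechanically. The helper below, `translate_legacy_kwargs`, is hypothetical and not part of SpeechBrain; it only renames keys and turns a legacy "random" sorting into a shuffle flag that a caller might pass on to the data loader.

```python
# Renames taken from the mapping above; the helper itself is hypothetical.
LEGACY_TO_NEW = {
    "csv_file": "csvpath",
    "sentence_sorting": "sorting",
    "avoid_if_shorter_than": "min_duration",
    "avoid_if_longer_than": "max_duration",
    "csv_read": "output_keys",
}


def translate_legacy_kwargs(legacy):
    """Return (dataset_kwargs, shuffle) from legacy-style keyword arguments."""
    new = {LEGACY_TO_NEW.get(key, key): value for key, value in legacy.items()}
    # "random" is not a supported sorting, so request shuffling instead
    # and keep the CSV order in the dataset itself.
    shuffle = new.get("sorting") == "random"
    if shuffle:
        new["sorting"] = "original"
    return new, shuffle


new_kwargs, shuffle = translate_legacy_kwargs(
    {"csv_file": "train.csv", "sentence_sorting": "random", "avoid_if_longer_than": 20.0}
)
print(new_kwargs, shuffle)
```

Note that the sketch does not add "id" to `output_keys`; that remains the caller's choice.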
- Parameters
csvpath (str, path) – Path to extended CSV.
replacements (dict) – Used for Bash-like $-prefixed substitution, e.g. {"data_folder": "/home/speechbrain/data"}, which would transform $data_folder/utt1.wav into /home/speechbrain/data/utt1.wav
sorting ({"original", "ascending", "descending"}) – Keep CSV order, or sort ascending or descending by duration.
min_duration (float, int) – Minimum duration in seconds. Entries shorter than this are discarded.
max_duration (float, int) – Maximum duration in seconds. Entries longer than this are discarded.
dynamic_items (list) –
Configuration for extra dynamic items produced when fetching an example. List of DynamicItems or dicts with keys:
func: <callable> # To be called
takes: <list> # key or list of keys of args this takes
provides: key # key or list of keys that this provides
NOTE: A dynamic item is automatically added for each CSV data-triplet
output_keys (list, None) – The list of output keys to produce. You can refer to the names of the CSV data-triplets, e.g. if the CSV has wav, wav_format, wav_opts, then the Dataset has a dynamic item output available with key "wav".
NOTE: If None, read all existing.
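The effect of `sorting`, `min_duration` and `max_duration` can be sketched in plain Python. The `filter_and_sort` helper and the sample durations are hypothetical, and treating both duration bounds as inclusive is an assumption, not something the description above states.

```python
def filter_and_sort(data, sorting="original", min_duration=0, max_duration=36000):
    """Return the IDs that survive duration filtering, in the requested order."""
    if sorting not in ("original", "ascending", "descending"):
        raise ValueError(f"unsupported sorting: {sorting!r}")
    # Assumption: both bounds are inclusive.
    kept = [
        (utt_id, entry)
        for utt_id, entry in data.items()
        if min_duration <= entry["duration"] <= max_duration
    ]
    if sorting != "original":
        kept.sort(key=lambda pair: pair[1]["duration"], reverse=(sorting == "descending"))
    return [utt_id for utt_id, _ in kept]


# Hypothetical entries keyed by ID, each with a duration in seconds.
durations = {
    "utt1": {"duration": 1.5},
    "utt2": {"duration": 3.2},
    "utt3": {"duration": 0.4},
}
print(filter_and_sort(durations, sorting="ascending", min_duration=1.0))
```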
- speechbrain.dataio.legacy.load_sb_extended_csv(csv_path, replacements={})
Loads SB Extended CSV and formats string values.
Uses the SpeechBrain Extended CSV data format, where the CSV must have ‘ID’ and ‘duration’ fields.
The rest of the fields come in triplets:
<name>, <name>_format, <name>_opts
These add a <name>_sb_data item in the dict. Additionally, a basic DynamicItem (see DynamicItemDataset) is created, which loads the _sb_data item.
Bash-like string replacements with $to_replace are supported.
This format has its restrictions, but it allows some tasks to have loading specified by the CSV.
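- Parameters
csv_path (str) – Path to the extended CSV file.
replacements (dict) – Used for Bash-like $-prefixed substitution, e.g. {"data_folder": "/home/speechbrain/data"}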
- Returns
dict – CSV data with replacements applied.
list – List of DynamicItems to add in DynamicItemDataset.
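The Bash-like `$to_replace` substitution can be approximated with the standard library's `string.Template`. This is a stand-in for illustration, not the function's actual implementation; `apply_replacements` and the sample data are hypothetical.

```python
from string import Template


def apply_replacements(value, replacements):
    """Recursively substitute $name placeholders in every string value."""
    if isinstance(value, str):
        # safe_substitute leaves unknown placeholders untouched.
        return Template(value).safe_substitute(replacements)
    if isinstance(value, dict):
        return {k: apply_replacements(v, replacements) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(apply_replacements(v, replacements) for v in value)
    return value


# Hypothetical loaded data: one entry with a (data, format, opts) triplet.
raw = {"utt1": {"duration": 1.5, "wav_sb_data": ("$data_folder/utt1.wav", "wav", "")}}
resolved = apply_replacements(raw, {"data_folder": "/home/speechbrain/data"})
print(resolved["utt1"]["wav_sb_data"][0])
```

Non-string values such as the duration pass through unchanged, and a placeholder with no matching key stays as written.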