speechbrain.dataio.legacy module
SpeechBrain Extended CSV Compatibility.
Summary
Classes:
CSVItem | The Legacy Extended CSV Data item triplet.
ExtendedCSVDataset | Extended CSV compatibility for DynamicItemDataset.
Functions:
load_sb_extended_csv | Loads SB Extended CSV and formats string values.
read_pkl | This function reads tensors stored in pkl format.
Reference
- class speechbrain.dataio.legacy.CSVItem(data, format, opts)
Bases:
tuple
The Legacy Extended CSV Data item triplet
- data
Alias for field number 0
- format
Alias for field number 1
- opts
Alias for field number 2
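As an illustration of the triplet layout, here is a minimal sketch that uses a stand-in namedtuple with the same field names and order. It is not imported from SpeechBrain, and the example path and format values are assumptions made for the demonstration.

```python
from collections import namedtuple

# Stand-in for speechbrain.dataio.legacy.CSVItem: same field names, same order.
CSVItem = namedtuple("CSVItem", ["data", "format", "opts"])

# Hypothetical values; a real item is built from the <name> column triplet.
item = CSVItem(data="/home/speechbrain/data/utt1.wav", format="wav", opts={})

# Fields are reachable by name or by position (0, 1, 2).
assert item.data == item[0]
assert item.format == item[1]
assert item.opts == item[2]

# Being a tuple, the item also unpacks directly.
path, fmt, opts = item
```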
- class speechbrain.dataio.legacy.ExtendedCSVDataset(csvpath, replacements={}, sorting='original', min_duration=0, max_duration=36000, dynamic_items=[], output_keys=[])
Bases:
DynamicItemDataset
Extended CSV compatibility for DynamicItemDataset.
Uses the SpeechBrain Extended CSV data format, where the CSV must have "ID" and "duration" fields.
The rest of the fields come in triplets:
<name>, <name>_format, <name>_opts
These add a <name>_sb_data item in the dict. Additionally, a basic DynamicItem (see DynamicItemDataset) is created, which loads the _sb_data item.
Bash-like string replacements with $to_replace are supported.
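The substitution behaviour can be approximated with the standard library. The sketch below uses string.Template, which is a swapped-in technique and not necessarily how SpeechBrain implements it; apply_replacements is a hypothetical helper name.

```python
from string import Template


def apply_replacements(value, replacements):
    """Substitute $name placeholders; names missing from the dict are left as-is."""
    return Template(value).safe_substitute(replacements)


replacements = {"data_folder": "/home/speechbrain/data"}
resolved = apply_replacements("$data_folder/utt1.wav", replacements)
print(resolved)  # /home/speechbrain/data/utt1.wav
```

Using safe_substitute rather than substitute means a placeholder with no matching key passes through unchanged instead of raising KeyError; whether the library is equally lenient is not stated here.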
Note
Mapping from legacy interface:
csv_file -> csvpath
sentence_sorting -> sorting, and "random" is not supported, use e.g.
make_dataloader(..., shuffle = (sorting=="random"))
avoid_if_shorter_than -> min_duration
avoid_if_longer_than -> max_duration
csv_read -> output_keys, and if you want IDs add "id" as key
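The mapping above can be written down as a small translation helper. This is only a sketch: translate_legacy_kwargs and LEGACY_TO_NEW are hypothetical names, and falling back to "original" order when "random" is requested is an assumption that pairs with passing shuffle to the dataloader.

```python
# Legacy argument name -> ExtendedCSVDataset argument name, as listed in the note.
LEGACY_TO_NEW = {
    "csv_file": "csvpath",
    "sentence_sorting": "sorting",
    "avoid_if_shorter_than": "min_duration",
    "avoid_if_longer_than": "max_duration",
    "csv_read": "output_keys",
}


def translate_legacy_kwargs(legacy):
    """Return (dataset_kwargs, shuffle) for a dict of legacy-style options."""
    new = {LEGACY_TO_NEW.get(key, key): value for key, value in legacy.items()}
    # "random" is not a valid sorting value; shuffle in the dataloader instead.
    shuffle = new.get("sorting") == "random"
    if shuffle:
        new["sorting"] = "original"  # assumption: keep CSV order, then shuffle
    return new, shuffle


kwargs, shuffle = translate_legacy_kwargs(
    {"csv_file": "train.csv", "sentence_sorting": "random", "avoid_if_longer_than": 20.0}
)
```

The resulting kwargs would go to ExtendedCSVDataset, and shuffle to make_dataloader as in the note.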
- Parameters:
csvpath (str, path) - Path to extended CSV.
replacements (dict) - Used for Bash-like $-prefixed substitution, e.g. {"data_folder": "/home/speechbrain/data"}, which would transform $data_folder/utt1.wav into /home/speechbrain/data/utt1.wav.
sorting ({"original", "ascending", "descending"}) - Keep CSV order, or sort ascending or descending by duration.
min_duration (float, int) - Minimum duration in seconds. Entries shorter than this are discarded.
max_duration (float, int) - Maximum duration in seconds. Entries longer than this are discarded.
dynamic_items (list) - Configuration for extra dynamic items produced when fetching an example. List of DynamicItems or dicts with keys:
    func: <callable>  # To be called
    takes: <list>  # key or list of keys of args this takes
    provides: key  # key or list of keys that this provides
NOTE: A dynamic item is automatically added for each CSV data-triplet.
output_keys (list, None) - The list of output keys to produce. You can refer to the names of the CSV data-triplets. E.g., if the CSV has wav, wav_format, wav_opts, then the Dataset has a dynamic item output available with key "wav". NOTE: If None, read all existing.
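To make the interaction of sorting, min_duration and max_duration concrete, here is a pure-Python sketch of duration filtering followed by ordering. filter_and_sort is a hypothetical helper, not the library's code, and treating both bounds as inclusive is an assumption.

```python
def filter_and_sort(data, sorting="original", min_duration=0, max_duration=36000):
    """Return the IDs that survive duration filtering, in the requested order.

    data maps each ID to a dict holding a numeric "duration" entry.
    """
    if sorting not in ("original", "ascending", "descending"):
        raise ValueError(f"Unsupported sorting: {sorting!r}")
    # Assumption: both bounds are inclusive.
    kept = [
        data_id
        for data_id, entry in data.items()
        if min_duration <= float(entry["duration"]) <= max_duration
    ]
    if sorting == "original":
        return kept  # dicts preserve insertion order, i.e. CSV order
    return sorted(
        kept,
        key=lambda data_id: float(data[data_id]["duration"]),
        reverse=(sorting == "descending"),
    )


data = {
    "utt1": {"duration": 3.0},
    "utt2": {"duration": 0.5},
    "utt3": {"duration": 12.0},
    "utt4": {"duration": 1.5},
}
ids = filter_and_sort(data, sorting="ascending", min_duration=1.0, max_duration=10.0)
```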
- speechbrain.dataio.legacy.load_sb_extended_csv(csv_path, replacements={})
Loads SB Extended CSV and formats string values.
Uses the SpeechBrain Extended CSV data format, where the CSV must have "ID" and "duration" fields.
The rest of the fields come in triplets:
<name>, <name>_format, <name>_opts
These add a <name>_sb_data item in the dict. Additionally, a basic DynamicItem (see DynamicItemDataset) is created, which loads the _sb_data item.
Bash-like string replacements with $to_replace are supported.
This format has its restrictions, but it allows some tasks to have loading specified by the CSV.
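- Parameters:
csv_path (str) - Path to the extended CSV.
replacements (dict) - Used for Bash-like $-prefixed substitution, e.g. {"data_folder": "/home/speechbrain/data"}.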
- Returns:
dict - CSV data with replacements applied.
list - List of DynamicItems to add in DynamicItemDataset.
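The triplet layout described above can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the SpeechBrain implementation: parse_extended_csv is a hypothetical name, CSVItem is a local stand-in namedtuple, the _opts column is kept as a raw string, and only the data dict is produced (no DynamicItems are created).

```python
import csv
import io
from collections import namedtuple
from string import Template

# Local stand-in with the same fields as speechbrain.dataio.legacy.CSVItem.
CSVItem = namedtuple("CSVItem", ["data", "format", "opts"])


def parse_extended_csv(text, replacements=None):
    """Parse extended-CSV text into {ID: {"duration": ..., "<name>_sb_data": CSVItem}}."""
    replacements = replacements or {}
    result = {}
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        data_id = row.pop("ID")
        entry = {"duration": float(row.pop("duration"))}
        # Remaining columns come in <name>, <name>_format, <name>_opts triplets.
        names = [key for key in row if not key.endswith(("_format", "_opts"))]
        for name in names:
            # Bash-like $-prefixed substitution on the data field.
            data = Template(row[name]).safe_substitute(replacements)
            entry[name + "_sb_data"] = CSVItem(
                data, row[name + "_format"], row[name + "_opts"]
            )
        result[data_id] = entry
    return result


example = """ID,duration,wav,wav_format,wav_opts
utt1,1.5,$data_folder/utt1.wav,wav,
"""
parsed = parse_extended_csv(example, {"data_folder": "/home/speechbrain/data"})
```

Here parsed["utt1"]["wav_sb_data"] holds the resolved path, the format string, and the unparsed options for the wav triplet.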