speechbrain.utils.parallel module
Parallel processing tools to help speed up certain tasks like data preprocessing.
- Authors
Sylvain de Langen 2023
Summary
Classes:
Context manager that .cancel()s all elements of a list upon exit. |
Functions:
Maps iterable items with a function, processing chunks of items in parallel with multiple processes and displaying progress with tqdm. |
Reference
- class speechbrain.utils.parallel.CancelFuturesOnExit(future_list)[source]
Bases:
object
Context manager that .cancel()s all elements of a list upon exit. This is used to abort futures faster when raising an exception.
- speechbrain.utils.parallel.parallel_map(fn: Callable[[Any], Any], source: Iterable[Any], process_count: int = 2, chunk_size: int = 8, queue_size: int = 128, executor: Executor | None = None, progress_bar: bool = True, progress_bar_kwargs: dict = {'smoothing': 0.02})[source]
Maps iterable items with a function, processing chunks of items in parallel with multiple processes and displaying progress with tqdm.
Processed elements will always be returned in the original, correct order. Unlike
ProcessPoolExecutor.map
, elements are produced AND consumed lazily.- Parameters:
fn – The function that is called for every element in the source list. The output is an iterator over the source list after fn(elem) is called.
source (Iterable) – Iterator whose elements are passed through the mapping function.
process_count (int) – The number of processes to spawn. Ignored if a custom executor is provided. For CPU-bound tasks, it is generally not useful to exceed logical core count. For IO-bound tasks, it may make sense to as to limit the amount of time spent in iowait.
chunk_size (int) – How many elements are fed to the worker processes at once. A value of 8 is generally fine. Low values may increase overhead and reduce CPU occupancy.
queue_size (int) – Number of chunks to be waited for on the main process at a time. Low values increase the chance of the queue being starved, forcing workers to idle. Very high values may cause high memory usage, especially if the source iterable yields large objects.
executor (Optional[Executor]) – Allows providing an existing executor (preferably a ProcessPoolExecutor). If None (the default), a process pool will be spawned for this mapping task and will be shut down after.
progress_bar (bool) – Whether to show a tqdm progress bar.
progress_bar_kwargs (dict) – A dict of keyword arguments that is forwarded to tqdm when
progress_bar == True
. Allows overriding the defaults or e.g. specifyingtotal
when it cannot be inferred from the source iterable.