speechbrain.utils.dynamic_chunk_training module

Configuration and utility classes for Dynamic Chunk Training, as often used for the training of streaming-capable models in speech recognition.

The definition of Dynamic Chunk Training used here follows the paper below; much of the literature uses the same definition: https://arxiv.org/abs/2012.05481

Authors * Sylvain de Langen 2023

Summary

Classes:

DynChunkTrainConfig

Dynamic Chunk Training configuration object for use with transformers, often in ASR for streaming.

DynChunkTrainConfigRandomSampler

Helper class to generate a DynChunkTrainConfig at runtime depending on the current stage.

Reference

class speechbrain.utils.dynamic_chunk_training.DynChunkTrainConfig(chunk_size: int, left_context_size: int | None = None)[source]

Bases: object

Dynamic Chunk Training configuration object for use with transformers, often in ASR for streaming.

This object may be used both to configure masking at training time and for run-time configuration of DynChunkTrain-ready models.

chunk_size: int

Size in frames of a single chunk, always >0. If chunkwise streaming should be disabled at some point, pass an optional streaming config parameter.

left_context_size: int | None = None

Number of chunks (not frames) visible to the left, always >=0. If zero, then chunks can never attend to any past chunk. If None, the left context is infinite (but use .is_infinite_left_context() for such a check).
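To make the effect of chunk_size and left_context_size on attention more concrete, here is a minimal sketch in plain Python. The function chunk_attention_mask is hypothetical and is not part of SpeechBrain; it assumes that a frame may attend to every frame of its own chunk and to the allowed number of past chunks, and it uses True to mean "may attend". The library's own masking code may use a different convention.

```python
from typing import List, Optional


def chunk_attention_mask(
    num_frames: int,
    chunk_size: int,
    left_context_chunks: Optional[int] = None,
) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True when frame i may attend to frame j."""
    mask = []
    for i in range(num_frames):
        query_chunk = i // chunk_size
        row = []
        for j in range(num_frames):
            key_chunk = j // chunk_size
            # No look-ahead past the end of the query frame's own chunk.
            visible = key_chunk <= query_chunk
            # With a finite left context, hide chunks that lie too far back.
            if left_context_chunks is not None:
                visible = visible and key_chunk >= query_chunk - left_context_chunks
            row.append(visible)
        mask.append(row)
    return mask


# 6 frames, chunks of 2 frames, left context of 0 chunks.
for row in chunk_attention_mask(6, 2, left_context_chunks=0):
    print("".join("x" if allowed else "." for allowed in row))
```

With a left context of zero, the printed pattern is block-diagonal: each chunk only sees itself. Passing left_context_chunks=None instead lets every frame see all past chunks as well.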

is_infinite_left_context() → bool[source]

Returns true if the left context is infinite (i.e. any chunk can attend to any past frame).

left_context_size_frames() → int | None[source]

Returns the number of left context frames (not chunks). If None, the left context is infinite. See also the left_context_size field.
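The relation between chunks and frames can be illustrated with a small stand-in class. ChunkConfigSketch below is hypothetical and only mirrors the two documented fields; the conversion (chunk size times number of left context chunks) is an assumption drawn from the field descriptions, not a copy of the library code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkConfigSketch:
    """Hypothetical stand-in mirroring the two documented fields."""

    chunk_size: int  # frames per chunk, always > 0
    left_context_size: Optional[int] = None  # in chunks; None means infinite

    def is_infinite_left_context(self) -> bool:
        return self.left_context_size is None

    def left_context_size_frames(self) -> Optional[int]:
        # Assumed conversion: chunks visible to the left times frames per chunk.
        if self.left_context_size is None:
            return None
        return self.chunk_size * self.left_context_size


limited = ChunkConfigSketch(chunk_size=32, left_context_size=16)
unlimited = ChunkConfigSketch(chunk_size=24)
print(limited.left_context_size_frames())  # 512
print(unlimited.left_context_size_frames())  # None
```

Checking is_infinite_left_context() rather than comparing against None directly keeps calling code readable and independent of how the infinite case is encoded.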

class speechbrain.utils.dynamic_chunk_training.DynChunkTrainConfigRandomSampler(chunkwise_prob: float, chunk_size_min: int, chunk_size_max: int, limited_left_context_prob: float, left_context_chunks_min: int, left_context_chunks_max: int, test_config: DynChunkTrainConfig | None = None, valid_config: DynChunkTrainConfig | None = None)[source]

Bases: object

Helper class to generate a DynChunkTrainConfig at runtime depending on the current stage.

Example

>>> from speechbrain.core import Stage
>>> from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig
>>> from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfigRandomSampler
>>> # for the purpose of this example, we test a scenario with a 100%
>>> # chance of the (24, None) scenario to occur
>>> sampler = DynChunkTrainConfigRandomSampler(
...     chunkwise_prob=1.0,
...     chunk_size_min=24,
...     chunk_size_max=24,
...     limited_left_context_prob=0.0,
...     left_context_chunks_min=16,
...     left_context_chunks_max=16,
...     test_config=DynChunkTrainConfig(32, 16),
...     valid_config=None
... )
>>> one_train_config = sampler(Stage.TRAIN)
>>> one_train_config
DynChunkTrainConfig(chunk_size=24, left_context_size=None)
>>> one_train_config.is_infinite_left_context()
True
>>> sampler(Stage.TEST)
DynChunkTrainConfig(chunk_size=32, left_context_size=16)

chunkwise_prob: float

When sampling (during Stage.TRAIN), the probability that a finite chunk size will be used. Otherwise, any chunk can attend to the full past and future context.

chunk_size_min: int

When sampling a random chunk size, the minimum chunk size that can be picked.

chunk_size_max: int

When sampling a random chunk size, the maximum chunk size that can be picked.

limited_left_context_prob: float

When sampling a random chunk size, the probability that the left context will be limited. Otherwise, any chunk can attend to the full past context.

left_context_chunks_min: int

When sampling a random left context size, the minimum number of left context chunks that can be picked.

left_context_chunks_max: int

When sampling a random left context size, the maximum number of left context chunks that can be picked.

test_config: DynChunkTrainConfig | None = None

The configuration that should be used for Stage.TEST. When None, evaluation is done with full context (i.e. non-streaming).

valid_config: DynChunkTrainConfig | None = None

The configuration that should be used for Stage.VALID. When None, evaluation is done with full context (i.e. non-streaming).

__call__(stage: Stage) → DynChunkTrainConfig[source]

In the training stage, samples a random DynChunkTrain configuration. During validation or testing, returns the relevant configuration.

Parameters:

stage (speechbrain.core.Stage) – Current stage of training or evaluation. In training mode, a random DynChunkTrainConfig will be sampled according to the specified probabilities and ranges. During evaluation, the relevant DynChunkTrainConfig attribute will be picked.
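The sampling behaviour described by the attributes above can be sketched with the standard library alone. Everything in this block is hypothetical: StageSketch stands in for speechbrain.core.Stage, a config is represented as a (chunk_size, left_context_size) tuple, and sample_config is not a SpeechBrain function. The sketch also assumes that both ranges are inclusive and that None is returned when full context is selected; the real implementation may differ on these points.

```python
import random
from enum import Enum, auto
from typing import Optional, Tuple


class StageSketch(Enum):
    """Hypothetical stand-in for speechbrain.core.Stage."""

    TRAIN = auto()
    VALID = auto()
    TEST = auto()


# (chunk_size, left_context_size) pair used in place of DynChunkTrainConfig.
ConfigSketch = Tuple[int, Optional[int]]


def sample_config(
    stage: StageSketch,
    chunkwise_prob: float,
    chunk_size_min: int,
    chunk_size_max: int,
    limited_left_context_prob: float,
    left_context_chunks_min: int,
    left_context_chunks_max: int,
    test_config: Optional[ConfigSketch] = None,
    valid_config: Optional[ConfigSketch] = None,
    rng: Optional[random.Random] = None,
) -> Optional[ConfigSketch]:
    rng = rng or random.Random()
    # Evaluation stages return their fixed configuration unchanged.
    if stage is StageSketch.TEST:
        return test_config
    if stage is StageSketch.VALID:
        return valid_config
    # Training: decide first whether chunking applies at all (assumed: None = full context).
    if rng.random() >= chunkwise_prob:
        return None
    chunk_size = rng.randint(chunk_size_min, chunk_size_max)  # inclusive bounds
    # Then decide whether the left context is limited.
    if rng.random() < limited_left_context_prob:
        left = rng.randint(left_context_chunks_min, left_context_chunks_max)
    else:
        left = None
    return (chunk_size, left)


print(sample_config(StageSketch.TRAIN, 1.0, 24, 24, 0.0, 16, 16))  # (24, None)
print(sample_config(StageSketch.TEST, 1.0, 24, 24, 0.0, 16, 16, test_config=(32, 16)))  # (32, 16)
```

The two calls reproduce the scenario from the example above: a guaranteed chunk size of 24 with unlimited left context during training, and the fixed test configuration during testing.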