speechbrain.lobes.models.convolution module
This module assembles a (depthwise) convolution encoder, with or without residual connections.
- Authors
Jianyuan Zhong 2020
Titouan Parcollet 2023
Gianfranco Dumoulin Bertucci 2025
Summary
Classes:
ConvBlock | An implementation of a convolution block with 1d or 2d (depthwise) convolutions. |
ConvolutionFrontEnd | A module that assembles a (depthwise) convolution encoder, with or without residual connections. |
ConvolutionalSpatialGatingUnit | Implements the Convolutional Spatial Gating Unit (CSGU) as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding". |
Reference
- class speechbrain.lobes.models.convolution.ConvolutionalSpatialGatingUnit(input_size: int, kernel_size: int = 31, dropout: float = 0.0, use_linear_after_conv: bool = False, activation: Type[torch.nn.Module] = torch.nn.Identity)
Bases: Module
This module implements the Convolutional Spatial Gating Unit (CSGU) as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding".
The code is heavily inspired by the original ESPNet implementation.
- Parameters:
input_size (int) – Size of the feature (channel) dimension.
kernel_size (int, optional (default=31)) – Size of the kernel.
dropout (float, optional (default=0.0)) – Dropout rate to be applied at the output.
use_linear_after_conv (bool, optional (default=False)) – If True, will apply a linear transformation of size input_size//2.
activation (Type[torch.nn.Module], optional (default=torch.nn.Identity)) – Activation function to use on the gate.
Example
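>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvolutionalSpatialGatingUnit
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionalSpatialGatingUnit(input_size=x.shape[-1])
>>> out = conv(x)
>>> out.shape
torch.Size([8, 30, 5])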
- forward(x)
- Parameters:
x (torch.Tensor) – Input tensor, shape (B, T, D)
- Returns:
out – The processed outputs. In the example above the feature dimension is halved, giving shape (B, T, D // 2).
- Return type:
torch.Tensor
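The gating idea behind the CSGU can be illustrated without any tensor library. The following is a minimal pure-Python sketch, not the SpeechBrain implementation: it assumes the channel dimension is split into two halves, one half is filtered over time by a depthwise convolution to form the gate, and the two halves are multiplied element-wise. The helper names `depthwise_conv1d` and `csgu_sketch` are hypothetical, and layer normalization, dropout, the optional linear layer, and the activation are all omitted.

```python
def depthwise_conv1d(seq, kernels):
    """Toy depthwise 1-D convolution over time with zero 'same' padding.

    seq: list of T frames, each a list of C floats.
    kernels: list of C kernels (one per channel), each of odd length.
    """
    num_frames, num_channels = len(seq), len(seq[0])
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for c in range(num_channels):
        kernel = kernels[c]
        half = len(kernel) // 2
        for t in range(num_frames):
            acc = 0.0
            for j, weight in enumerate(kernel):
                src = t + j - half
                if 0 <= src < num_frames:  # positions outside the sequence count as zero
                    acc += weight * seq[src][c]
            out[t][c] = acc
    return out


def csgu_sketch(seq, kernels):
    """Split channels in half, filter the gate half, multiply element-wise."""
    half = len(seq[0]) // 2
    x_r = [frame[:half] for frame in seq]  # half that passes through unchanged
    x_g = [frame[half:] for frame in seq]  # half that becomes the gate
    gate = depthwise_conv1d(x_g, kernels)  # normalization omitted in this sketch
    return [[r * g for r, g in zip(fr, fg)] for fr, fg in zip(x_r, gate)]


# T=4 frames, D=4 channels; identity kernels leave the gate half unchanged.
x = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
identity_kernels = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
out = csgu_sketch(x, identity_kernels)
print(len(out), len(out[0]))  # 4 2
```

Because one half of the channels is consumed as the gate, the output keeps only D // 2 features, which is consistent with the (8, 30, 10) to (8, 30, 5) shapes in the example above.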
- class speechbrain.lobes.models.convolution.ConvolutionFrontEnd(input_shape: Iterable, num_blocks: int = 3, num_layers_per_block: int = 5, out_channels: List[int] = [128, 256, 512], kernel_sizes: List[int] = [3, 3, 3], strides: List[int] = [1, 2, 2], dilations: List[int] = [1, 1, 1], residuals: List[bool] = [True, True, True], conv_module: Type[torch.nn.Module] = speechbrain.nnet.CNN.Conv2d, activation: Callable = torch.nn.LeakyReLU, norm: Type[torch.nn.Module] | None = speechbrain.nnet.normalization.LayerNorm, dropout: float = 0.1, conv_bias: bool = True, padding: Literal['same', 'valid', 'causal'] = 'same', conv_init: str | None = None)
Bases: Sequential
A module that assembles a (depthwise) convolution encoder, with or without residual connections.
- Parameters:
input_shape (Iterable) – Expected shape of the input tensor.
num_blocks (int, optional (default=3)) – Number of blocks.
num_layers_per_block (int, optional (default=5)) – Number of convolution layers for each block.
out_channels (List[int], optional (default=[128, 256, 512])) – Number of output channels for each block.
kernel_sizes (List[int], optional (default=[3, 3, 3])) – Kernel size of convolution blocks.
strides (List[int], optional (default=[1, 2, 2])) – Striding factor for each block, applied at the last layer.
dilations (List[int], optional (default=[1, 1, 1])) – Dilation factor for each block.
residuals (List[bool], optional (default=[True, True, True])) – Whether to apply residual connection at each block.
conv_module (Type[torch.nn.Module], optional (default=sb.nnet.Conv2d)) – Class to use for constructing conv layers.
activation (Callable, optional (default=torch.nn.LeakyReLU)) – Activation function for each block.
norm (Optional[Type[torch.nn.Module]] (default=LayerNorm)) – Normalization to regularize the model.
dropout (float, optional (default=0.1)) – Dropout probability.
conv_bias (bool, optional (default=True)) – Whether to add a bias term to convolutional layers.
padding (Literal["same", "valid", "causal"], optional (default="same")) – Type of padding to apply.
conv_init (Optional[str], optional (default=None=zeros)) – Type of initialization to use for conv layers.
Example
>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvolutionFrontEnd
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionFrontEnd(input_shape=x.shape)
>>> out = conv(x)
>>> out.shape
torch.Size([8, 8, 3, 512])
- get_filter_properties() -> FilterProperties
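The output shape in the ConvolutionFrontEnd example above can be traced by hand. The sketch below is plain Python arithmetic rather than library code. It assumes that "same" padding makes each layer produce ceil(n / stride) outputs along the time and feature axes, that a block's stride is applied only at its last layer (as the strides parameter describes), and that the channel dimension ends up equal to the last entry of out_channels. The function names are hypothetical.

```python
import math


def same_padding_length(length, stride):
    # Assumption: "same" padding yields ceil(length / stride) output positions.
    return math.ceil(length / stride)


def front_end_output_shape(input_shape, out_channels, strides, num_layers_per_block=5):
    """Trace (batch, time, features) through the blocks using default-like settings."""
    batch, time, feats = input_shape
    for block_stride in strides:
        for layer in range(num_layers_per_block):
            # Only the last layer of a block applies the block's stride.
            is_last = layer == num_layers_per_block - 1
            stride = block_stride if is_last else 1
            time = same_padding_length(time, stride)
            feats = same_padding_length(feats, stride)
    return (batch, time, feats, out_channels[-1])


shape = front_end_output_shape((8, 30, 10), [128, 256, 512], [1, 2, 2])
print(shape)  # (8, 8, 3, 512)
```

With strides [1, 2, 2], the time axis goes 30, 15, 8 and the feature axis goes 10, 5, 3, which agrees with the torch.Size([8, 8, 3, 512]) reported in the example.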
- class speechbrain.lobes.models.convolution.ConvBlock(num_layers: int, out_channels: int, input_shape: Iterable, kernel_size: int = 3, stride: int = 1, dilation: int = 1, residual: bool = False, conv_module: Type[torch.nn.Module] = speechbrain.nnet.CNN.Conv2d, activation: Callable = torch.nn.LeakyReLU, norm: Type[torch.nn.Module] | None = None, dropout: float = 0.1, conv_bias: bool = True, padding: Literal['same', 'valid', 'causal'] = 'same', conv_init: str | None = None)
Bases: Module
An implementation of a convolution block with 1d or 2d (depthwise) convolutions.
- Parameters:
num_layers (int) – Number of depthwise convolution layers for this block.
out_channels (int) – Number of output channels of this model.
input_shape (Iterable) – Expected shape of the input tensor.
kernel_size (int, optional (default=3)) – Kernel size of convolution layers.
stride (int, optional (default=1)) – Striding factor for this block.
dilation (int, optional (default=1)) – Dilation factor.
residual (bool, optional (default=False)) – Add a residual connection if True.
conv_module (Type[torch.nn.Module], optional (default=sb.nnet.Conv2d)) – Class to use when constructing conv layers.
activation (Callable, optional (default=torch.nn.LeakyReLU)) – Activation function for this block.
norm (Optional[Type[torch.nn.Module]] (default=None)) – Normalization to regularize the model.
dropout (float, optional (default=0.1)) – Rate to zero outputs at.
conv_bias (bool, optional (default=True)) – Add a bias term to conv layers.
padding (Literal["same", "valid", "causal"], optional (default="same")) – The type of padding to add.
conv_init (Optional[str], optional (default=None=zeros)) – Type of initialization to use for conv layers.
Example
>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvBlock
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvBlock(2, 16, input_shape=x.shape)
>>> out = conv(x)
>>> x.shape
torch.Size([8, 30, 10])
- get_filter_properties() -> FilterProperties