speechbrain.lobes.models.convolution module
This module assembles a (depthwise) convolutional encoder, with or without residual connections.
- Authors
Jianyuan Zhong 2020
Titouan Parcollet 2023
Summary
Classes:
ConvBlock – An implementation of a convolution block with 1d or 2d convolutions (depthwise).
ConvolutionFrontEnd – Assembles a (depthwise) convolutional encoder, with or without residual connections.
ConvolutionalSpatialGatingUnit – Implements the Convolutional Spatial Gating Unit (CSGU) as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding".
Reference
- class speechbrain.lobes.models.convolution.ConvolutionalSpatialGatingUnit(input_size, kernel_size=31, dropout=0.0, use_linear_after_conv=False, activation=<class 'torch.nn.modules.linear.Identity'>)
Bases: Module
This module implements the Convolutional Spatial Gating Unit (CSGU) as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding".
The code is heavily inspired by the original ESPNet implementation.
- Parameters:
input_size (int) – Size of the feature (channel) dimension.
kernel_size (int, optional) – Size of the kernel.
dropout (float, optional) – Dropout rate to be applied at the output.
use_linear_after_conv (bool, optional) – If True, will apply a linear transformation of size input_size//2.
activation (torch.class, optional) – Activation function to use on the gate, default is Identity.
Example
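>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvolutionalSpatialGatingUnit
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionalSpatialGatingUnit(input_size=x.shape[-1])
>>> out = conv(x)
>>> out.shape
torch.Size([8, 30, 5])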
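The halved output dimension in the example above follows from the gating mechanism: the input is split in two along the feature axis, one half is normalized and passed through a depthwise convolution over time, and the result gates the other half. The following is a minimal plain-PyTorch sketch of that idea, not the SpeechBrain implementation. The class name `CSGUSketch` is hypothetical, the choice of LayerNorm on the gate half is an assumption, `input_size` is assumed even and `kernel_size` odd, and the optional linear layer and activation are omitted.

```python
import torch
from torch import nn


class CSGUSketch(nn.Module):
    """Hypothetical, simplified spatial gating unit (illustration only)."""

    def __init__(self, input_size, kernel_size=31, dropout=0.0):
        super().__init__()
        half = input_size // 2  # assumes input_size is even
        self.norm = nn.LayerNorm(half)
        # groups=half makes the convolution depthwise: one filter per channel.
        # padding keeps the time dimension unchanged for an odd kernel_size.
        self.conv = nn.Conv1d(
            half, half, kernel_size, padding=(kernel_size - 1) // 2, groups=half
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, time, features)
        x_res, x_gate = x.chunk(2, dim=-1)
        x_gate = self.norm(x_gate)
        # Conv1d expects (batch, channels, time), so transpose around the call.
        x_gate = self.conv(x_gate.transpose(1, 2)).transpose(1, 2)
        return self.dropout(x_res * x_gate)


x = torch.rand(8, 30, 10)
out = CSGUSketch(input_size=x.shape[-1])(x)
print(out.shape)  # torch.Size([8, 30, 5])
```

Because only one half of the features survives the element-wise product, the output has `input_size // 2` features, which matches the shape reported in the example.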
- class speechbrain.lobes.models.convolution.ConvolutionFrontEnd(input_shape, num_blocks=3, num_layers_per_block=5, out_channels=[128, 256, 512], kernel_sizes=[3, 3, 3], strides=[1, 2, 2], dilations=[1, 1, 1], residuals=[True, True, True], conv_module=<class 'speechbrain.nnet.CNN.Conv2d'>, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, norm=<class 'speechbrain.nnet.normalization.LayerNorm'>, dropout=0.1, conv_bias=True, padding='same', conv_init=None)
Bases: Sequential
This module assembles a (depthwise) convolutional encoder, with or without residual connections.
- Parameters:
input_shape (tuple) – Expected shape of the input tensor.
num_blocks (int) – Number of blocks (default 3).
num_layers_per_block (int) – Number of convolution layers in each block (default 5).
out_channels (Optional(list[int])) – Number of output channels for each block.
kernel_sizes (Optional(list[int])) – Kernel size of the convolution layers in each block.
strides (Optional(list[int])) – Striding factor for each block; the stride is applied at the last convolution layer of each block.
dilations (Optional(list[int])) – Dilation factor for each block.
residuals (Optional(list[bool])) – Whether to apply a residual connection at each block (default [True, True, True]).
conv_module (class) – Class to use for constructing conv layers.
activation (Callable) – Activation function for each block (default LeakyReLU).
norm (torch class) – Normalization to regularize the model (default LayerNorm).
dropout (float) – Dropout rate (default 0.1).
conv_bias (bool) – Whether to add a bias term to convolutional layers.
padding (str) – Type of padding to apply.
conv_init (str) – Type of initialization to use for conv layers.
Example
>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvolutionFrontEnd
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionFrontEnd(input_shape=x.shape)
>>> out = conv(x)
>>> out.shape
torch.Size([8, 8, 3, 512])
- get_filter_properties() → FilterProperties
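The output shape in the ConvolutionFrontEnd example can be checked by hand from the default strides. The sketch below assumes that, with 'same' padding, a convolution of stride s maps a dimension of length L to ceil(L / s), and that only the last layer of each block is strided, as described for the strides parameter. Both helper functions are hypothetical and exist only for this arithmetic; they are not part of SpeechBrain.

```python
import math


def same_padding_out_len(length, stride):
    """Assumed output length of a 'same'-padded convolution with this stride."""
    return math.ceil(length / stride)


def frontend_output_shape(input_shape, out_channels=(128, 256, 512), strides=(1, 2, 2)):
    """Estimate the front-end output shape for a (batch, time, features) input."""
    batch, time, feats = input_shape
    # Each block shrinks both the time and feature axes by its stride.
    for stride in strides:
        time = same_padding_out_len(time, stride)
        feats = same_padding_out_len(feats, stride)
    # The channel dimension is set by the last block.
    return (batch, time, feats, out_channels[-1])


print(frontend_output_shape((8, 30, 10)))  # (8, 8, 3, 512)
```

With the defaults, the time axis goes 30 → 30 → 15 → 8 and the feature axis 10 → 10 → 5 → 3, which agrees with the torch.Size([8, 8, 3, 512]) reported in the example.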
- class speechbrain.lobes.models.convolution.ConvBlock(num_layers, out_channels, input_shape, kernel_size=3, stride=1, dilation=1, residual=False, conv_module=<class 'speechbrain.nnet.CNN.Conv2d'>, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, norm=None, dropout=0.1, conv_bias=True, padding='same', conv_init=None)
Bases: Module
An implementation of a convolution block with 1d or 2d convolutions (depthwise).
- Parameters:
num_layers (int) – Number of depthwise convolution layers for this block.
out_channels (int) – Number of output channels of this block.
input_shape (tuple) – Expected shape of the input tensor.
kernel_size (int) – Kernel size of convolution layers (default 3).
stride (int) – Striding factor for this block (default 1).
dilation (int) – Dilation factor (default 1).
residual (bool) – Add a residual connection if True (default False).
conv_module (torch class) – Class to use when constructing conv layers.
activation (Callable) – Activation function for this block.
norm (torch class) – Normalization to regularize the model (default None).
dropout (float) – Rate to zero outputs at (default 0.1).
conv_bias (bool) – Add a bias term to conv layers.
padding (str) – The type of padding to add.
conv_init (str) – Type of initialization to use for conv layers.
Example
>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvBlock
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvBlock(2, 16, input_shape=x.shape)
>>> out = conv(x)
>>> x.shape
torch.Size([8, 30, 10])
- get_filter_properties() → FilterProperties
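To illustrate how the num_layers, stride and residual parameters of ConvBlock might interact, here is a minimal plain-PyTorch sketch of a residual convolution block. It is not the SpeechBrain implementation: the class name `ResidualConvBlockSketch` is hypothetical, it uses torch.nn.Conv1d with channels-first input instead of a configurable conv_module, it assumes an odd kernel_size, and it assumes the shortcut is a kernel-size-1 convolution that matches the channel count and stride of the main path. Applying the stride only at the last layer follows the description given for the strides parameter of ConvolutionFrontEnd.

```python
import torch
from torch import nn


class ResidualConvBlockSketch(nn.Module):
    """Hypothetical residual conv block (illustration only)."""

    def __init__(self, in_channels, out_channels, num_layers=2,
                 kernel_size=3, stride=1, residual=True, dropout=0.1):
        super().__init__()
        layers = []
        channels = in_channels
        for i in range(num_layers):
            # Only the last layer of the block is strided.
            layer_stride = stride if i == num_layers - 1 else 1
            layers += [
                nn.Conv1d(channels, out_channels, kernel_size,
                          stride=layer_stride, padding=kernel_size // 2),
                nn.LeakyReLU(),
                nn.Dropout(dropout),
            ]
            channels = out_channels
        self.convs = nn.Sequential(*layers)
        # Assumed shortcut: a pointwise conv so shapes match before the sum.
        self.shortcut = (
            nn.Conv1d(in_channels, out_channels, kernel_size=1, stride=stride)
            if residual else None
        )

    def forward(self, x):
        # x: (batch, channels, time)
        out = self.convs(x)
        if self.shortcut is not None:
            out = out + self.shortcut(x)
        return out


x = torch.rand(8, 10, 30)  # (batch, channels, time)
block = ResidualConvBlockSketch(in_channels=10, out_channels=16, stride=2)
out = block(x)
print(out.shape)  # torch.Size([8, 16, 15])
```

The shortcut needs the same stride as the main path; otherwise the two tensors would differ in length and the sum would fail.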