speechbrain.lobes.models.transformer.conformer module

Conformer implementation in the SpeechBrain style.

Authors * Jianyuan Zhong 2020

Summary

Classes:

`ConformerEncoder`	Implements the Conformer encoder.
`ConformerEncoderLayer`	Implements a single Conformer encoder layer.
`ConvolutionModule`	Implements the convolution module of the Conformer.

Reference

class speechbrain.lobes.models.transformer.conformer.ConvolutionModule(input_size, kernel_size, bias=True, activation=<class 'speechbrain.nnet.activations.Swish'>, dropout=0.1)

Bases: torch.nn.modules.module.Module

Implements the convolution module of the Conformer.

Parameters

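input_size (int) – The expected size of the input embedding.
kernel_size (int) – Kernel size of the convolution module.
bias (bool) – Whether the convolution module uses a bias term (default: True).
activation (torch class) – Activation function class used in the module (default: Swish).
dropout (float) – Dropout rate for the module (Optional, default: 0.1).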

Example

>>> import torch
>>> x = torch.rand((8, 60, 512))
>>> net = ConvolutionModule(512, 3)
>>> output = net(x)
>>> output.shape
torch.Size([8, 60, 512])
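
The example above returns the same time dimension (60) that it was given. A minimal sketch of the arithmetic behind that, assuming a stride-1 convolution with symmetric padding of `(kernel_size - 1) // 2`; this page does not state the padding that `ConvolutionModule` actually uses, and the helper names below are hypothetical. The length formula is the one given in the PyTorch `Conv1d` documentation.

```python
def same_padding(kernel_size):
    # Assumed padding rule (not confirmed by this page): symmetric padding
    # that preserves the length for odd kernel sizes at stride 1.
    return (kernel_size - 1) // 2


def conv1d_output_length(length, kernel_size, padding, stride=1, dilation=1):
    # Output-length formula from the torch.nn.Conv1d documentation.
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# kernel_size=3 is used in the example above; 31 is the ConformerEncoder default.
for k in (3, 31):
    p = same_padding(k)
    print(k, p, conv1d_output_length(60, k, p))
```

Under this assumption, an odd kernel size leaves the sequence length unchanged, which is consistent with the `torch.Size([8, 60, 512])` output shown in the example.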

forward(x)

training: bool

class speechbrain.lobes.models.transformer.conformer.ConformerEncoderLayer(d_model, d_ffn, nhead, kernel_size, kdim=None, vdim=None, activation=<class 'speechbrain.nnet.activations.Swish'>, bias=True, dropout=0.1)

Bases: torch.nn.modules.module.Module

Implements a single Conformer encoder layer.

Parameters

d_ffn (int) – Hidden size of self-attention Feed Forward layer.
nhead (int) – Number of attention heads.
d_model (int) – The expected size of the input embedding.
reshape (bool) – Whether to automatically shape 4-d input to 3-d.
kdim (int) – Dimension of the key (Optional).
vdim (int) – Dimension of the value (Optional).
dropout (float) – Dropout rate for the encoder layer (Optional, default: 0.1).
bias (bool) – Whether the convolution module uses a bias term (default: True).
kernel_size (int) – Kernel size of the convolution module.

Example

>>> import torch
>>> x = torch.rand((8, 60, 512))
>>> net = ConformerEncoderLayer(d_ffn=512, nhead=8, d_model=512, kernel_size=3)
>>> output = net(x)
>>> output[0].shape
torch.Size([8, 60, 512])

forward(x, src_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None)

training: bool

class speechbrain.lobes.models.transformer.conformer.ConformerEncoder(num_layers, nhead, d_ffn, input_shape=None, d_model=None, kdim=None, vdim=None, dropout=0.1, activation=<class 'speechbrain.nnet.activations.Swish'>, kernel_size=31, bias=True)

Bases: torch.nn.modules.module.Module

This class implements the Conformer encoder.

Parameters

num_layers (int) – Number of Conformer layers to include.
nhead (int) – Number of attention heads.
d_ffn (int) – Hidden size of self-attention Feed Forward layer.
input_shape (tuple) – Expected shape of an example input.
d_model (int) – The dimension of the input embedding.
kdim (int) – Dimension for key (Optional).
vdim (int) – Dimension for value (Optional).
dropout (float) – Dropout for the encoder (Optional).
input_module (torch class) – The module to process the source input feature to expected feature dimension (Optional).

Example

>>> import torch
>>> x = torch.rand((8, 60, 512))
>>> net = ConformerEncoder(1, 8, 512, d_model=512)
>>> output, _ = net(x)
>>> output.shape
torch.Size([8, 60, 512])

forward(src, src_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None)

Parameters

src (tensor) – The input sequence to the encoder (required).
src_mask (tensor) – The mask for the src sequence (optional).
src_key_padding_mask (tensor) – The mask for the src keys per batch (optional).
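
The two optional masks can be built with plain PyTorch. The sketch below assumes the boolean convention of `torch.nn.MultiheadAttention`, where `True` marks a position that is ignored; this page does not state which convention the encoder expects, so check it before relying on it. The helper names `make_key_padding_mask` and `make_causal_mask` are hypothetical and not part of SpeechBrain.

```python
import torch


def make_key_padding_mask(lengths, max_len):
    # lengths: 1-D integer tensor with the number of valid frames per utterance.
    positions = torch.arange(max_len).unsqueeze(0)  # shape (1, max_len)
    # Assumed convention: True marks padded frames that attention should ignore.
    return positions >= lengths.unsqueeze(1)  # shape (batch, max_len)


def make_causal_mask(max_len):
    # True above the diagonal: frame i may not attend to any frame j > i.
    ones = torch.ones(max_len, max_len, dtype=torch.bool)
    return torch.triu(ones, diagonal=1)  # shape (max_len, max_len)


lengths = torch.tensor([60, 45, 30])
pad_mask = make_key_padding_mask(lengths, max_len=60)
causal_mask = make_causal_mask(60)

# With an encoder built as in the example above, the masks would be passed as:
# output, _ = net(x, src_mask=causal_mask, src_key_padding_mask=pad_mask)
```

Here `pad_mask` has one row per utterance and `causal_mask` is shared across the batch.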

training: bool