speechbrain.lobes.models.transformer.conformer module
Conformer implementation in the SpeechBrain style.
Authors * Jianyuan Zhong 2020
Summary
Classes:
ConformerEncoder | This class implements the Conformer encoder.
ConformerEncoderLayer | This is an implementation of the Conformer encoder layer.
ConvolutionModule | This is an implementation of the convolution module in Conformer.
Reference
- class speechbrain.lobes.models.transformer.conformer.ConvolutionModule(input_size, kernel_size, bias=True, activation=<class 'speechbrain.nnet.activations.Swish'>, dropout=0.1)
Bases: torch.nn.modules.module.Module
This is an implementation of the convolution module in Conformer.
- Parameters
Example
>>> import torch
>>> from speechbrain.lobes.models.transformer.conformer import ConvolutionModule
>>> x = torch.rand((8, 60, 512))
>>> net = ConvolutionModule(512, 3)
>>> output = net(x)
>>> output.shape
torch.Size([8, 60, 512])
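To make the shape-preserving behaviour in the example above more concrete, here is a minimal sketch of a Conformer-style convolution block written in plain PyTorch. It is not SpeechBrain's implementation: the class name `ConvModuleSketch`, the layer ordering (layer norm, pointwise convolution with GLU, depthwise convolution, batch norm, Swish, pointwise convolution, dropout), and the padding rule are assumptions based on the usual Conformer description, and the padding shown only preserves the sequence length for odd kernel sizes.

```python
import torch
import torch.nn as nn


class ConvModuleSketch(nn.Module):
    """Illustrative Conformer-style convolution block (hypothetical, not SpeechBrain's code)."""

    def __init__(self, input_size, kernel_size, bias=True, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(input_size)
        # Pointwise conv doubles the channels so that GLU can halve them again.
        self.pointwise_in = nn.Conv1d(input_size, 2 * input_size, kernel_size=1, bias=bias)
        self.glu = nn.GLU(dim=1)
        # Depthwise conv: one filter per channel (groups == channels).
        # This padding keeps the time dimension unchanged for odd kernel sizes.
        self.depthwise = nn.Conv1d(
            input_size,
            input_size,
            kernel_size,
            padding=(kernel_size - 1) // 2,
            groups=input_size,
            bias=bias,
        )
        self.batch_norm = nn.BatchNorm1d(input_size)
        self.activation = nn.SiLU()  # SiLU is the Swish activation
        self.pointwise_out = nn.Conv1d(input_size, input_size, kernel_size=1, bias=bias)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, time, channels); Conv1d expects (batch, channels, time).
        out = self.norm(x).transpose(1, 2)
        out = self.glu(self.pointwise_in(out))
        out = self.activation(self.batch_norm(self.depthwise(out)))
        out = self.dropout(self.pointwise_out(out))
        return out.transpose(1, 2)  # back to (batch, time, channels)


sketch = ConvModuleSketch(512, 3)
sketch_out = sketch(torch.rand(8, 60, 512))
```

Under these assumptions the output has the same `(batch, time, channels)` shape as the input, which is what allows such a block to sit inside a residual connection.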
- class speechbrain.lobes.models.transformer.conformer.ConformerEncoderLayer(d_model, d_ffn, nhead, kernel_size, kdim=None, vdim=None, activation=<class 'speechbrain.nnet.activations.Swish'>, bias=True, dropout=0.1)
Bases: torch.nn.modules.module.Module
This is an implementation of the Conformer encoder layer.
- Parameters
d_ffn (int) – Hidden size of self-attention Feed Forward layer.
nhead (int) – Number of attention heads.
d_model (int) – The expected size of the input embedding.
reshape (bool) – Whether to automatically shape 4-d input to 3-d.
kdim (int) – Dimension of the key (Optional).
vdim (int) – Dimension of the value (Optional).
dropout (float) – Dropout for the encoder (Optional).
bias (bool) – Whether to use a bias term in the convolution module.
kernel_size (int) – Kernel size of the convolution module.
Example
>>> import torch
>>> from speechbrain.lobes.models.transformer.conformer import ConformerEncoderLayer
>>> x = torch.rand((8, 60, 512))
>>> net = ConformerEncoderLayer(d_ffn=512, nhead=8, d_model=512, kernel_size=3)
>>> output = net(x)
>>> output[0].shape
torch.Size([8, 60, 512])
- forward(x, src_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None)
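The `src_mask` argument is not described here, so the following is only a sketch of one common way such a mask is built. It assumes the PyTorch attention convention, in which a boolean mask of shape `(time, time)` uses `True` for positions that may not be attended to; whether this layer expects exactly that convention is an assumption, and `causal_mask` is a hypothetical helper, not part of SpeechBrain.

```python
import torch


def causal_mask(seq_len):
    """Return a (seq_len, seq_len) boolean mask that blocks attention to future frames."""
    # Ones strictly above the diagonal mark pairs (i, j) with j > i,
    # i.e. frame i would be looking at a later frame j.
    return torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()


mask = causal_mask(4)
```

With `seq_len=4`, row `i` of the mask is `False` up to and including column `i` and `True` afterwards, so each frame can only attend to itself and earlier frames.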
- class speechbrain.lobes.models.transformer.conformer.ConformerEncoder(num_layers, nhead, d_ffn, input_shape=None, d_model=None, kdim=None, vdim=None, dropout=0.1, activation=<class 'speechbrain.nnet.activations.Swish'>, kernel_size=31, bias=True)
Bases: torch.nn.modules.module.Module
This class implements the Conformer encoder.
- Parameters
num_layers (int) – Number of Conformer layers to include.
nhead (int) – Number of attention heads.
d_ffn (int) – Hidden size of self-attention Feed Forward layer.
input_shape (tuple) – Expected shape of an example input.
d_model (int) – The dimension of the input embedding.
kdim (int) – Dimension for key (Optional).
vdim (int) – Dimension for value (Optional).
dropout (float) – Dropout for the encoder (Optional).
input_module (torch class) – The module to process the source input feature to expected feature dimension (Optional).
Example
>>> import torch
>>> from speechbrain.lobes.models.transformer.conformer import ConformerEncoder
>>> x = torch.rand((8, 60, 512))
>>> net = ConformerEncoder(1, 8, 512, d_model=512)
>>> output, _ = net(x)
>>> output.shape
torch.Size([8, 60, 512])
- forward(src, src_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None)
- Parameters
src (tensor) – The sequence to the encoder layer (required).
src_mask (tensor) – The mask for the src sequence (optional).
src_key_padding_mask (tensor) – The mask for the src keys per batch (optional).
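As an illustration of what a per-batch key padding mask might look like, the sketch below builds one from the number of valid frames in each utterance. It assumes the PyTorch convention of a boolean `(batch, time)` tensor where `True` marks padded frames to be ignored; `key_padding_mask` is a hypothetical helper written for this example, not a SpeechBrain function.

```python
import torch


def key_padding_mask(lengths, max_len):
    """Return a (batch, max_len) boolean mask with True at padded positions."""
    # lengths: (batch,) integer tensor holding the number of valid frames per utterance.
    positions = torch.arange(max_len).unsqueeze(0)  # (1, max_len)
    # A position is padding when its index is at or beyond the utterance length.
    return positions >= lengths.unsqueeze(1)


pad_mask = key_padding_mask(torch.tensor([4, 2]), max_len=4)
```

Here the first utterance fills all four frames, so its row is entirely `False`, while the second utterance has two valid frames followed by two padded ones.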