speechbrain.lobes.models.transformer.conformer module

Conformer implementation in the SpeechBrain style.

Authors * Jianyuan Zhong 2020

Summary

Classes:

ConformerEncoder

This class implements the Conformer encoder.

ConformerEncoderLayer

This is an implementation of the Conformer encoder layer.

ConvolutionModule

This is an implementation of the convolution module in Conformer.

Reference

class speechbrain.lobes.models.transformer.conformer.ConvolutionModule(input_size, kernel_size, bias=True, activation=<class 'speechbrain.nnet.activations.Swish'>, dropout=0.1)[source]

Bases: torch.nn.modules.module.Module

This is an implementation of the convolution module in Conformer.

Parameters
  • input_size (int) – The expected size of the input embedding.

  • dropout (float) – Dropout rate for the convolution module (Optional).

  • bias (bool) – Whether to use bias in the convolution module.

  • kernel_size (int) – Kernel size of the convolution module.

Example

>>> import torch
>>> from speechbrain.lobes.models.transformer.conformer import ConvolutionModule
>>> x = torch.rand((8, 60, 512))
>>> net = ConvolutionModule(512, 3)
>>> output = net(x)
>>> output.shape
torch.Size([8, 60, 512])
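
To make the role of input_size, kernel_size, bias and dropout more concrete, the following is a minimal pure-PyTorch sketch of a Conformer-style convolution block. It assumes the layer order described in the Conformer paper (layer norm, pointwise convolution, GLU, depthwise convolution, batch norm, Swish, pointwise convolution, dropout). The class name ConvModuleSketch is hypothetical, nn.SiLU stands in for Swish, and SpeechBrain's actual ConvolutionModule may differ in its details.

```python
import torch
import torch.nn as nn


class ConvModuleSketch(nn.Module):
    """Illustrative Conformer-style convolution block (not SpeechBrain's code)."""

    def __init__(self, input_size, kernel_size, bias=True, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(input_size)
        # Pointwise conv doubles the channels so that GLU can halve them again.
        self.pointwise_in = nn.Conv1d(
            input_size, 2 * input_size, kernel_size=1, bias=bias
        )
        self.glu = nn.GLU(dim=1)
        # Depthwise conv: one filter per channel (groups=input_size).
        # This padding preserves the sequence length for odd kernel sizes.
        self.depthwise = nn.Conv1d(
            input_size,
            input_size,
            kernel_size,
            padding=(kernel_size - 1) // 2,
            groups=input_size,
            bias=bias,
        )
        self.batch_norm = nn.BatchNorm1d(input_size)
        self.activation = nn.SiLU()  # Swish with beta = 1
        self.pointwise_out = nn.Conv1d(
            input_size, input_size, kernel_size=1, bias=bias
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, time, channels); Conv1d expects (batch, channels, time).
        out = self.norm(x).transpose(1, 2)
        out = self.glu(self.pointwise_in(out))
        out = self.activation(self.batch_norm(self.depthwise(out)))
        out = self.dropout(self.pointwise_out(out))
        return out.transpose(1, 2)  # back to (batch, time, channels)


x = torch.rand((8, 60, 512))
sketch = ConvModuleSketch(512, 3)
y = sketch(x)
```

As in the documented example, the output keeps the input shape, which is what allows the block to sit inside a residual connection.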
forward(x)[source]
training: bool
class speechbrain.lobes.models.transformer.conformer.ConformerEncoderLayer(d_model, d_ffn, nhead, kernel_size, kdim=None, vdim=None, activation=<class 'speechbrain.nnet.activations.Swish'>, bias=True, dropout=0.1)[source]

Bases: torch.nn.modules.module.Module

This is an implementation of the Conformer encoder layer.

Parameters
  • d_ffn (int) – Hidden size of self-attention Feed Forward layer.

  • nhead (int) – Number of attention heads.

  • d_model (int) – The expected size of the input embedding.

  • reshape (bool) – Whether to automatically shape 4-d input to 3-d.

  • kdim (int) – Dimension of the key (Optional).

  • vdim (int) – Dimension of the value (Optional).

  • dropout (float) – Dropout rate for the encoder layer (Optional).

  • bias (bool) – Whether to use bias in the convolution module.

  • kernel_size (int) – Kernel size of the convolution module.

Example

>>> import torch
>>> from speechbrain.lobes.models.transformer.conformer import ConformerEncoderLayer
>>> x = torch.rand((8, 60, 512))
>>> net = ConformerEncoderLayer(d_ffn=512, nhead=8, d_model=512, kernel_size=3)
>>> output = net(x)
>>> output[0].shape
torch.Size([8, 60, 512])
forward(x, src_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None)[source]
training: bool
class speechbrain.lobes.models.transformer.conformer.ConformerEncoder(num_layers, nhead, d_ffn, input_shape=None, d_model=None, kdim=None, vdim=None, dropout=0.1, activation=<class 'speechbrain.nnet.activations.Swish'>, kernel_size=31, bias=True)[source]

Bases: torch.nn.modules.module.Module

This class implements the Conformer encoder.

Parameters
  • num_layers (int) – Number of Conformer layers to include.

  • nhead (int) – Number of attention heads.

  • d_ffn (int) – Hidden size of self-attention Feed Forward layer.

  • input_shape (tuple) – Expected shape of an example input.

  • d_model (int) – The dimension of the input embedding.

  • kdim (int) – Dimension for key (Optional).

  • vdim (int) – Dimension for value (Optional).

  • dropout (float) – Dropout for the encoder (Optional).

  • input_module (torch class) – The module to process the source input feature to expected feature dimension (Optional).

Example

>>> import torch
>>> from speechbrain.lobes.models.transformer.conformer import ConformerEncoder
>>> x = torch.rand((8, 60, 512))
>>> net = ConformerEncoder(1, 8, 512, d_model=512)
>>> output, _ = net(x)
>>> output.shape
torch.Size([8, 60, 512])
forward(src, src_mask: Optional[torch.Tensor] = None, src_key_padding_mask: Optional[torch.Tensor] = None)[source]
Parameters
  • src (tensor) – The input sequence to the encoder (required).

  • src_mask (tensor) – The mask for the src sequence (optional).

  • src_key_padding_mask (tensor) – The mask for the src keys per batch (optional).
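
The page does not show how src_key_padding_mask is built, so here is a minimal sketch of deriving one from per-utterance lengths. It assumes the mask follows the torch.nn.MultiheadAttention convention: a boolean tensor of shape (batch, time) in which True marks padded positions to be ignored. The helper make_key_padding_mask is hypothetical and not part of SpeechBrain.

```python
import torch


def make_key_padding_mask(lengths, max_len):
    """Return a (batch, max_len) bool mask that is True at padded positions.

    Hypothetical helper for illustration; assumes `lengths` holds the
    number of valid frames in each sequence of the batch.
    """
    # positions: (1, max_len) holding 0, 1, ..., max_len - 1
    positions = torch.arange(max_len, device=lengths.device).unsqueeze(0)
    # A frame is padding when its index is at or beyond the valid length.
    return positions >= lengths.unsqueeze(1)


lengths = torch.tensor([60, 45, 30])  # valid frames per sequence
mask = make_key_padding_mask(lengths, max_len=60)
```

Under the stated assumption, and with an input x of shape (3, 60, 512), the mask would be passed as net(x, src_key_padding_mask=mask).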

training: bool