speechbrain.lobes.models.transformer.TransformerSE module

CNN Transformer model for speech enhancement (SE) in the SpeechBrain style.

Authors
  * Chien-Feng Liao 2020

Summary

Classes:

CNNTransformerSE

This is an implementation of a transformer model with a CNN pre-encoder for SE.

Reference

class speechbrain.lobes.models.transformer.TransformerSE.CNNTransformerSE(d_model, output_size, output_activation=<class 'torch.nn.modules.activation.ReLU'>, nhead=8, num_layers=8, d_ffn=512, dropout=0.1, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, causal=True, custom_emb_module=None, normalize_before=False)[source]

Bases: speechbrain.lobes.models.transformer.Transformer.TransformerInterface

This is an implementation of a transformer model with a CNN pre-encoder for SE.

Parameters
  • d_model (int) – The number of expected features in the encoder inputs.

  • output_size (int) – The number of neurons in the output layer.

  • output_activation (torch class) – The activation function of the output layer (default=ReLU).

  • nhead (int) – The number of heads in the multi-head attention models (default=8).

  • num_layers (int) – The number of encoder layers in the transformer (default=8).

  • d_ffn (int) – The dimension of the feed-forward network in the encoder layers (default=512).

  • dropout (float) – The dropout value (default=0.1).

  • activation (torch class) – The activation function of intermediate layers (default=LeakyReLU).

  • causal (bool) – If True, use the causal setting and forbid the model from attending to future frames (default=True).

  • custom_emb_module (torch class) – Module that processes the input features before the transformer model; a usage sketch is given after the example below.

  • normalize_before (bool) – Whether layer normalization is applied before (True) or after (False) each attention and feed-forward block (default=False).

Example

>>> import torch
>>> from speechbrain.lobes.models.transformer.TransformerSE import CNNTransformerSE
>>> src = torch.rand([8, 120, 256])
>>> net = CNNTransformerSE(d_model=256, output_size=257)
>>> out = net(src)
>>> out.shape
torch.Size([8, 120, 257])
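
The custom_emb_module argument accepts any module that maps the raw input features to d_model-dimensional frames before they enter the transformer. The following is a minimal sketch, not part of the SpeechBrain API: the TimeConvEmbedding class and the 128-dimensional input features are illustrative assumptions.

>>> import torch
>>> import torch.nn as nn
>>> from speechbrain.lobes.models.transformer.TransformerSE import CNNTransformerSE
>>> class TimeConvEmbedding(nn.Module):
...     # Hypothetical pre-encoder: (batch, time, in_feats) -> (batch, time, d_model)
...     def __init__(self, in_feats, d_model):
...         super().__init__()
...         self.conv = nn.Conv1d(in_feats, d_model, kernel_size=3, padding=1)
...     def forward(self, x):
...         # Conv1d expects (batch, channels, time), so transpose around the convolution
...         return self.conv(x.transpose(1, 2)).transpose(1, 2)
>>> emb = TimeConvEmbedding(in_feats=128, d_model=256)
>>> net = CNNTransformerSE(d_model=256, output_size=257, custom_emb_module=emb)
>>> src = torch.rand([8, 120, 128])
>>> out = net(src)
>>> out.shape
torch.Size([8, 120, 257])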
forward(x, src_key_padding_mask=None)[source]
training: bool
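
forward passes src_key_padding_mask to the underlying transformer encoder. The sketch below assumes the usual PyTorch key-padding-mask convention, in which True marks padded time steps that attention should ignore; the mask values are purely illustrative.

>>> import torch
>>> from speechbrain.lobes.models.transformer.TransformerSE import CNNTransformerSE
>>> net = CNNTransformerSE(d_model=256, output_size=257)
>>> src = torch.rand([8, 120, 256])
>>> pad_mask = torch.zeros(8, 120, dtype=torch.bool)
>>> pad_mask[:, 100:] = True  # pretend the last 20 frames of each utterance are padding
>>> out = net(src, src_key_padding_mask=pad_mask)
>>> out.shape
torch.Size([8, 120, 257])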