speechbrain.lobes.models.ContextNet module

The SpeechBrain implementation of ContextNet, as described in https://arxiv.org/pdf/2005.03191.pdf

Authors
  • Jianyuan Zhong 2020

Summary

Classes:

ContextNet

This class implements ContextNet.

ContextNetBlock

This class implements a block in ContextNet.

SEmodule

This class implements the Squeeze-and-Excitation module.

Reference

class speechbrain.lobes.models.ContextNet.ContextNet(input_shape, out_channels=640, conv_channels=None, kernel_size=3, strides=None, num_blocks=21, num_layers=5, inner_dim=12, alpha=1, beta=1, dropout=0.15, activation=<class 'speechbrain.nnet.activations.Swish'>, se_activation=<class 'torch.nn.modules.activation.Sigmoid'>, norm=<class 'speechbrain.nnet.normalization.BatchNorm1d'>, residuals=None)[source]

Bases: speechbrain.nnet.containers.Sequential

This class implements ContextNet.

Reference paper: https://arxiv.org/pdf/2005.03191.pdf

Parameters
  • out_channels (int) – Number of output channels of this model (default 640).

  • conv_channels (Optional (list[int])) – Number of output channels for each ContextNet block. If not provided, it is initialized to the default setting of the paper referenced above.

  • kernel_size (int) – Kernel size of convolution layers (default 3).

  • strides (Optional (list[int])) – Striding factor for each context block. The stride is applied at the last convolution layer of each context block. If not provided, it is initialized to the default setting of the paper referenced above.

  • num_blocks (int) – Number of context blocks (default 21).

  • num_layers (int) – Number of depthwise convolution layers for each context block (default 5).

  • inner_dim (int) – Inner dimension of the bottleneck network of the SE module (default 12).

  • alpha (float) – Factor used to scale the number of output channels of the network (default 1).

  • beta (float) – Beta to scale the Swish activation (default 1).

  • dropout (float) – Dropout rate (default 0.15).

  • activation (torch class) – Activation function for each context block (default Swish).

  • se_activation (torch class) – Activation function for SE Module (default torch.nn.Sigmoid).

  • norm (torch class) – Normalization to regularize the model (default BatchNorm1d).

  • residuals (Optional (list[bool])) – Whether to apply a residual connection at each context block (default None).

Example

>>> inp = torch.randn([8, 48, 40])
>>> block = ContextNet(input_shape=inp.shape, num_blocks=14)
>>> out = block(inp)
>>> out.shape
torch.Size([8, 6, 640])
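
The time dimension in the example above shrinks from 48 frames to 6, an 8x reduction. The following is a minimal sketch of how per-block strides could account for that, not the library's own code. It assumes each strided block maps T frames to ceil(T / stride) frames and that three blocks use stride 2. The helper name `output_frames` and the block indices chosen below are hypothetical.

```python
import math


def output_frames(num_frames, strides):
    """Return the time length after applying each block's stride in order."""
    for stride in strides:
        # Assumption: "same"-style padding, so a block with stride s
        # maps T frames to ceil(T / s) frames.
        num_frames = math.ceil(num_frames / stride)
    return num_frames


# Hypothetical stride layout for a 14-block network: three stride-2 blocks.
# The block indices (2, 6, 13) are an assumption of this sketch.
strides = [1] * 14
for index in (2, 6, 13):
    strides[index] = 2

print(output_frames(48, strides))  # 6, matching the ContextNet example
print(output_frames(120, [2]))     # 60, matching the ContextNetBlock example
```

Under these assumptions, only the number of stride-2 blocks affects the final length, not where they sit in the stack.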
class speechbrain.lobes.models.ContextNet.SEmodule(input_shape, inner_dim, activation=<class 'torch.nn.modules.activation.Sigmoid'>, norm=<class 'speechbrain.nnet.normalization.BatchNorm1d'>)[source]

Bases: torch.nn.modules.module.Module

This class implements the Squeeze-and-Excitation module.

Parameters
  • inner_dim (int) – Inner dimension of the bottleneck network of the SE module (required; the signature defines no default).

  • activation (torch class) – Activation function for SE Module (default torch.nn.Sigmoid).

  • norm (torch class) – Normalization to regularize the model (default BatchNorm1d).

Example

>>> inp = torch.randn([8, 120, 40])
>>> net = SEmodule(input_shape=inp.shape, inner_dim=64)
>>> out = net(inp)
>>> out.shape
torch.Size([8, 120, 40])
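
The output shape equals the input shape because squeeze-and-excitation only rescales channels. Below is a pure-Python sketch of the generic technique for a single utterance, not the SpeechBrain implementation (which also takes a `norm` argument that is not modeled here). The function name `squeeze_excite`, the weight matrices `w_down` and `w_up`, and the ReLU inside the bottleneck are assumptions of this sketch.

```python
import math


def squeeze_excite(x, w_down, w_up):
    """Generic squeeze-and-excitation over a [time][channels] nested list.

    w_down has shape [inner_dim][channels]; w_up has shape
    [channels][inner_dim]. Both are hypothetical weights for illustration.
    """
    num_frames = len(x)
    channels = len(x[0])
    # Squeeze: average every channel over the time axis.
    squeezed = [sum(frame[c] for frame in x) / num_frames for c in range(channels)]
    # Excite, step 1: project down to inner_dim (ReLU is an assumption here).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_down]
    # Excite, step 2: project back to channels and gate with a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_up
    ]
    # Scale: multiply each frame channel-wise by its gate.
    return [[value * gate for value, gate in zip(frame, gates)] for frame in x]


x = [[1.0, 2.0], [3.0, 4.0]]   # 2 frames, 2 channels
w_down = [[0.0, 0.0]]          # inner_dim = 1
w_up = [[0.0], [0.0]]          # zero weights give a gate of 0.5 per channel
print(squeeze_excite(x, w_down, w_up))  # [[0.5, 1.0], [1.5, 2.0]]
```

The gate is computed once per channel and shared across all frames, which is why the time and channel dimensions are unchanged.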
forward(x)[source]
training: bool
class speechbrain.lobes.models.ContextNet.ContextNetBlock(out_channels, kernel_size, num_layers, inner_dim, input_shape, stride=1, beta=1, dropout=0.15, activation=<class 'speechbrain.nnet.activations.Swish'>, se_activation=<class 'torch.nn.modules.activation.Sigmoid'>, norm=<class 'speechbrain.nnet.normalization.BatchNorm1d'>, residual=True)[source]

Bases: torch.nn.modules.module.Module

This class implements a block in ContextNet.

Parameters
  • out_channels (int) – Number of output channels of this block.

  • kernel_size (int) – Kernel size of convolution layers.

  • stride (int) – Striding factor for this context block (default 1).

  • num_layers (int) – Number of depthwise convolution layers for this context block.

  • inner_dim (int) – Inner dimension of the bottleneck network of the SE module.

  • beta (float) – Beta to scale the Swish activation (default 1).

  • dropout (float) – Dropout rate (default 0.15).

  • activation (torch class) – Activation function for this context block (default Swish).

  • se_activation (torch class) – Activation function for SE Module (default torch.nn.Sigmoid).

  • norm (torch class) – Normalization to regularize the model (default BatchNorm1d).

  • residual (bool) – Whether to apply a residual connection at this context block (default True).

Example

>>> inp = torch.randn([8, 120, 40])
>>> block = ContextNetBlock(256, 3, 5, 12, input_shape=inp.shape, stride=2)
>>> out = block(inp)
>>> out.shape
torch.Size([8, 60, 256])
forward(x)[source]
training: bool