speechbrain.lobes.models.CRDNN module

A combination of Convolutional, Recurrent, and Fully-connected networks.

Authors
  • Mirco Ravanelli 2020

  • Peter Plantinga 2020

  • Ju-Chieh Chou 2020

  • Titouan Parcollet 2020

  • Abdel 2020

Summary

Classes:

CNN_Block

CNN Block, based on VGG blocks.

CRDNN

This model is a combination of CNNs, RNNs, and DNNs.

DNN_Block

Block for linear layers.

Reference

class speechbrain.lobes.models.CRDNN.CRDNN(input_size=None, input_shape=None, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, dropout=0.15, cnn_blocks=2, cnn_channels=[128, 256], cnn_kernelsize=(3, 3), time_pooling=False, time_pooling_size=2, freq_pooling_size=2, rnn_class=<class 'speechbrain.nnet.RNN.LiGRU'>, inter_layer_pooling_size=[2, 2], using_2d_pooling=False, rnn_layers=4, rnn_neurons=512, rnn_bidirectional=True, rnn_re_init=False, dnn_blocks=2, dnn_neurons=512, projection_dim=-1, use_rnnp=False)[source]

Bases: speechbrain.nnet.containers.Sequential

This model is a combination of CNNs, RNNs, and DNNs.

This model expects 3-dimensional input [batch, time, feats] and by default produces output of size [batch, time, dnn_neurons].

One exception is if using_2d_pooling or time_pooling is True. In this case, the time dimension will be downsampled.
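
The shape rule above can be sketched in plain Python. This is a rough estimate, not the library's code: the helper name estimate_crdnn_output_shape is hypothetical, and it assumes that each pooling stage floors the time length (time // size) and that, with using_2d_pooling, every CNN block pools the time axis by its own entry of inter_layer_pooling_size.

```python
def estimate_crdnn_output_shape(
    input_shape,
    dnn_neurons=512,
    time_pooling=False,
    time_pooling_size=2,
    using_2d_pooling=False,
    inter_layer_pooling_size=(2, 2),
):
    """Estimate [batch, time, dnn_neurons] for a [batch, time, feats] input.

    Assumption: every pooling stage floors the time length (time // size).
    """
    batch, time, _feats = input_shape
    if using_2d_pooling:
        # Assumed: each CNN block pools the time axis by its own factor.
        for size in inter_layer_pooling_size:
            time //= size
    if time_pooling:
        # Assumed: one extra pooling step on the time axis before the RNN.
        time //= time_pooling_size
    return (batch, time, dnn_neurons)


default_shape = estimate_crdnn_output_shape((10, 15, 60))
pooled_shape = estimate_crdnn_output_shape((10, 15, 60), time_pooling=True)
print(default_shape)
print(pooled_shape)
```

With the default arguments the time axis is left untouched, which agrees with the documented example for a [10, 15, 60] input. For pooled configurations, treat the result as an estimate and check it against the actual model output.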

Parameters
  • input_size (int) – The size of the expected input along the third (feature) dimension.

  • input_shape (tuple) – The full expected input shape. input_size alone suffices, but passing a shape allows CRDNN to be placed in a Sequential container with other modules.

  • activation (torch class) – A class used for constructing the activation layers for CNN and DNN.

  • dropout (float) – Neuron dropout rate as applied to CNN, RNN, and DNN.

  • cnn_blocks (int) – The number of convolutional neural blocks to include.

  • cnn_channels (list of ints) – A list of the number of output channels for each CNN block.

  • cnn_kernelsize (tuple of ints) – The size of the convolutional kernels.

  • time_pooling (bool) – Whether to pool the utterance on the time axis before the RNN.

  • time_pooling_size (int) – The number of elements to pool on the time axis.

  • time_pooling_stride (int) – The number of elements to increment by when iterating the time axis.

  • using_2d_pooling (bool) – Whether to use 2D (True) or 1D (False) pooling after each CNN block.

  • inter_layer_pooling_size (list of ints) – A list of the pooling sizes for each CNN block.

  • rnn_class (torch class) – The type of RNN to use in the CRDNN network (LiGRU, LSTM, GRU, RNN).

  • rnn_layers (int) – The number of recurrent layers to include.

  • rnn_neurons (int) – Number of neurons in each layer of the RNN.

  • rnn_bidirectional (bool) – If True, the RNN processes the sequence in both directions; otherwise it processes it forward only.

  • rnn_re_init (bool) – If True, an orthogonal initialization will be applied to the recurrent weights.

  • dnn_blocks (int) – The number of linear neural blocks to include.

  • dnn_neurons (int) – The number of neurons in the linear layers.

  • use_rnnp (bool) – If True, a linear projection layer is added between RNN layers.

  • projection_dim (int) – The number of neurons in the projection layer. This layer is used to reduce the size of the flattened representation obtained after the CNN blocks.
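
Several parameters above are per-block lists (cnn_channels, inter_layer_pooling_size) that have to line up with cnn_blocks. A small pre-flight check could look like the sketch below. The helper check_crdnn_config is hypothetical and not part of SpeechBrain, and it assumes that each CNN block reads the list entry at its own index, so every list needs at least cnn_blocks entries.

```python
def check_crdnn_config(cnn_blocks, cnn_channels, inter_layer_pooling_size):
    """Return a list of problems in the per-block settings (empty if none).

    Assumption: block i uses cnn_channels[i] and inter_layer_pooling_size[i].
    """
    problems = []
    if len(cnn_channels) < cnn_blocks:
        problems.append(
            f"cnn_channels has {len(cnn_channels)} entries, "
            f"but cnn_blocks={cnn_blocks}"
        )
    if len(inter_layer_pooling_size) < cnn_blocks:
        problems.append(
            f"inter_layer_pooling_size has {len(inter_layer_pooling_size)} "
            f"entries, but cnn_blocks={cnn_blocks}"
        )
    return problems


# The documented defaults: two blocks, two channel counts, two pooling sizes.
default_problems = check_crdnn_config(2, [128, 256], [2, 2])
# Raising cnn_blocks without extending the lists is flagged twice.
mismatch_problems = check_crdnn_config(3, [128, 256], [2, 2])
print(default_problems)
print(mismatch_problems)
```

Running a check like this before constructing the model reports a mismatched configuration with a readable message.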

Example

>>> inputs = torch.rand([10, 15, 60])
>>> model = CRDNN(input_shape=inputs.shape)
>>> outputs = model(inputs)
>>> outputs.shape
torch.Size([10, 15, 512])

class speechbrain.lobes.models.CRDNN.CNN_Block(input_shape, channels, kernel_size=[3, 3], activation=<class 'torch.nn.modules.activation.LeakyReLU'>, using_2d_pool=False, pooling_size=2, dropout=0.15)[source]

Bases: speechbrain.nnet.containers.Sequential

CNN Block, based on VGG blocks.

Parameters
  • input_shape (tuple) – Expected shape of the input.

  • channels (int) – Number of convolutional channels for the block.

  • kernel_size (tuple) – Size of the 2d convolutional kernel.

  • activation (torch.nn.Module class) – A class to be used for instantiating an activation layer.

  • using_2d_pool (bool) – Whether to use 2d pooling or only 1d pooling.

  • pooling_size (int) – Size of pooling kernel, duplicated for 2d pooling.

  • dropout (float) – Rate to use for dropping channels.
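
The effect of using_2d_pool and pooling_size on the output shape can be sketched without running the block. The helper cnn_block_output_shape is hypothetical. It assumes that the convolutions preserve the time and feature lengths, that 1d pooling acts on the feature axis only, that 2d pooling also divides the time axis, and that lengths are floored.

```python
def cnn_block_output_shape(input_shape, channels, pooling_size=2,
                           using_2d_pool=False):
    """Estimate [batch, time, feats, channels] for a [batch, time, feats] input.

    Assumptions: convolutions keep time/feats lengths; pooling floors.
    """
    batch, time, feats = input_shape[:3]
    if using_2d_pool:
        # Assumed: 2d pooling applies the same factor to the time axis.
        time //= pooling_size
    # Assumed: the feature axis is pooled in both the 1d and the 2d case.
    feats //= pooling_size
    return (batch, time, feats, channels)


shape_1d = cnn_block_output_shape((10, 15, 60), channels=32)
shape_2d = cnn_block_output_shape((10, 15, 60), channels=32,
                                  using_2d_pool=True)
print(shape_1d)
print(shape_2d)
```

The 1d case matches the documented CNN_Block example, where 60 features become 30 and the 15 time steps are kept. The 2d result is an estimate under the stated assumptions.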

Example

>>> inputs = torch.rand(10, 15, 60)
>>> block = CNN_Block(input_shape=inputs.shape, channels=32)
>>> outputs = block(inputs)
>>> outputs.shape
torch.Size([10, 15, 30, 32])

class speechbrain.lobes.models.CRDNN.DNN_Block(input_shape, neurons, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, dropout=0.15)[source]

Bases: speechbrain.nnet.containers.Sequential

Block for linear layers.

Parameters
  • input_shape (tuple) – Expected shape of the input.

  • neurons (int) – Size of the linear layers.

  • activation (torch.nn.Module class) – Class definition to use for constructing activation layers.

  • dropout (float) – Rate to use for dropping neurons.
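
As a quick illustration of the shape contract, the sketch below assumes that the block only changes the last dimension, replacing it with neurons and leaving the leading dimensions as they are. The helper dnn_block_output_shape is hypothetical and does not call SpeechBrain.

```python
def dnn_block_output_shape(input_shape, neurons):
    """Replace the last (feature) dimension with `neurons`.

    Assumption: the linear layers act on the last axis only.
    """
    return tuple(input_shape[:-1]) + (neurons,)


dnn_shape = dnn_block_output_shape((10, 15, 128), neurons=64)
print(dnn_shape)
```

This agrees with the documented DNN_Block example for a [10, 15, 128] input and neurons=64.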

Example

>>> inputs = torch.rand(10, 15, 128)
>>> block = DNN_Block(input_shape=inputs.shape, neurons=64)
>>> outputs = block(inputs)
>>> outputs.shape
torch.Size([10, 15, 64])