speechbrain.lobes.models.CRDNN module

A combination of Convolutional, Recurrent, and Fully-connected networks.

Authors

Mirco Ravanelli 2020
Peter Plantinga 2020
Ju-Chieh Chou 2020
Titouan Parcollet 2020
Abdel 2020

Summary

Classes:

`CNN_Block`	CNN Block, based on VGG blocks.
`CRDNN`	This model is a combination of CNNs, RNNs, and DNNs.
`DNN_Block`	Block for linear layers.

Reference

class speechbrain.lobes.models.CRDNN.CRDNN(input_size=None, input_shape=None, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, dropout=0.15, cnn_blocks=2, cnn_channels=[128, 256], cnn_kernelsize=(3, 3), time_pooling=False, time_pooling_size=2, freq_pooling_size=2, rnn_class=<class 'speechbrain.nnet.RNN.LiGRU'>, inter_layer_pooling_size=[2, 2], using_2d_pooling=False, rnn_layers=4, rnn_neurons=512, rnn_bidirectional=True, rnn_re_init=False, dnn_blocks=2, dnn_neurons=512, projection_dim=-1, use_rnnp=False)

Bases: Sequential

This model is a combination of CNNs, RNNs, and DNNs.

This model expects 3-dimensional input [batch, time, feats] and by default produces output of the size [batch, time, dnn_neurons].

One exception is if using_2d_pooling or time_pooling is True. In this case, the time dimension will be downsampled.
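
As a rough illustration of the downsampling described above, the following plain-Python sketch estimates the output time length. It assumes every pooling stage is non-overlapping with floor rounding (stride equal to the pooling size, no padding), and that 2D pooling shrinks the time axis once per CNN block. Both points are assumptions about the pooling layers rather than something stated on this page, and `expected_time_steps` is a hypothetical helper, not part of SpeechBrain.

```python
def expected_time_steps(
    time_steps,
    inter_layer_pooling_size=(2, 2),
    using_2d_pooling=False,
    time_pooling=False,
    time_pooling_size=2,
):
    """Estimate the time length of the CRDNN output (hypothetical helper).

    Assumption: each pooling stage is non-overlapping and rounds down.
    """
    if using_2d_pooling:
        # Assumed: 2D pooling shrinks the time axis once per CNN block.
        for size in inter_layer_pooling_size:
            time_steps //= size
    if time_pooling:
        # Optional extra pooling on the time axis before the RNN.
        time_steps //= time_pooling_size
    return time_steps


print(expected_time_steps(15))                         # prints 15
print(expected_time_steps(15, using_2d_pooling=True))  # prints 3
print(expected_time_steps(15, time_pooling=True))      # prints 7
```

With the default settings, the time dimension is left unchanged.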

Parameters:

input_size (int) – The length of the expected input at the third dimension.
input_shape (tuple) – The expected shape of the input. input_size alone is sufficient, but this option allows CRDNN to be placed in a sequential container with other classes.
activation (torch class) – A class used for constructing the activation layers for CNN and DNN.
dropout (float) – Neuron dropout rate as applied to CNN, RNN, and DNN.
cnn_blocks (int) – The number of convolutional neural blocks to include.
cnn_channels (list of ints) – A list of the number of output channels for each CNN block.
cnn_kernelsize (tuple of ints) – The size of the convolutional kernels.
time_pooling (bool) – Whether to pool the utterance on the time axis before the RNN.
time_pooling_size (int) – The number of elements to pool on the time axis.
time_pooling_stride (int) – The number of elements to increment by when iterating the time axis.
using_2d_pooling (bool) – Whether to use 2D (True) or 1D (False) pooling after each CNN block.
inter_layer_pooling_size (list of ints) – A list of the pooling sizes for each CNN block.
rnn_class (torch class) – The type of RNN to use in the CRDNN network (LiGRU, LSTM, GRU, RNN).
rnn_layers (int) – The number of RNN layers to include.
rnn_neurons (int) – Number of neurons in each layer of the RNN.
rnn_bidirectional (bool) – Whether the RNN processes the sequence in both directions (True) or in the forward direction only (False).
rnn_re_init (bool) – If True, an orthogonal initialization will be applied to the recurrent weights.
dnn_blocks (int) – The number of linear neural blocks to include.
dnn_neurons (int) – The number of neurons in the linear layers.
use_rnnp (bool) – If True, a linear projection layer is added between RNN layers.
projection_dim (int) – The number of neurons in the projection layer. This layer is used to reduce the size of the flattened representation obtained after the CNN blocks.
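
To make the role of `projection_dim` more concrete, the sketch below estimates the per-frame size of the flattened representation after the CNN blocks. It assumes the convolutions preserve the frequency dimension and that each block's pooling divides it by the corresponding pooling size with floor rounding. These are assumptions rather than documented behaviour, and `flattened_cnn_size` is a hypothetical helper, not a SpeechBrain function.

```python
def flattened_cnn_size(
    input_size,
    cnn_channels=(128, 256),
    inter_layer_pooling_size=(2, 2),
):
    """Estimate the per-frame feature size after the CNN blocks.

    Hypothetical helper. Assumes convolutions keep the frequency
    dimension and pooling is non-overlapping with floor rounding.
    """
    freq = input_size
    channels = 1
    for block_channels, pool in zip(cnn_channels, inter_layer_pooling_size):
        freq //= pool              # pooling shrinks the frequency axis
        channels = block_channels  # each block sets the channel count
    # Frequency bins and channels are flattened into one feature axis.
    return freq * channels


print(flattened_cnn_size(60))  # prints 3840 (15 frequency bins x 256 channels)
```

Under these assumptions, a setting such as `projection_dim=512` would reduce each frame from 3840 features to 512.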

Example

>>> import torch
>>> from speechbrain.lobes.models.CRDNN import CRDNN
>>> inputs = torch.rand([10, 15, 60])
>>> model = CRDNN(input_shape=inputs.shape)
>>> outputs = model(inputs)
>>> outputs.shape
torch.Size([10, 15, 512])

class speechbrain.lobes.models.CRDNN.CNN_Block(input_shape, channels, kernel_size=[3, 3], activation=<class 'torch.nn.modules.activation.LeakyReLU'>, using_2d_pool=False, pooling_size=2, dropout=0.15)

Bases: Sequential

CNN Block, based on VGG blocks.

Parameters:

input_shape (tuple) – Expected shape of the input.
channels (int) – Number of convolutional channels for the block.
kernel_size (tuple) – Size of the 2D convolutional kernel.
activation (torch.nn.Module class) – A class to be used for instantiating an activation layer.
using_2d_pool (bool) – Whether to use 2d pooling or only 1d pooling.
pooling_size (int) – Size of pooling kernel, duplicated for 2d pooling.
dropout (float) – Rate to use for dropping channels.

Example

>>> import torch
>>> from speechbrain.lobes.models.CRDNN import CNN_Block
>>> inputs = torch.rand(10, 15, 60)
>>> block = CNN_Block(input_shape=inputs.shape, channels=32)
>>> outputs = block(inputs)
>>> outputs.shape
torch.Size([10, 15, 30, 32])
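
The output shape in the example above can be reproduced with a small plain-Python sketch. It assumes the convolutions preserve the time and frequency dimensions, that 1D pooling acts on the frequency axis only, and that pooling rounds down. `cnn_block_output_shape` is a hypothetical helper written for illustration, not part of SpeechBrain.

```python
def cnn_block_output_shape(input_shape, channels, pooling_size=2, using_2d_pool=False):
    """Estimate the CNN_Block output shape (hypothetical helper).

    Assumes convolutions keep time/frequency sizes and pooling is
    non-overlapping with floor rounding.
    """
    batch, time, freq = input_shape
    if using_2d_pool:
        # 2D pooling duplicates the pooling size on the time axis.
        time //= pooling_size
    # Both pooling modes shrink the frequency axis.
    freq //= pooling_size
    # A trailing channel axis is added by the convolutions.
    return (batch, time, freq, channels)


print(cnn_block_output_shape((10, 15, 60), channels=32))
# prints (10, 15, 30, 32)
print(cnn_block_output_shape((10, 15, 60), channels=32, using_2d_pool=True))
# prints (10, 7, 30, 32)
```

The first call matches the documented example. The second shows how the time axis would also shrink with 2D pooling under the floor-rounding assumption.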

class speechbrain.lobes.models.CRDNN.DNN_Block(input_shape, neurons, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, dropout=0.15)

Bases: Sequential

Block for linear layers.

Parameters:

input_shape (tuple) – Expected shape of the input.
neurons (int) – Size of the linear layers.
activation (torch.nn.Module class) – Class definition to use for constructing activation layers.
dropout (float) – Rate to use for dropping neurons.

Example

>>> import torch
>>> from speechbrain.lobes.models.CRDNN import DNN_Block
>>> inputs = torch.rand(10, 15, 128)
>>> block = DNN_Block(input_shape=inputs.shape, neurons=64)
>>> outputs = block(inputs)
>>> outputs.shape
torch.Size([10, 15, 64])