speechbrain.lobes.models.CRDNN module
A combination of Convolutional, Recurrent, and Fully-connected networks.
Authors
Mirco Ravanelli 2020
Peter Plantinga 2020
Ju-Chieh Chou 2020
Titouan Parcollet 2020
Abdel 2020
Summary
Classes:
CNN_Block: CNN Block, based on VGG blocks.
CRDNN: This model is a combination of CNNs, RNNs, and DNNs.
DNN_Block: Block for linear layers.
Reference
class speechbrain.lobes.models.CRDNN.CRDNN(input_size=None, input_shape=None, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, dropout=0.15, cnn_blocks=2, cnn_channels=[128, 256], cnn_kernelsize=(3, 3), time_pooling=False, time_pooling_size=2, freq_pooling_size=2, rnn_class=<class 'speechbrain.nnet.RNN.LiGRU'>, inter_layer_pooling_size=[2, 2], using_2d_pooling=False, rnn_layers=4, rnn_neurons=512, rnn_bidirectional=True, rnn_re_init=False, dnn_blocks=2, dnn_neurons=512, projection_dim=-1, use_rnnp=False)
Bases: Sequential
This model is a combination of CNNs, RNNs, and DNNs.
This model expects 3-dimensional input [batch, time, feats] and, by default, produces output of shape [batch, time, dnn_neurons].
One exception is when using_2d_pooling or time_pooling is True; in this case, the time dimension is downsampled.
Parameters
input_size (int) – The length of the expected input at the third dimension.
input_shape (tuple) – The full expected input shape. While input_size suffices, passing input_shape allows CRDNN to be placed in a Sequential container together with other classes.
activation (torch class) – A class used for constructing the activation layers for CNN and DNN.
dropout (float) – Neuron dropout rate as applied to CNN, RNN, and DNN.
cnn_blocks (int) – The number of convolutional neural blocks to include.
cnn_channels (list of ints) – A list of the number of output channels for each CNN block.
cnn_kernelsize (tuple of ints) – The size of the convolutional kernels.
time_pooling (bool) – Whether to pool the utterance on the time axis before the RNN.
time_pooling_size (int) – The number of elements to pool on the time axis.
time_pooling_stride (int) – The number of elements to increment by when iterating the time axis.
using_2d_pooling (bool) – Whether to use 2D or 1D pooling after each CNN block.
inter_layer_pooling_size (list of ints) – A list of the pooling sizes for each CNN block.
rnn_class (torch class) – The type of RNN to use in the CRDNN network (LiGRU, LSTM, GRU, or RNN).
rnn_layers (int) – The number of recurrent layers to include.
rnn_neurons (int) – The number of neurons in each RNN layer.
rnn_bidirectional (bool) – Whether the RNN processes the sequence only forward or in both directions.
rnn_re_init (bool) – If True, an orthogonal initialization is applied to the recurrent weights.
dnn_blocks (int) – The number of linear neural blocks to include.
dnn_neurons (int) – The number of neurons in the linear layers.
use_rnnp (bool) – If True, a linear projection layer is added between RNN layers.
projection_dim (int) – The number of neurons in the projection layer, which reduces the size of the flattened representation obtained after the CNN blocks.
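The projection_dim entry is easier to judge with concrete numbers. The plain-Python sketch below estimates the per-frame size of the flattened CNN output and what the RNN would receive with and without a projection. It assumes 1D frequency pooling with floor division, that frequency bins and channels are flattened into one vector per frame, and that the default projection_dim=-1 means no projection is added; the helper names are hypothetical and not part of SpeechBrain.

```python
def flattened_cnn_features(n_feats, cnn_channels, pooling_sizes):
    """Hypothetical helper: per-frame feature size after the CNN blocks.

    Assumes 1D pooling on the frequency axis with stride equal to the
    pooling size (floor division), and that frequency bins and channels
    are flattened into a single vector per frame.
    """
    freq = n_feats
    for size in pooling_sizes:
        freq //= size  # each CNN block shrinks the frequency axis
    return freq * cnn_channels[-1]  # bins left x channels of the last block


def rnn_input_size(n_feats, cnn_channels, pooling_sizes, projection_dim=-1):
    """Per-frame size reaching the RNN; -1 is assumed to mean "no projection"."""
    if projection_dim != -1:
        return projection_dim
    return flattened_cnn_features(n_feats, cnn_channels, pooling_sizes)


# Default CNN settings with 60 input features: 60 -> 30 -> 15 bins, times 256 channels
print(rnn_input_size(60, [128, 256], [2, 2]))                      # 3840
print(rnn_input_size(60, [128, 256], [2, 2], projection_dim=256))  # 256
```

Under these assumptions, the default settings already yield a 3840-dimensional vector per frame for 60-dimensional input, which illustrates why a projection layer can considerably reduce the number of RNN input weights.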
Example
>>> import torch
>>> from speechbrain.lobes.models.CRDNN import CRDNN
>>> inputs = torch.rand([10, 15, 60])
>>> model = CRDNN(input_shape=inputs.shape)
>>> outputs = model(inputs)
>>> outputs.shape
torch.Size([10, 15, 512])
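As noted above, using_2d_pooling and time_pooling shorten the time axis. The plain-Python sketch below estimates the resulting output shape. It assumes every pooling step is max pooling with stride equal to its kernel size and no padding (so a pooled axis shrinks by floor division) and that dnn_blocks > 0; the helper crdnn_output_shape is hypothetical and only approximates what the real module does.

```python
def crdnn_output_shape(
    batch,
    time,
    cnn_blocks=2,
    inter_layer_pooling_size=(2, 2),
    using_2d_pooling=False,
    time_pooling=False,
    time_pooling_size=2,
    dnn_neurons=512,
):
    """Hypothetical helper: rough CRDNN output shape for a [batch, time, feats] input.

    Assumptions (not taken from SpeechBrain's code): pooling uses stride equal
    to the kernel size and no padding, so a pooled axis shrinks by floor
    division; dnn_blocks > 0, so the last dimension is dnn_neurons. The
    feature axis is not tracked because it is flattened before the RNN.
    """
    for block in range(cnn_blocks):
        if using_2d_pooling:
            # 2D pooling shrinks time as well as frequency after each CNN block
            time //= inter_layer_pooling_size[block]
        # 1D pooling only touches the frequency axis, so time is unchanged
    if time_pooling:
        # Optional extra pooling on the time axis before the RNN
        time //= time_pooling_size
    return (batch, time, dnn_neurons)


print(crdnn_output_shape(10, 15))                         # (10, 15, 512)
print(crdnn_output_shape(10, 15, using_2d_pooling=True))  # (10, 3, 512)
print(crdnn_output_shape(10, 15, time_pooling=True))      # (10, 7, 512)
```

With the defaults, the sequence length is preserved, matching the doctest above; enabling either pooling option shortens the sequence the RNN has to process.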
class speechbrain.lobes.models.CRDNN.CNN_Block(input_shape, channels, kernel_size=[3, 3], activation=<class 'torch.nn.modules.activation.LeakyReLU'>, using_2d_pool=False, pooling_size=2, dropout=0.15)
Bases: Sequential
CNN Block, based on VGG blocks.
Parameters
input_shape (tuple) – Expected shape of the input.
channels (int) – Number of convolutional channels for the block.
kernel_size (tuple) – Size of the 2D convolutional kernel.
activation (torch.nn.Module class) – A class to be used for instantiating an activation layer.
using_2d_pool (bool) – Whether to use 2D pooling or only 1D pooling.
pooling_size (int) – Size of the pooling kernel; with 2D pooling, the same size is used on both axes.
dropout (float) – Rate to use for dropping channels.
Example
>>> import torch
>>> from speechbrain.lobes.models.CRDNN import CNN_Block
>>> inputs = torch.rand(10, 15, 60)
>>> block = CNN_Block(input_shape=inputs.shape, channels=32)
>>> outputs = block(inputs)
>>> outputs.shape
torch.Size([10, 15, 30, 32])
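To illustrate the using_2d_pool and pooling_size options, the sketch below applies max pooling to a toy single-channel [time][freq] grid in plain Python. It assumes max pooling with stride equal to the kernel size, which is consistent with the example above (60 frequency bins become 30 while the 15 time frames are kept); the function names are hypothetical and this is not SpeechBrain's implementation.

```python
def max_pool_1d_freq(grid, size):
    """Max-pool each time frame along the frequency axis (kernel = stride = size)."""
    return [
        [max(frame[f:f + size]) for f in range(0, len(frame) - size + 1, size)]
        for frame in grid
    ]


def max_pool_2d(grid, size):
    """Max-pool a (size x size) window over time and frequency (stride = size)."""
    n_time, n_freq = len(grid), len(grid[0])
    return [
        [
            max(grid[t + i][f + j] for i in range(size) for j in range(size))
            for f in range(0, n_freq - size + 1, size)
        ]
        for t in range(0, n_time - size + 1, size)
    ]


# A toy single-channel feature map: 4 time frames x 4 frequency bins
grid = [
    [1, 5, 2, 0],
    [3, 4, 8, 1],
    [0, 2, 6, 7],
    [9, 1, 3, 2],
]
print(max_pool_1d_freq(grid, 2))  # [[5, 2], [4, 8], [2, 7], [9, 3]]: 4 frames kept
print(max_pool_2d(grid, 2))       # [[5, 8], [9, 7]]: time halved too
```

The 1D variant keeps the time resolution intact, which is why it is the default when frame-level alignment matters; the 2D variant trades time resolution for a shorter sequence.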
class speechbrain.lobes.models.CRDNN.DNN_Block(input_shape, neurons, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, dropout=0.15)
Bases: Sequential
Block for linear layers.
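Parameters
input_shape (tuple) – Expected shape of the input.
neurons (int) – Number of neurons in the linear layer.
activation (torch.nn.Module class) – A class to be used for instantiating an activation layer.
dropout (float) – Dropout rate.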
Example
>>> import torch
>>> from speechbrain.lobes.models.CRDNN import DNN_Block
>>> inputs = torch.rand(10, 15, 128)
>>> block = DNN_Block(input_shape=inputs.shape, neurons=64)
>>> outputs = block(inputs)
>>> outputs.shape
torch.Size([10, 15, 64])
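The doctest above shows that DNN_Block only changes the last dimension ([10, 15, 128] becomes [10, 15, 64]). The framework-free sketch below illustrates why, assuming the block's core is one linear layer shared across all time steps followed by a LeakyReLU (the default activation). Any normalization or dropout the block applies is omitted, and the helper names are hypothetical.

```python
import random


def linear_frames(x, weight, bias):
    """Apply one affine map (weight: neurons x in_feats) to every frame.

    x is a nested list shaped [batch][time][in_feats]; the result is
    [batch][time][neurons], i.e. only the last dimension changes.
    """
    return [
        [
            [sum(w * v for w, v in zip(row, frame)) + b for row, b in zip(weight, bias)]
            for frame in utterance
        ]
        for utterance in x
    ]


def leaky_relu(x, negative_slope=0.01):
    """Element-wise LeakyReLU on a [batch][time][feats] nested list."""
    return [[[v if v >= 0 else negative_slope * v for v in frame] for frame in utt] for utt in x]


def shape(x):
    """Shape of a [batch][time][feats] nested list."""
    return (len(x), len(x[0]), len(x[0][0]))


random.seed(0)
in_feats, neurons = 128, 64
inputs = [[[random.random() for _ in range(in_feats)] for _ in range(15)] for _ in range(10)]
weight = [[random.uniform(-0.1, 0.1) for _ in range(in_feats)] for _ in range(neurons)]
bias = [0.0] * neurons
outputs = leaky_relu(linear_frames(inputs, weight, bias))
print(shape(outputs))  # (10, 15, 64)
```

Because the same weights are reused at every time step, the block works for any sequence length, and stacking several such blocks (dnn_blocks in CRDNN) only ever changes the feature dimension.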