speechbrain.lobes.models.ESPnetVGG module
This lobe replicates the encoder first introduced in ESPnet v1.
source: https://github.com/espnet/espnet/blob/master/espnet/nets/pytorch_backend/rnn/encoders.py
- Authors
Titouan Parcollet 2020
Summary
Classes:
ESPnetVGG: This model is a combination of CNNs and RNNs following the ESPnet encoder.
Reference
- class speechbrain.lobes.models.ESPnetVGG.ESPnetVGG(input_shape, activation=<class 'torch.nn.modules.activation.ReLU'>, dropout=0.15, cnn_channels=[64, 128], rnn_class=<class 'speechbrain.nnet.RNN.LSTM'>, rnn_layers=4, rnn_neurons=512, rnn_bidirectional=True, rnn_re_init=False, projection_neurons=512)[source]
Bases: Sequential
- This model is a combination of CNNs and RNNs following the ESPnet encoder (VGG + RNN + MLP + tanh()).
- Parameters:
input_shape (tuple) – The expected shape of an example input.
activation (torch class) – A class used for constructing the activation layers, for both the CNN and DNN parts.
dropout (float) – Neuron dropout rate, applied to the RNN only.
cnn_channels (list of ints) – A list of the number of output channels for each CNN block.
rnn_class (torch class) – The type of RNN to use (LiGRU, LSTM, GRU, RNN).
rnn_layers (int) – The number of recurrent layers to include.
rnn_neurons (int) – Number of neurons in each layer of the RNN.
rnn_bidirectional (bool) – Whether this model will process just the forward direction or both directions.
rnn_re_init (bool)
projection_neurons (int) – The number of neurons in the last linear layer.
Example
>>> inputs = torch.rand([10, 40, 60])
>>> model = ESPnetVGG(input_shape=inputs.shape)
>>> outputs = model(inputs)
>>> outputs.shape
torch.Size([10, 10, 512])
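The constructor arguments listed above can also be customized. The following is a minimal sketch, not part of the original documentation: it assumes that speechbrain.nnet.RNN.GRU is importable (GRU is one of the rnn_class options listed above), and the expected output time length follows from the factor-4 downsampling visible in the example above (40 -> 10).

>>> import torch
>>> from speechbrain.nnet.RNN import GRU
>>> from speechbrain.lobes.models.ESPnetVGG import ESPnetVGG
>>> inputs = torch.rand([8, 100, 80])   # [batch, time, features]
>>> model = ESPnetVGG(
...     input_shape=inputs.shape,
...     cnn_channels=[64, 128],          # one output-channel count per VGG block
...     rnn_class=GRU,                   # any of LiGRU, LSTM, GRU, RNN
...     rnn_layers=2,
...     rnn_neurons=256,
...     rnn_bidirectional=True,
...     projection_neurons=256,          # size of the final linear layer
... )
>>> outputs = model(inputs)
>>> outputs.shape                        # time axis downsampled by 4 by the VGG blocks
torch.Size([8, 25, 256])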