speechbrain.lobes.models.ESPnetVGG module

This lobe replicates the encoder first introduced in ESPnet v1.

source: https://github.com/espnet/espnet/blob/master/espnet/nets/pytorch_backend/rnn/encoders.py

  • Titouan Parcollet 2020






class speechbrain.lobes.models.ESPnetVGG.ESPnetVGG(input_shape, activation=<class 'torch.nn.modules.activation.ReLU'>, dropout=0.15, cnn_channels=[64, 128], rnn_class=<class 'speechbrain.nnet.RNN.LSTM'>, rnn_layers=4, rnn_neurons=512, rnn_bidirectional=True, rnn_re_init=False, projection_neurons=512)

Bases: speechbrain.nnet.containers.Sequential

This model is a combination of CNNs and RNNs following the ESPnet encoder (VGG + RNN + MLP + tanh()).

  • input_shape (tuple) – The shape of an example expected input.

  • activation (torch class) – The class used to construct the activation layers of the CNN and DNN.

  • dropout (float) – Neuron dropout rate, applied to RNN only.

  • cnn_channels (list of ints) – A list of the number of output channels for each CNN block.

  • rnn_class (torch class) – The type of RNN to use (LiGRU, LSTM, GRU, or RNN).

  • rnn_layers (int) – The number of recurrent layers to include.

  • rnn_neurons (int) – Number of neurons in each layer of the RNN.

  • rnn_bidirectional (bool) – Whether the RNN processes the sequence in the forward direction only (False) or in both directions (True).

  • projection_neurons (int) – The number of neurons in the last linear layer.


>>> import torch
>>> from speechbrain.lobes.models.ESPnetVGG import ESPnetVGG
>>> inputs = torch.rand([10, 40, 60])
>>> model = ESPnetVGG(input_shape=inputs.shape)
>>> outputs = model(inputs)
>>> outputs.shape
torch.Size([10, 10, 512])
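
The shapes in the example above can be reasoned about without running the model. The sketch below is not part of SpeechBrain: the function name espnet_vgg_output_shape and its arguments n_cnn_blocks and pool_size are hypothetical. It assumes that each of the two CNN blocks ends with a pooling layer that halves the time axis (which is consistent with the 40 to 10 reduction shown above), and that the last linear layer maps every frame to projection_neurons features.

```python
def espnet_vgg_output_shape(
    input_shape,
    n_cnn_blocks=2,
    pool_size=2,
    projection_neurons=512,
):
    """Estimate the (batch, time, features) shape produced by the encoder.

    Assumption (not taken from the library source): every CNN block ends
    with a pooling layer that divides the time axis by `pool_size` using
    floor division, and the final linear layer outputs
    `projection_neurons` features per frame.
    """
    batch, time, _features = input_shape
    for _ in range(n_cnn_blocks):
        time //= pool_size  # each block shortens the sequence
    return (batch, time, projection_neurons)


# Same input shape as the doctest: 10 utterances, 40 frames, 60 features.
print(espnet_vgg_output_shape((10, 40, 60)))  # (10, 10, 512)
```

Under these assumptions the encoder subsamples the input by a factor of four in time, so any per-frame targets or length ratios used downstream would likely need to account for the shorter sequence.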