speechbrain.nnet.RNN module

Library implementing recurrent neural networks.

Authors
  • Adel Moumen 2023

  • Mirco Ravanelli 2020

  • Ju-Chieh Chou 2020

  • Jianyuan Zhong 2020

  • Loren Lugosch 2020

Summary

Classes:

AttentionalRNNDecoder

This class implements an RNN decoder model with attention.

GRU

This class implements a basic GRU.

GRUCell

This class implements a basic GRU Cell for a timestep of input, while GRU() takes the whole sequence as input.

LSTM

This class implements a basic LSTM.

LSTMCell

This class implements a basic LSTM Cell for a timestep of input, while LSTM() takes the whole sequence as input.

LiGRU

This class implements a Light GRU (Li-GRU).

LiGRU_Layer

This class implements Light-Gated Recurrent Units (Li-GRU) layer.

QuasiRNN

This is an implementation of the Quasi-RNN.

QuasiRNNLayer

Applies a single layer Quasi-Recurrent Neural Network (QRNN) to an input sequence.

RNN

This class implements a vanilla RNN.

RNNCell

This class implements a basic RNN Cell for a timestep of input, while RNN() takes the whole sequence as input.

SLiGRU

This class implements a Stabilised Light GRU (SLi-GRU).

SLiGRU_Layer

This class implements a Stabilised Light-Gated Recurrent Units (SLi-GRU) layer.

Functions:

pack_padded_sequence

Returns packed speechbrain-formatted tensors.

pad_packed_sequence

Returns speechbrain-formatted tensor from packed sequences.

rnn_init

This function is used to initialize the RNN weights.

Reference

speechbrain.nnet.RNN.pack_padded_sequence(inputs, lengths)[source]

Returns packed speechbrain-formatted tensors.

Parameters:
  • inputs (torch.Tensor) – The sequences to pack.

  • lengths (torch.Tensor) – The length of each sequence.

speechbrain.nnet.RNN.pad_packed_sequence(inputs)[source]

Returns speechbrain-formatted tensor from packed sequences.

Parameters:

inputs (torch.nn.utils.rnn.PackedSequence) – An input set of sequences to convert to a tensor.
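
A minimal round-trip sketch. It assumes the SpeechBrain convention of relative lengths in [0, 1] (rather than absolute frame counts) and that pad_packed_sequence returns the padded tensor directly; treat it as illustrative rather than normative:

>>> inputs = torch.rand([3, 10, 20])
>>> rel_lengths = torch.tensor([1.0, 0.8, 0.5])  # assumed relative lengths
>>> packed = pack_padded_sequence(inputs, rel_lengths)
>>> unpacked = pad_packed_sequence(packed)
>>> unpacked.shape
torch.Size([3, 10, 20])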

class speechbrain.nnet.RNN.RNN(hidden_size, input_shape=None, input_size=None, nonlinearity='relu', num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a vanilla RNN.

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = RNN(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
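
Since a bidirectional model concatenates the forward and backward passes, the output feature dimension doubles. A hedged sketch, assuming the wrapper follows the stacking/bidirectional behaviour of torch.nn.RNN:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = RNN(hidden_size=5, input_shape=inp_tensor.shape, num_layers=2, bidirectional=True)
>>> out_tensor, hn = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 10])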
forward(x, hx=None, lengths=None)[source]

Returns the output of the vanilla RNN.

Parameters:
training: bool
class speechbrain.nnet.RNN.LSTM(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a basic LSTM.

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = LSTM(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
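
The second return value is the final hidden state, which can be passed back through hx to process a long signal chunk by chunk. A hedged sketch, assuming the hidden state keeps the (h, c) format of torch.nn.LSTM:

>>> chunk1 = torch.rand([4, 10, 20])
>>> chunk2 = torch.rand([4, 10, 20])
>>> net = LSTM(hidden_size=5, input_size=20)
>>> out1, hx = net(chunk1)
>>> out2, _ = net(chunk2, hx=hx)
>>> out2.shape
torch.Size([4, 10, 5])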
forward(x, hx=None, lengths=None)[source]

Returns the output of the LSTM.

Parameters:
training: bool
class speechbrain.nnet.RNN.GRU(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a basic GRU.

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel) the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = GRU(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
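
A hedged sketch of a deeper configuration built with input_size instead of input_shape; the dropout factor is assumed to act between the stacked layers, following torch.nn.GRU:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = GRU(hidden_size=5, input_size=20, num_layers=3, dropout=0.2)
>>> out_tensor, hn = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])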
forward(x, hx=None, lengths=None)[source]

Returns the output of the GRU.

Parameters:
training: bool
class speechbrain.nnet.RNN.RNNCell(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True, nonlinearity='tanh')[source]

Bases: Module

This class implements a basic RNN Cell for a timestep of input, while RNN() takes the whole sequence as input.

It is designed for autoregressive decoders (e.g., attentional decoders), which take one input at a time. It uses torch.nn.RNNCell() instead of torch.nn.RNN() to reduce VRAM consumption.

It accepts input tensors formatted as (batch, fea).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e, the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

Example

>>> inp_tensor = torch.rand([4, 20])
>>> net = RNNCell(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 5])
forward(x, hx=None)[source]

Returns the output of the RNNCell.

Parameters:
training: bool
class speechbrain.nnet.RNN.GRUCell(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True)[source]

Bases: Module

This class implements a basic GRU Cell for a timestep of input, while GRU() takes the whole sequence as input.

It is designed for autoregressive decoders (e.g., attentional decoders), which take one input at a time. It uses torch.nn.GRUCell() instead of torch.nn.GRU() to reduce VRAM consumption. It accepts input tensors formatted as (batch, fea).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e, the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the GRU architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

Example

>>> inp_tensor = torch.rand([4, 20])
>>> net = GRUCell(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 5])
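
Because the cell consumes one timestep at a time, it is typically called inside a decoding loop that carries the hidden state across steps. A minimal (hypothetical) loop:

>>> inp_seq = torch.rand([4, 6, 20])  # (batch, time, fea)
>>> cell = GRUCell(hidden_size=5, input_size=20)
>>> hx = None
>>> outputs = []
>>> for t in range(inp_seq.shape[1]):
...     out, hx = cell(inp_seq[:, t], hx=hx)
...     outputs.append(out)
>>> torch.stack(outputs, dim=1).shape
torch.Size([4, 6, 5])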
forward(x, hx=None)[source]

Returns the output of the GRUCell.

Parameters:
training: bool
class speechbrain.nnet.RNN.LSTMCell(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True)[source]

Bases: Module

This class implements a basic LSTM Cell for a timestep of input, while LSTM() takes the whole sequence as input.

It is designed for autoregressive decoders (e.g., attentional decoders), which take one input at a time. It uses torch.nn.LSTMCell() instead of torch.nn.LSTM() to reduce VRAM consumption. It accepts input tensors formatted as (batch, fea).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e, the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the LSTM architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

Example

>>> inp_tensor = torch.rand([4, 20])
>>> net = LSTMCell(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 5])
forward(x, hx=None)[source]

Returns the output of the LSTMCell.

Parameters:
training: bool
class speechbrain.nnet.RNN.AttentionalRNNDecoder(rnn_type, attn_type, hidden_size, attn_dim, num_layers, enc_dim, input_size, nonlinearity='relu', re_init=True, normalization='batchnorm', scaling=1.0, channels=None, kernel_size=None, bias=True, dropout=0.0)[source]

Bases: Module

This class implements an RNN decoder model with attention.

It supports different recurrent models (rnn, lstm, gru). It accepts enc_states tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened in this way: (batch, time, fea*channel).

Parameters:
  • rnn_type (str) – Type of recurrent neural network to use (rnn, lstm, gru).

  • attn_type (str) – Type of attention to use (location, content).

  • hidden_size (int) – Number of the neurons.

  • attn_dim (int) – Number of attention module internal and output neurons.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • enc_dim (int) – Dimensionality of the encoder states (i.e., the last dimension of enc_states).

  • input_shape (tuple) – Expected shape of an input.

  • input_size (int) – Expected size of the relevant input dimension.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu). This option is active for rnn and ligru models only. For lstm and gru, tanh is used.

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • normalization (str) – Type of normalization for the ligru model (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • scaling (float) – A scaling factor to sharpen or smoothen the attention distribution.

  • channels (int) – Number of channels for location-aware attention.

  • kernel_size (int) – Size of the kernel for location-aware attention.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

Example

>>> enc_states = torch.rand([4, 10, 20])
>>> wav_len = torch.rand([4])
>>> inp_tensor = torch.rand([4, 5, 6])
>>> net = AttentionalRNNDecoder(
...     rnn_type="lstm",
...     attn_type="content",
...     hidden_size=7,
...     attn_dim=5,
...     num_layers=1,
...     enc_dim=20,
...     input_size=6,
... )
>>> out_tensor, attn = net(inp_tensor, enc_states, wav_len)
>>> out_tensor.shape
torch.Size([4, 5, 7])
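
Location-aware attention additionally requires the channels and kernel_size arguments described above. A hedged configuration sketch (the specific values are illustrative):

>>> enc_states = torch.rand([4, 10, 20])
>>> wav_len = torch.ones([4])
>>> inp_tensor = torch.rand([4, 5, 6])
>>> net = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="location",
...     hidden_size=7,
...     attn_dim=5,
...     num_layers=1,
...     enc_dim=20,
...     input_size=6,
...     channels=10,
...     kernel_size=3,
... )
>>> out_tensor, attn = net(inp_tensor, enc_states, wav_len)
>>> out_tensor.shape
torch.Size([4, 5, 7])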
forward_step(inp, hs, c, enc_states, enc_len)[source]

One step of forward pass process.

Parameters:
  • inp (torch.Tensor) – The input of current timestep.

  • hs (torch.Tensor or tuple of torch.Tensor) – The cell state for RNN.

  • c (torch.Tensor) – The context vector of previous timestep.

  • enc_states (torch.Tensor) – The tensor generated by encoder, to be attended.

  • enc_len (torch.LongTensor) – The actual length of encoder states.

Returns:

  • dec_out (torch.Tensor) – The output tensor.

  • hs (torch.Tensor or tuple of torch.Tensor) – The new cell state for RNN.

  • c (torch.Tensor) – The context vector of the current timestep.

  • w (torch.Tensor) – The weight of attention.

forward(inp_tensor, enc_states, wav_len)[source]

This method implements the forward pass of the attentional RNN decoder.

Parameters:
  • inp_tensor (torch.Tensor) – The input tensor for each timestep of the RNN decoder.

  • enc_states (torch.Tensor) – The tensor to be attended by the decoder.

  • wav_len (torch.Tensor) – This variable stores the relative lengths of the waveforms.

Returns:

  • outputs (torch.Tensor) – The output of the RNN decoder.

  • attn (torch.Tensor) – The attention weight of each timestep.

training: bool
class speechbrain.nnet.RNN.LiGRU(hidden_size, input_shape, nonlinearity='relu', normalization='batchnorm', num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a Light GRU (Li-GRU).

Li-GRU is a single-gate GRU model based on batch-norm + relu activations + recurrent dropout. For more info see:

“M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Light Gated Recurrent Units for Speech Recognition, in IEEE Transactions on Emerging Topics in Computational Intelligence, 2018” (https://arxiv.org/abs/1803.10225)

If you face instabilities during training, use the Stabilised Li-GRU (SLi-GRU) instead. See:

  • speechbrain.nnet.RNN.SLiGRU

To improve the speed of the model, it is recommended to compile it with the torch just-in-time compiler (jit) right before use, or to use the custom CUDA+PyTorch implementation available at https://github.com/Adel-Moumen/fast_ligru.

You can compile it with: compiled_model = torch.jit.script(model)

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • normalization (str) – Type of normalization for the ligru model (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = LiGRU(input_shape=inp_tensor.shape, hidden_size=5)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
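
A minimal sketch of the jit compilation step mentioned above; the compiled module is assumed to keep the same (output, hidden) interface:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = LiGRU(input_shape=inp_tensor.shape, hidden_size=5)
>>> compiled_net = torch.jit.script(net)
>>> out_tensor, _ = compiled_net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])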
forward(x, hx: Tensor | None = None)[source]

Returns the output of the Li-GRU.

Parameters:
training: bool
class speechbrain.nnet.RNN.LiGRU_Layer(input_size, hidden_size, num_layers, batch_size, dropout=0.0, nonlinearity='relu', normalization='batchnorm', bias=True, bidirectional=False)[source]

Bases: Module

This class implements Light-Gated Recurrent Units (Li-GRU) layer.

Parameters:
  • input_size (int) – Feature dimensionality of the input tensors.

  • hidden_size (int) – Number of output neurons.

  • num_layers (int) – The layer number.

  • batch_size (int) – Batch size of the input tensors.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • nonlinearity (str) – Type of nonlinearity (tanh, sin, leaky_relu, relu).

  • normalization (str) – Type of normalization (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in layer normalization.

  • bias (bool) – If True, the additive bias b is adopted.

  • bidirectional (bool) – if True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

forward(x: Tensor, hx: Tensor | None = None) Tensor[source]

Returns the output of the liGRU layer.

Parameters:
training: bool
class speechbrain.nnet.RNN.SLiGRU(hidden_size, input_shape, nonlinearity='relu', ff_normalization='batchnorm', recurrent_elementwise_affine=False, num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a Stabilised Light GRU (SLi-GRU).

SLi-GRU is a single-gate GRU model based on batch-norm + relu activations + layer-norm on the recurrent connections + recurrent dropout.

The SLi-GRU differs from the vanilla Li-GRU in its recurrent weights. The Li-GRU suffers from an exploding gradient problem on the recurrent weights and cannot be trained on medium to large ASR datasets. To solve this problem, a layer norm is applied to the recurrent weights, which stabilises training and allows the model to be trained on large ASR datasets without issue.

This model outperforms traditional LSTM/GRU models on the CommonVoice/LibriSpeech datasets in both WER and efficiency.

For more info see: “Moumen, A., & Parcollet, T. (2023, June). Stabilising and accelerating light gated recurrent units for automatic speech recognition. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.” (https://arxiv.org/abs/2302.10144)

To improve the speed of the model, it is recommended to compile it with the torch just-in-time compiler (jit) right before use, or to use the custom CUDA+PyTorch implementation available at https://github.com/Adel-Moumen/fast_ligru.

You can compile it with: compiled_model = torch.jit.script(model)

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • ff_normalization (str) – Type of feedforward normalization for the ligru model (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • recurrent_elementwise_affine (bool) – If True, learnable element-wise affine parameters are enabled in the recurrent layer normalization.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = SLiGRU(input_shape=inp_tensor.shape, hidden_size=5)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
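
A hedged sketch of a deeper, bidirectional configuration; as for the other RNNs in this module, the bidirectional output is assumed to concatenate both directions:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = SLiGRU(
...     input_shape=inp_tensor.shape,
...     hidden_size=5,
...     num_layers=2,
...     ff_normalization="layernorm",
...     recurrent_elementwise_affine=True,
...     bidirectional=True,
... )
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 10])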
forward(x, hx: Tensor | None = None)[source]

Returns the output of the SLi-GRU.

Parameters:
training: bool
class speechbrain.nnet.RNN.SLiGRU_Layer(input_size, hidden_size, num_layers, batch_size, dropout=0.0, nonlinearity='relu', ff_normalization='batchnorm', recurrent_elementwise_affine=False, bias=True, bidirectional=False)[source]

Bases: Module

This class implements a Stabilised Light-Gated Recurrent Units (SLi-GRU) layer.

Parameters:
  • input_size (int) – Feature dimensionality of the input tensors.

  • hidden_size (int) – Number of output neurons.

  • num_layers (int) – The layer number.

  • batch_size (int) – Batch size of the input tensors.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • nonlinearity (str) – Type of nonlinearity (tanh, sin, leaky_relu, relu).

  • ff_normalization (str) – Type of normalization (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in layer normalization. Note that this only applies to the feedforward affine transform. SLi-GRU (unlike Li-GRU) unconditionally applies layer normalization in the recurrent layers, which is unaffected by this parameter.

  • recurrent_elementwise_affine (bool) – If True, learnable element-wise affine parameters are enabled in the recurrent layer normalization.

  • bias (bool) – If True, the additive bias b is adopted.

  • bidirectional (bool) – if True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

forward(x: Tensor, hx: Tensor | None = None) Tensor[source]

Returns the output of the liGRU layer.

Parameters:
training: bool
class speechbrain.nnet.RNN.QuasiRNNLayer(input_size, hidden_size, bidirectional, zoneout=0.0, output_gate=True)[source]

Bases: Module

Applies a single layer Quasi-Recurrent Neural Network (QRNN) to an input sequence.

Parameters:
  • input_size (int) – The number of expected features in the input x.

  • hidden_size (int) – The number of features in the hidden state h. If not specified, the input size is used.

  • bidirectional (bool) – If True, the sequence is processed in both directions and the outputs of the two directions are concatenated.

  • zoneout (float) – The probability of applying zoneout (i.e. failing to update elements in the hidden state). Default: 0.

  • output_gate (bool) – If True, performs QRNN-fo (applying an output gate to the output). If False, performs QRNN-f. Default: True.

Example

>>> import torch
>>> model = QuasiRNNLayer(60, 256, bidirectional=True)
>>> a = torch.rand([10, 120, 60])
>>> b = model(a)
>>> b[0].shape
torch.Size([10, 120, 512])
training: bool
forgetMult(f: Tensor, x: Tensor, hidden: Tensor | None) Tensor[source]

Returns the hidden states for each time step.

Parameters:

x (torch.Tensor) – Linearly transformed input.

split_gate_inputs(y: Tensor) Tuple[Tensor, Tensor, Tensor | None][source]

Splits the input gates.

forward(x: Tensor, hidden: Tensor | None = None) Tuple[Tensor, Tensor][source]

Returns the output of the QRNN layer.

Parameters:

x (torch.Tensor) – Input to transform linearly.

class speechbrain.nnet.RNN.QuasiRNN(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, batch_first=False, dropout=0, bidirectional=False, **kwargs)[source]

Bases: Module

This is an implementation of the Quasi-RNN.

https://arxiv.org/pdf/1611.01576.pdf

Part of the code is adapted from: https://github.com/salesforce/pytorch-qrnn

Parameters:
  • hidden_size (int) – The number of features in the hidden state h. If not specified, the input size is used.

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – The number of QRNN layers to produce.

  • zoneout (float) – The probability of applying zoneout (i.e. failing to update elements in the hidden state). Default: 0.

  • output_gate (bool) – If True, performs QRNN-fo (applying an output gate to the output). If False, performs QRNN-f. Default: True.

Example

>>> a = torch.rand([8, 120, 40])
>>> model = QuasiRNN(
...     256, num_layers=4, input_shape=a.shape, bidirectional=True
... )
>>> b, _ = model(a)
>>> b.shape
torch.Size([8, 120, 512])
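
A hedged unidirectional sketch with zoneout; without a backward direction to concatenate, the output feature dimension is assumed to equal hidden_size:

>>> a = torch.rand([8, 120, 40])
>>> model = QuasiRNN(256, num_layers=2, input_shape=a.shape, zoneout=0.1)
>>> b, h = model(a)
>>> b.shape
torch.Size([8, 120, 256])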
training: bool
forward(x, hidden=None)[source]

Applies the QuasiRNN to the input tensor x.

speechbrain.nnet.RNN.rnn_init(module)[source]

This function is used to initialize the RNN weights. Orthogonal initialization is used for the recurrent connections.

Parameters:

module (torch.nn.Module) – Recurrent neural network module.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = RNN(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor = net(inp_tensor)
>>> rnn_init(net)