speechbrain.nnet.RNN module

Library implementing recurrent neural networks.

Authors
  • Adel Moumen 2023

  • Mirco Ravanelli 2020

  • Ju-Chieh Chou 2020

  • Jianyuan Zhong 2020

  • Loren Lugosch 2020

Summary

Classes:

AttentionalRNNDecoder

This class implements an RNN decoder model with attention.

GRU

This class implements a basic GRU.

GRUCell

This class implements a basic GRU Cell for a timestep of input, while GRU() takes the whole sequence as input.

LSTM

This class implements a basic LSTM.

LSTMCell

This class implements a basic LSTM Cell for a timestep of input, while LSTM() takes the whole sequence as input.

LiGRU

This class implements a Light GRU (Li-GRU).

LiGRU_Layer

This class implements Light-Gated Recurrent Units (Li-GRU) layer.

QuasiRNN

This is an implementation of the Quasi-RNN.

QuasiRNNLayer

Applies a single layer Quasi-Recurrent Neural Network (QRNN) to an input sequence.

RNN

This class implements a vanilla RNN.

RNNCell

This class implements a basic RNN Cell for a timestep of input, while RNN() takes the whole sequence as input.

SLiGRU

This class implements a Stabilised Light GRU (SLi-GRU).

SLiGRU_Layer

This class implements a Stabilised Light-Gated Recurrent Units (SLi-GRU) layer.

Functions:

pack_padded_sequence

Returns packed speechbrain-formatted tensors.

pad_packed_sequence

Returns speechbrain-formatted tensor from packed sequences.

rnn_init

This function is used to initialize the RNN weights.

Reference

speechbrain.nnet.RNN.pack_padded_sequence(inputs, lengths)[source]

Returns packed speechbrain-formatted tensors.

Parameters:
  • inputs (torch.Tensor) – The sequences to pack.

  • lengths (torch.Tensor) – The length of each sequence.

speechbrain.nnet.RNN.pad_packed_sequence(inputs)[source]

Returns speechbrain-formatted tensor from packed sequences.

Parameters:

inputs (torch.nn.utils.rnn.PackedSequence) – An input set of sequences to convert to a tensor.
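
A minimal round-trip sketch. It assumes the SpeechBrain convention of relative lengths in [0, 1] (rather than absolute frame counts) and that pad_packed_sequence returns the padded tensor directly; treat it as illustrative rather than normative:

>>> inputs = torch.rand([3, 10, 20])
>>> rel_lengths = torch.tensor([1.0, 0.8, 0.5])  # assumed relative lengths
>>> packed = pack_padded_sequence(inputs, rel_lengths)
>>> unpacked = pad_packed_sequence(packed)
>>> unpacked.shape
torch.Size([3, 10, 20])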

class speechbrain.nnet.RNN.RNN(hidden_size, input_shape=None, input_size=None, nonlinearity='relu', num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a vanilla RNN.

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = RNN(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
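
Since a bidirectional model concatenates the forward and backward passes, the output feature dimension doubles. A hedged sketch, assuming the wrapper follows the stacking/bidirectional behaviour of torch.nn.RNN:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = RNN(hidden_size=5, input_shape=inp_tensor.shape, num_layers=2, bidirectional=True)
>>> out_tensor, hn = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 10])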
forward(x, hx=None, lengths=None)[source]

Returns the output of the vanilla RNN.

Parameters:
training: bool
class speechbrain.nnet.RNN.LSTM(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a basic LSTM.

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = LSTM(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
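
The second return value is the final hidden state, which can be passed back through hx to process a long signal chunk by chunk. A hedged sketch, assuming the hidden state keeps the (h, c) format of torch.nn.LSTM:

>>> chunk1 = torch.rand([4, 10, 20])
>>> chunk2 = torch.rand([4, 10, 20])
>>> net = LSTM(hidden_size=5, input_size=20)
>>> out1, hx = net(chunk1)
>>> out2, _ = net(chunk2, hx=hx)
>>> out2.shape
torch.Size([4, 10, 5])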
forward(x, hx=None, lengths=None)[source]

Returns the output of the LSTM.

Parameters:
training: bool
class speechbrain.nnet.RNN.GRU(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a basic GRU.

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel) the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = GRU(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
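
A hedged sketch of a deeper configuration built with input_size instead of input_shape; the dropout factor is assumed to act between the stacked layers, following torch.nn.GRU:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = GRU(hidden_size=5, input_size=20, num_layers=3, dropout=0.2)
>>> out_tensor, hn = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])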
forward(x, hx=None, lengths=None)[source]

Returns the output of the GRU.

Parameters:
training: bool
class speechbrain.nnet.RNN.RNNCell(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True, nonlinearity='tanh')[source]

Bases: Module

This class implements a basic RNN Cell for a timestep of input, while RNN() takes the whole sequence as input.

It is designed for autoregressive decoders (e.g., attentional decoders), which take one input at a time. It uses torch.nn.RNNCell() instead of torch.nn.RNN() to reduce VRAM consumption.

It accepts input tensors formatted as (batch, fea).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e, the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

Example

>>> inp_tensor = torch.rand([4, 20])
>>> net = RNNCell(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 5])
forward(x, hx=None)[source]

Returns the output of the RNNCell.

Parameters:
training: bool
class speechbrain.nnet.RNN.GRUCell(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True)[source]

Bases: Module

This class implements a basic GRU Cell for a timestep of input, while GRU() takes the whole sequence as input.

It is designed for autoregressive decoders (e.g., attentional decoders), which take one input at a time. It uses torch.nn.GRUCell() instead of torch.nn.GRU() to reduce VRAM consumption. It accepts input tensors formatted as (batch, fea).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e, the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the GRU architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

Example

>>> inp_tensor = torch.rand([4, 20])
>>> net = GRUCell(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 5])
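
Because the cell consumes one timestep at a time, it is typically called inside a decoding loop that carries the hidden state across steps. A minimal (hypothetical) loop:

>>> inp_seq = torch.rand([4, 6, 20])  # (batch, time, fea)
>>> cell = GRUCell(hidden_size=5, input_size=20)
>>> hx = None
>>> outputs = []
>>> for t in range(inp_seq.shape[1]):
...     out, hx = cell(inp_seq[:, t], hx=hx)
...     outputs.append(out)
>>> torch.stack(outputs, dim=1).shape
torch.Size([4, 6, 5])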
forward(x, hx=None)[source]

Returns the output of the GRUCell.

Parameters:
training: bool
class speechbrain.nnet.RNN.LSTMCell(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, dropout=0.0, re_init=True)[source]

Bases: Module

This class implements a basic LSTM Cell for a timestep of input, while LSTM() takes the whole sequence as input.

It is designed for autoregressive decoders (e.g., attentional decoders), which take one input at a time. It uses torch.nn.LSTMCell() instead of torch.nn.LSTM() to reduce VRAM consumption. It accepts input tensors formatted as (batch, fea).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e, the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – Number of layers to employ in the LSTM architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

Example

>>> inp_tensor = torch.rand([4, 20])
>>> net = LSTMCell(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 5])
forward(x, hx=None)[source]

Returns the output of the LSTMCell.

Parameters:
training: bool
class speechbrain.nnet.RNN.AttentionalRNNDecoder(rnn_type, attn_type, hidden_size, attn_dim, num_layers, enc_dim, input_size, nonlinearity='relu', re_init=True, normalization='batchnorm', scaling=1.0, channels=None, kernel_size=None, bias=True, dropout=0.0)[source]

Bases: Module

This class implements an RNN decoder model with attention.

It supports different recurrent models (rnn, lstm, gru). It accepts enc_states tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened in this way: (batch, time, fea*channel).

Parameters:
  • rnn_type (str) – Type of recurrent neural network to use (rnn, lstm, gru).

  • attn_type (str) – Type of attention to use (location, content).

  • hidden_size (int) – Number of the neurons.

  • attn_dim (int) – Number of attention module internal and output neurons.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • enc_dim (int) – Dimensionality of the encoder states (i.e., the last dimension of enc_states).

  • input_shape (tuple) – Expected shape of an input.

  • input_size (int) – Expected size of the relevant input dimension.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu). This option is active for rnn and ligru models only. For lstm and gru, tanh is used.

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • normalization (str) – Type of normalization for the ligru model (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • scaling (float) – A scaling factor to sharpen or smoothen the attention distribution.

  • channels (int) – Number of channels for location-aware attention.

  • kernel_size (int) – Size of the kernel for location-aware attention.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

Example

>>> enc_states = torch.rand([4, 10, 20])
>>> wav_len = torch.rand([4])
>>> inp_tensor = torch.rand([4, 5, 6])
>>> net = AttentionalRNNDecoder(
...     rnn_type="lstm",
...     attn_type="content",
...     hidden_size=7,
...     attn_dim=5,
...     num_layers=1,
...     enc_dim=20,
...     input_size=6,
... )
>>> out_tensor, attn = net(inp_tensor, enc_states, wav_len)
>>> out_tensor.shape
torch.Size([4, 5, 7])
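
Location-aware attention additionally requires the channels and kernel_size arguments described above. A hedged configuration sketch (the specific values are illustrative):

>>> enc_states = torch.rand([4, 10, 20])
>>> wav_len = torch.ones([4])
>>> inp_tensor = torch.rand([4, 5, 6])
>>> net = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="location",
...     hidden_size=7,
...     attn_dim=5,
...     num_layers=1,
...     enc_dim=20,
...     input_size=6,
...     channels=10,
...     kernel_size=3,
... )
>>> out_tensor, attn = net(inp_tensor, enc_states, wav_len)
>>> out_tensor.shape
torch.Size([4, 5, 7])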
forward_step(inp, hs, c, enc_states, enc_len)[source]

One step of forward pass process.

Parameters:
  • inp (torch.Tensor) – The input of current timestep.

  • hs (torch.Tensor or tuple of torch.Tensor) – The cell state for RNN.

  • c (torch.Tensor) – The context vector of previous timestep.

  • enc_states (torch.Tensor) – The tensor generated by encoder, to be attended.

  • enc_len (torch.LongTensor) – The actual length of encoder states.

Returns:

  • dec_out (torch.Tensor) – The output tensor.

  • hs (torch.Tensor or tuple of torch.Tensor) – The new cell state for RNN.

  • c (torch.Tensor) – The context vector of the current timestep.

  • w (torch.Tensor) – The weight of attention.

forward(inp_tensor, enc_states, wav_len)[source]

This method implements the forward pass of the attentional RNN decoder.

Parameters:
  • inp_tensor (torch.Tensor) – The input tensor for each timestep of the RNN decoder.

  • enc_states (torch.Tensor) – The tensor to be attended by the decoder.

  • wav_len (torch.Tensor) – This variable stores the relative lengths of the waveforms.

Returns:

  • outputs (torch.Tensor) – The output of the RNN decoder.

  • attn (torch.Tensor) – The attention weight of each timestep.

training: bool
class speechbrain.nnet.RNN.LiGRU(hidden_size, input_shape, nonlinearity='relu', normalization='batchnorm', num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a Light GRU (Li-GRU).

Li-GRU is a single-gate GRU model based on batch-norm + relu activations + recurrent dropout. For more info see:

“M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Light Gated Recurrent Units for Speech Recognition, in IEEE Transactions on Emerging Topics in Computational Intelligence, 2018” (https://arxiv.org/abs/1803.10225)

If you face instabilities during training, use the Stabilised Li-GRU (SLi-GRU) instead. See:

  • speechbrain.nnet.RNN.SLiGRU

To improve the speed of the model, it is recommended to compile it with the torch just-in-time compiler (jit) right before use, or to use the custom CUDA+PyTorch implementation available at https://github.com/Adel-Moumen/fast_ligru.

You can compile it with: compiled_model = torch.jit.script(model)

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • normalization (str) – Type of normalization for the ligru model (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = LiGRU(input_shape=inp_tensor.shape, hidden_size=5)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
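
A minimal sketch of the jit compilation step mentioned above; the compiled module is assumed to keep the same (output, hidden) interface:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = LiGRU(input_shape=inp_tensor.shape, hidden_size=5)
>>> compiled_net = torch.jit.script(net)
>>> out_tensor, _ = compiled_net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])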
forward(x, hx: Tensor | None = None)[source]

Returns the output of the Li-GRU.

Parameters:
training: bool
class speechbrain.nnet.RNN.LiGRU_Layer(input_size, hidden_size, num_layers, batch_size, dropout=0.0, nonlinearity='relu', normalization='batchnorm', bias=True, bidirectional=False)[source]

Bases: Module

This class implements Light-Gated Recurrent Units (Li-GRU) layer.

Parameters:
  • input_size (int) – Feature dimensionality of the input tensors.

  • hidden_size (int) – Number of output neurons.

  • num_layers (int) – The layer number.

  • batch_size (int) – Batch size of the input tensors.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • nonlinearity (str) – Type of nonlinearity (tanh, sin, leaky_relu, relu).

  • normalization (str) – Type of normalization (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in layer normalization.

  • bias (bool) – If True, the additive bias b is adopted.

  • bidirectional (bool) – if True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

forward(x: Tensor, hx: Tensor | None = None) Tensor[source]

Returns the output of the liGRU layer.

Parameters:
training: bool
class speechbrain.nnet.RNN.SLiGRU(hidden_size, input_shape, nonlinearity='relu', ff_normalization='batchnorm', recurrent_elementwise_affine=False, num_layers=1, bias=True, dropout=0.0, re_init=True, bidirectional=False)[source]

Bases: Module

This class implements a Stabilised Light GRU (SLi-GRU).

SLi-GRU is a single-gate GRU model based on batch-norm + relu activations + layer-norm on the recurrent connections + recurrent dropout.

The SLi-GRU differs from the vanilla Li-GRU in its recurrent weights. The Li-GRU suffers from an exploding gradient problem on the recurrent weights and cannot be trained on medium to large ASR datasets. To solve this problem, a layer norm is applied to the recurrent weights, which stabilises training and allows the model to be trained on large ASR datasets without issue.

This model outperforms traditional LSTM/GRU models on the CommonVoice/LibriSpeech datasets in both WER and efficiency.

For more info see: “Moumen, A., & Parcollet, T. (2023, June). Stabilising and accelerating light gated recurrent units for automatic speech recognition. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.” (https://arxiv.org/abs/2302.10144)

To improve the speed of the model, it is recommended to compile it with the torch just-in-time compiler (jit) right before use, or to use the custom CUDA+PyTorch implementation available at https://github.com/Adel-Moumen/fast_ligru.

You can compile it with: compiled_model = torch.jit.script(model)

It accepts input tensors formatted as (batch, time, fea). In the case of 4d inputs like (batch, time, fea, channel), the tensor is flattened as (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output).

  • input_shape (tuple) – The shape of an example input.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • ff_normalization (str) – Type of feedforward normalization for the ligru model (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • recurrent_elementwise_affine (bool) – If True, learnable element-wise affine parameters are enabled in the recurrent layer normalization.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • re_init (bool) – If True, orthogonal initialization is used for the recurrent weights. Xavier initialization is used for the input connection weights.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = SLiGRU(input_shape=inp_tensor.shape, hidden_size=5)
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 5])
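
A hedged sketch of a deeper, bidirectional configuration; as for the other RNNs in this module, the bidirectional output is assumed to concatenate both directions:

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = SLiGRU(
...     input_shape=inp_tensor.shape,
...     hidden_size=5,
...     num_layers=2,
...     ff_normalization="layernorm",
...     recurrent_elementwise_affine=True,
...     bidirectional=True,
... )
>>> out_tensor, _ = net(inp_tensor)
>>> out_tensor.shape
torch.Size([4, 10, 10])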
forward(x, hx: Tensor | None = None)[source]

Returns the output of the SLi-GRU.

Parameters:
training: bool
class speechbrain.nnet.RNN.SLiGRU_Layer(input_size, hidden_size, num_layers, batch_size, dropout=0.0, nonlinearity='relu', ff_normalization='batchnorm', recurrent_elementwise_affine=False, bias=True, bidirectional=False)[source]

Bases: Module

This class implements a Stabilised Light-Gated Recurrent Units (SLi-GRU) layer.

Parameters:
  • input_size (int) – Feature dimensionality of the input tensors.

  • hidden_size (int) – Number of output neurons.

  • num_layers (int) – The layer number.

  • batch_size (int) – Batch size of the input tensors.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • nonlinearity (str) – Type of nonlinearity (tanh, sin, leaky_relu, relu).

  • ff_normalization (str) – Type of normalization (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in layer normalization. Note that this only applies to the feedforward affine transform. SLi-GRU (unlike Li-GRU) unconditionally applies layer normalization in the recurrent layers, which is unaffected by this parameter.

  • recurrent_elementwise_affine (bool) – If True, learnable element-wise affine parameters are enabled in the recurrent layer normalization.

  • bias (bool) – If True, the additive bias b is adopted.

  • bidirectional (bool) – if True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

forward(x: Tensor, hx: Tensor | None = None) Tensor[source]

Returns the output of the liGRU layer.

Parameters:
training: bool
class speechbrain.nnet.RNN.QuasiRNNLayer(input_size, hidden_size, bidirectional, zoneout=0.0, output_gate=True)[source]

Bases: Module

Applies a single layer Quasi-Recurrent Neural Network (QRNN) to an input sequence.

Parameters:
  • input_size (int) – The number of expected features in the input x.

  • hidden_size (int) – The number of features in the hidden state h. If not specified, the input size is used.

  • bidirectional (bool) – If True, the sequence is processed in both directions and the outputs of the two directions are concatenated.

  • zoneout (float) – The probability of applying zoneout (i.e. failing to update elements in the hidden state). Default: 0.

  • output_gate (bool) – If True, performs QRNN-fo (applying an output gate to the output). If False, performs QRNN-f. Default: True.

Example

>>> import torch
>>> model = QuasiRNNLayer(60, 256, bidirectional=True)
>>> a = torch.rand([10, 120, 60])
>>> b = model(a)
>>> b[0].shape
torch.Size([10, 120, 512])
training: bool
forgetMult(f: Tensor, x: Tensor, hidden: Tensor | None) Tensor[source]

Returns the hidden states for each time step.

Parameters:

x (torch.Tensor) – Linearly transformed input.

split_gate_inputs(y: Tensor) Tuple[Tensor, Tensor, Tensor | None][source]

Splits the input gates.

forward(x: Tensor, hidden: Tensor | None = None) Tuple[Tensor, Tensor][source]

Returns the output of the QRNN layer.

Parameters:

x (torch.Tensor) – Input to transform linearly.

class speechbrain.nnet.RNN.QuasiRNN(hidden_size, input_shape=None, input_size=None, num_layers=1, bias=True, batch_first=False, dropout=0, bidirectional=False, **kwargs)[source]

Bases: Module

This is an implementation of the Quasi-RNN.

https://arxiv.org/pdf/1611.01576.pdf

Part of the code is adapted from: https://github.com/salesforce/pytorch-qrnn

Parameters:
  • hidden_size (int) – The number of features in the hidden state h. If not specified, the input size is used.

  • input_shape (tuple) – The shape of an example input. Alternatively, use input_size.

  • input_size (int) – The size of the input. Alternatively, use input_shape.

  • num_layers (int) – The number of QRNN layers to produce.

  • zoneout (float) – The probability of applying zoneout (i.e. failing to update elements in the hidden state). Default: 0.

  • output_gate (bool) – If True, performs QRNN-fo (applying an output gate to the output). If False, performs QRNN-f. Default: True.

Example

>>> a = torch.rand([8, 120, 40])
>>> model = QuasiRNN(
...     256, num_layers=4, input_shape=a.shape, bidirectional=True
... )
>>> b, _ = model(a)
>>> b.shape
torch.Size([8, 120, 512])
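
A hedged unidirectional sketch with zoneout; without a backward direction to concatenate, the output feature dimension is assumed to equal hidden_size:

>>> a = torch.rand([8, 120, 40])
>>> model = QuasiRNN(256, num_layers=2, input_shape=a.shape, zoneout=0.1)
>>> b, h = model(a)
>>> b.shape
torch.Size([8, 120, 256])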
training: bool
forward(x, hidden=None)[source]

Applies the QuasiRNN to the input tensor x.

speechbrain.nnet.RNN.rnn_init(module)[source]

This function is used to initialize the RNN weights. Orthogonal initialization is used for the recurrent connections.

Parameters:

module (torch.nn.Module) – Recurrent neural network module.

Example

>>> inp_tensor = torch.rand([4, 10, 20])
>>> net = RNN(hidden_size=5, input_shape=inp_tensor.shape)
>>> out_tensor = net(inp_tensor)
>>> rnn_init(net)