speechbrain.lobes.models.transformer.TransformerLM module
An implementation of a Transformer language model.
Authors: Jianyuan Zhong, Samuele Cornell
Summary
Classes:
TransformerLM | This is an implementation of a transformer language model.
Reference
- class speechbrain.lobes.models.transformer.TransformerLM.TransformerLM(vocab, d_model=512, nhead=8, num_encoder_layers=12, num_decoder_layers=0, d_ffn=2048, dropout=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>, positional_encoding='fixed_abs_sine', normalize_before=False, d_embedding=None, max_length=2500, causal=True, attention_type='regularMHA', decoder_use_memory=False)[source]
Bases: TransformerInterface
This is an implementation of a transformer language model.
The architecture is based on the paper “Attention Is All You Need”: https://arxiv.org/pdf/1706.03762.pdf
- Parameters:
vocab (int) – Embedding vocabulary size
d_model (int) – The number of expected features in the encoder/decoder inputs (default=512).
nhead (int) – The number of heads in the multi-head attention modules (default=8).
num_encoder_layers (int) – The number of sub-encoder-layers in the encoder (default=12).
num_decoder_layers (int) – The number of sub-decoder-layers in the decoder (default=0).
d_ffn (int) – The dimension of the feedforward network model (default=2048).
dropout (float) – The dropout value (default=0.1).
activation (torch class) – The activation function of the encoder/decoder intermediate layer, e.g. torch.nn.ReLU or torch.nn.GELU (default=torch.nn.ReLU).
positional_encoding (str) – Type of positional encoding, default “fixed_abs_sine”
normalize_before (bool) – Whether to normalize before each layer.
d_embedding (int) – Size of embedding, if None use d_model.
max_length (int) – Maximum sequence length, default 2500 tokens.
causal (bool) – Whether to mask future positions so that each token attends only to itself and earlier tokens (default=True).
attention_type (str) – Type of attention to use, one of “regularMHA” or “RelPosMHAXL”
decoder_use_memory (bool) – Whether to use the hidden state in the decoder (default=False).
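Two of the options above, positional_encoding="fixed_abs_sine" and causal=True, can be illustrated without the library. The following is a minimal plain-Python sketch, assuming the sine/cosine formula from "Attention Is All You Need" and a boolean mask convention where True means "blocked". The function names sinusoidal_positions and causal_mask are hypothetical and are not part of the SpeechBrain API; the library's own tensors may be laid out differently.

```python
import math


def sinusoidal_positions(max_length, d_model):
    """Return a max_length x d_model table of fixed sine/cosine encodings."""
    table = []
    for pos in range(max_length):
        row = []
        for i in range(d_model):
            # Each (even, odd) pair of dimensions shares one frequency.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def causal_mask(length):
    """mask[i][j] is True when position i must NOT attend to position j."""
    return [[j > i for j in range(length)] for i in range(length)]


pe = sinusoidal_positions(4, 8)   # 4 positions, 8 features each
mask = causal_mask(3)             # blocks every position after the current one
```

In this sketch the encoding depends only on the position and the feature index, so it needs no training, and the mask is strictly upper triangular, so a position can see itself and everything before it.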
Example
>>> import torch
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> src = torch.randint(0, 720, [8, 120])
>>> net = TransformerLM(720, 512, 8, 1, 0, 1024, activation=torch.nn.GELU)
>>> enc_out = net.forward(src)
>>> print(enc_out.shape)
torch.Size([8, 120, 720])
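The output in the example has shape [batch, time, vocab], i.e. one score per vocabulary entry at every position. The sketch below shows one way such scores might be turned into a next-token choice. It assumes the last axis holds unnormalized scores (the reference does not state whether they are normalized) and that, with causal masking, the scores at the final time step describe the token that would follow the sequence. The values in pred are made up for illustration, and plain lists stand in for tensors.

```python
import math


def softmax(scores):
    """Normalize a list of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for one sequence: 2 time steps, vocabulary of 3 tokens.
pred = [
    [0.1, 2.0, -1.0],
    [1.5, 0.3, 0.2],
]

# Take the scores at the last time step and normalize them.
next_token_probs = softmax(pred[-1])

# Greedy choice: index of the most probable vocabulary entry.
next_token = max(range(len(next_token_probs)), key=next_token_probs.__getitem__)
```

Greedy selection is only one option; sampling from next_token_probs or running a beam search over several steps are common alternatives.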
- forward(src)[source]
- Parameters:
src (torch.Tensor) – The input sequence of token indices passed to the encoder, e.g. of shape [batch, time] as in the example above (required).
- Returns:
pred – Output of the transformer.
- Return type: