speechbrain.lobes.models.Cnn14 module

This file implements the CNN14 model from https://arxiv.org/abs/1912.10211

Authors:
* Cem Subakan 2022
* Francesco Paissan 2022

Summary

Classes:

`Cnn14`	This class implements the Cnn14 model from https://arxiv.org/abs/1912.10211
`ConvBlock`	This class implements the convolutional block used in CNN14

Functions:

`init_bn`	Initialize a Batchnorm layer.
`init_layer`	Initialize a Linear or Convolutional layer.

speechbrain.lobes.models.Cnn14.init_layer(layer): Initialize a Linear or Convolutional layer.

speechbrain.lobes.models.Cnn14.init_bn(bn): Initialize a Batchnorm layer.

class speechbrain.lobes.models.Cnn14.ConvBlock(in_channels, out_channels, norm_type)

This class implements the convolutional block used in CNN14

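Parameters:

in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
norm_type (str in ['bn', 'in', 'ln']) – The type of normalization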

init_weight(): Initializes the convolutional layers and the batchnorm layers of the block

forward(x, pool_size=(2, 2), pool_type='avg')

The forward pass for convblocks in CNN14

Parameters:

x (torch.Tensor) – Input tensor with shape B x C_in x D1 x D2, where B = batch size, C_in = number of input channels, D1 = size of the first spatial dimension, D2 = size of the second spatial dimension
pool_size (tuple of int) – Amount of pooling at each layer (default: (2, 2))
pool_type (str in ['max', 'avg', 'avg+max']) – The type of pooling (default: 'avg')
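The three pool_type options can be illustrated with a minimal pure-Python sketch. Here pool2d is a hypothetical helper, not part of SpeechBrain. It operates on a single 2-D feature map (a list of lists) instead of a B x C_in x D1 x D2 tensor, uses non-overlapping windows of size pool_size, and assumes that 'avg+max' means the element-wise sum of the average-pooled and max-pooled results. That last point is an assumption and is not stated in the description above.

```python
def pool2d(grid, pool_size=(2, 2), pool_type="avg"):
    """Pool a 2-D list of numbers over non-overlapping windows.

    Assumption (not confirmed by the documentation): 'avg+max' is the
    element-wise sum of the average-pooled and max-pooled outputs.
    """
    ph, pw = pool_size
    # Trailing rows/columns that do not fill a whole window are dropped.
    out_rows, out_cols = len(grid) // ph, len(grid[0]) // pw
    pooled = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Collect the values covered by the current pooling window.
            window = [
                grid[i * ph + a][j * pw + b]
                for a in range(ph)
                for b in range(pw)
            ]
            avg, mx = sum(window) / len(window), max(window)
            if pool_type == "avg":
                row.append(avg)
            elif pool_type == "max":
                row.append(mx)
            elif pool_type == "avg+max":
                row.append(avg + mx)
            else:
                raise ValueError(f"unknown pool_type: {pool_type!r}")
        pooled.append(row)
    return pooled


# One channel with D1 = 2 and D2 = 4.
feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8]]
print(pool2d(feature_map, pool_type="avg"))      # [[3.5, 5.5]]
print(pool2d(feature_map, pool_type="max"))      # [[6, 8]]
print(pool2d(feature_map, pool_type="avg+max"))  # [[9.5, 13.5]]
```

Under these assumptions, pool_size=(2, 2) halves both D1 and D2 (rounding down), which is why the 2 x 4 map above becomes 1 x 2.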

class speechbrain.lobes.models.Cnn14.Cnn14(mel_bins, emb_dim, norm_type='bn', return_reps=False)

This class implements the Cnn14 model from https://arxiv.org/abs/1912.10211

Parameters:

mel_bins (int) – Number of mel frequency bins in the input
emb_dim (int) – The dimensionality of the output embeddings
norm_type (str in ['bn', 'in', 'ln']) – The type of normalization
return_reps (bool (default=False)) – If True the model returns intermediate representations as well for interpretation

Example:

>>> import torch
>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> cnn14 = Cnn14(120, 256)
>>> x = torch.rand(3, 400, 120)
>>> h = cnn14.forward(x)
>>> print(h.shape)
torch.Size([3, 1, 256])

forward(x)

The forward pass for the CNN14 encoder

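Parameters:

x (torch.Tensor) – Input tensor with shape B x C_in x D1 x D2, where B = batch size, C_in = number of input channels, D1 = size of the first spatial dimension, D2 = size of the second spatial dimension. The example for this class passes a 3-D tensor of shape (3, 400, 120), whose last dimension matches mel_bins.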