speechbrain.lobes.models.ResNet module

Pre-activation ResNet for speaker verification

Authors
  • Mickael Rouvier 2022

Summary

Classes:

BasicBlock

An implementation of ResNet Block.

Classifier

This class implements cosine similarity on top of the features.

ResNet

An implementation of ResNet.

SEBasicBlock

An implementation of Squeeze-and-Excitation ResNet Block.

SEBlock

An implementation of Squeeze-and-Excitation Block.

Functions:

conv1x1

2D convolution with kernel_size = 1

conv3x3

2D convolution with kernel_size = 3

Reference

speechbrain.lobes.models.ResNet.conv3x3(in_planes, out_planes, stride=1)[source]

2D convolution with kernel_size = 3

speechbrain.lobes.models.ResNet.conv1x1(in_planes, out_planes, stride=1)[source]

2D convolution with kernel_size = 1
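
The effect of stride on these two helpers can be checked with the standard convolution output-size formula. The sketch below is plain Python and does not call the library; the helper name conv_out_size is hypothetical, and the padding values (1 for the 3x3 case, 0 for the 1x1 case) are assumptions, since the signatures above do not state them.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis of a 2D convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Assumption: the 3x3 convolution pads by 1 and the 1x1 convolution does not pad.
same_size = conv_out_size(80, kernel_size=3, stride=1, padding=1)
halved_3x3 = conv_out_size(80, kernel_size=3, stride=2, padding=1)
halved_1x1 = conv_out_size(80, kernel_size=1, stride=2, padding=0)
print(same_size, halved_3x3, halved_1x1)  # 80 40 40
```

Under these assumptions, stride=1 keeps the spatial size and stride=2 halves it for both kernel sizes, which is why a strided 1x1 convolution is a common way to make a shortcut branch match a strided 3x3 branch.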

class speechbrain.lobes.models.ResNet.SEBlock(channels, reduction=1, activation=<class 'torch.nn.modules.activation.ReLU'>)[source]

Bases: Module

An implementation of Squeeze-and-Excitation Block.

Parameters:
  • channels (int) – The number of channels.

  • reduction (int) – The reduction factor of channels.

  • activation (torch class) – A class for constructing the activation layers.

Example

>>> inp_tensor = torch.rand([1, 64, 80, 40])
>>> se_layer = SEBlock(64)
>>> out_tensor = se_layer(inp_tensor)
>>> out_tensor.shape
torch.Size([1, 64, 80, 40])
forward(x)[source]

Intermediate step. Processes the input tensor x and returns an output tensor.

training: bool
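
To make the squeeze-and-excitation idea concrete, here is a dependency-free sketch of the general technique on nested lists. It is not the SEBlock source: the function name squeeze_excite, the bias-free weight matrices w1 and w2, and the fixed ReLU are all assumptions made for illustration.

```python
import math


def squeeze_excite(x, w1, w2):
    """Apply squeeze-and-excitation to x, a list of C channels (each H x W).

    w1 has shape (C // reduction) x C and w2 has shape C x (C // reduction).
    Both are hypothetical bias-free weights.
    """
    # Squeeze: global average of each channel gives one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck linear layer, ReLU, linear layer, sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: every value in a channel is multiplied by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


x = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 channels, 2 x 2
w1 = [[1.0, 0.0], [0.0, 1.0]]  # reduction = 1
w2 = [[0.0, 0.0], [0.0, 0.0]]  # zero weights give a gate of 0.5 per channel
y = squeeze_excite(x, w1, w2)
print(y[0])  # [[0.5, 1.0], [1.5, 2.0]]
```

The output keeps the shape of the input, as in the example above where a [1, 64, 80, 40] tensor goes in and comes out; only the per-channel scaling changes.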
class speechbrain.lobes.models.ResNet.BasicBlock(in_channels, out_channels, stride=1, downsample=None, activation=<class 'torch.nn.modules.activation.ReLU'>)[source]

Bases: Module

An implementation of ResNet Block.

Parameters:
  • in_channels (int) – Number of input channels.

  • out_channels (int) – The number of output channels.

  • stride (int) – Factor that reduces the spatial dimensionality.

  • downsample (torch function) – A function for downsampling the identity branch of the block when stride != 1.

  • activation (torch class) – A class for constructing the activation layers.

Example

>>> inp_tensor = torch.rand([1, 64, 80, 40])
>>> layer = BasicBlock(64, 64, stride=1)
>>> out_tensor = layer(inp_tensor)
>>> out_tensor.shape
torch.Size([1, 64, 80, 40])
forward(x)[source]

Intermediate step. Processes the input tensor x and returns an output tensor.

training: bool
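
The role of downsample is easiest to see on a toy residual connection. The sketch below uses 1-D Python lists instead of tensors; residual_block, strided_transform and strided_identity are hypothetical stand-ins, not the BasicBlock code. It shows that once the main branch shrinks its input, the identity branch must be shrunk the same way before the two can be added.

```python
def residual_block(x, transform, downsample=None):
    """Return transform(x) + identity, downsampling the identity if requested."""
    identity = x if downsample is None else downsample(x)
    out = transform(x)
    if len(out) != len(identity):
        raise ValueError("size mismatch: pass a downsample function when stride != 1")
    return [o + i for o, i in zip(out, identity)]


def strided_transform(x, stride=2):
    # Stand-in for the block's strided convolutions: subsample, then scale.
    return [2.0 * v for v in x[::stride]]


def strided_identity(x, stride=2):
    # Stand-in for a strided shortcut: subsample only.
    return x[::stride]


x = [1.0, 2.0, 3.0, 4.0]
y = residual_block(x, strided_transform, downsample=strided_identity)
print(y)  # [3.0, 9.0]
```

Calling residual_block(x, strided_transform) without a downsample function raises ValueError here, because the four-element identity cannot be added to the two-element output.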
class speechbrain.lobes.models.ResNet.SEBasicBlock(in_channels, out_channels, reduction=1, stride=1, downsample=None, activation=<class 'torch.nn.modules.activation.ReLU'>)[source]

Bases: Module

An implementation of Squeeze-and-Excitation ResNet Block.

Parameters:
  • in_channels (int) – Number of input channels.

  • out_channels (int) – The number of output channels.

  • reduction (int) – The reduction factor of channels.

  • stride (int) – Factor that reduces the spatial dimensionality.

  • downsample (torch function) – A function for downsampling the identity branch of the block when stride != 1.

  • activation (torch class) – A class for constructing the activation layers.

Example

>>> inp_tensor = torch.rand([1, 64, 80, 40])
>>> layer = SEBasicBlock(64, 64, stride=1)
>>> out_tensor = layer(inp_tensor)
>>> out_tensor.shape
torch.Size([1, 64, 80, 40])
forward(x)[source]

Intermediate step. Processes the input tensor x and returns an output tensor.

training: bool
class speechbrain.lobes.models.ResNet.ResNet(input_size=80, device='cpu', activation=<class 'torch.nn.modules.activation.ReLU'>, channels=[128, 128, 256, 256], block_sizes=[3, 4, 6, 3], strides=[1, 2, 2, 2], lin_neurons=256)[source]

Bases: Module

An implementation of ResNet.

Parameters:
  • input_size (int) – Size of the input feature dimension (80 in the example below).

  • device (str) – Device used, e.g., “cpu” or “cuda”.

  • activation (torch class) – A class for constructing the activation layers.

  • channels (list of ints) – List of number of channels used per stage.

  • block_sizes (list of ints) – List of number of groups created per stage.

  • strides (list of ints) – List of stride per stage.

  • lin_neurons (int) – Number of neurons in linear layers.

Example

>>> input_feats = torch.rand([2, 400, 80])
>>> compute_embedding = ResNet(lin_neurons=256)
>>> outputs = compute_embedding(input_feats)
>>> outputs.shape
torch.Size([2, 256])
forward(x, lengths=None)[source]

Returns the embedding vector.

Parameters:

x (torch.Tensor) – Tensor of shape (batch, time, channel).

training: bool
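
As a rough guide to how the strides argument shapes the feature maps, the sketch below traces one spatial axis through the stages. It is an illustration, not the model's code: trace_stage_sizes is a hypothetical helper, and it assumes that each stage shrinks the axis exactly once by its stride using a padded 3x3 convolution (so the size becomes the ceiling of size / stride) and that no other layer changes the size.

```python
def trace_stage_sizes(size, strides):
    """Size of one spatial axis after each stage, assuming ceil(size / stride)."""
    sizes = []
    for stride in strides:
        size = (size + stride - 1) // stride  # integer ceiling division
        sizes.append(size)
    return sizes


# Default strides from the signature above, applied to the example input
# of 80 features and 400 time frames.
feature_axis = trace_stage_sizes(80, [1, 2, 2, 2])
time_axis = trace_stage_sizes(400, [1, 2, 2, 2])
print(feature_axis)  # [80, 40, 20, 10]
print(time_axis)     # [400, 200, 100, 50]
```

Under these assumptions, the default strides reduce both axes by a factor of 8 before the features are reduced to the fixed-size embedding of lin_neurons values.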
class speechbrain.lobes.models.ResNet.Classifier(input_size, device='cpu', lin_blocks=0, lin_neurons=256, out_neurons=1211)[source]

Bases: Module

This class implements cosine similarity on top of the features.

Parameters:
  • input_size (int) – Size of the input embeddings (2 in the example below).

  • device (str) – Device used, e.g., “cpu” or “cuda”.

  • lin_blocks (int) – Number of linear layers.

  • lin_neurons (int) – Number of neurons in linear layers.

  • out_neurons (int) – Number of classes.

Example

>>> classify = Classifier(input_size=2, lin_neurons=2, out_neurons=2)
>>> outputs = torch.tensor([ [1., -1.], [-9., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> outputs = outputs.unsqueeze(1)
>>> cos = classify(outputs)
>>> (cos < -1.0).long().sum()
tensor(0)
>>> (cos > 1.0).long().sum()
tensor(0)
forward(x)[source]

Returns the cosine similarity scores over speakers. As the example above checks, each score lies in [-1, 1].

Parameters:

x (torch.Tensor) – Input embeddings. In the example above, the shape is (batch, 1, input_size).

training: bool
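
The bounds tested in the Classifier example follow from how cosine similarity is defined. The sketch below is a dependency-free illustration of cosine scoring, not the Classifier source: cosine_scores and the weights list are hypothetical, with one weight row per class, and all vectors are assumed to be nonzero.

```python
import math


def cosine_scores(embedding, class_weights):
    """Cosine similarity between one embedding and each class weight row."""

    def normalize(v):
        # Scale a nonzero vector to unit length.
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v]

    e = normalize(embedding)
    return [sum(a * b for a, b in zip(e, normalize(w))) for w in class_weights]


weights = [[1.0, -1.0], [1.0, 1.0]]  # two hypothetical classes
scores = cosine_scores([1.0, -1.0], weights)
# First score is close to 1 (same direction), second is 0 (orthogonal).
print([round(s, 6) for s in scores])
```

Because both vectors are normalized before the dot product, the scale of the embedding does not matter: [-9., 1.] and [-0.9, 0.1] would get the same scores, and every score stays within [-1, 1] up to floating-point rounding.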