speechbrain.lobes.models.Xvector module

A popular speaker recognition and diarization model.

Authors
  • Nauman Dawalatabad 2020

  • Mirco Ravanelli 2020

Summary

Classes:

Classifier

This class implements the last MLP on top of x-vector features.

Discriminator

This class implements a discriminator on top of x-vector features.

Xvector

This model extracts X-vectors for speaker recognition and diarization.

Reference

class speechbrain.lobes.models.Xvector.Xvector(device='cpu', activation=<class 'torch.nn.modules.activation.LeakyReLU'>, tdnn_blocks=5, tdnn_channels=[512, 512, 512, 512, 1500], tdnn_kernel_sizes=[5, 3, 3, 1, 1], tdnn_dilations=[1, 2, 3, 1, 1], lin_neurons=512, in_channels=40)

Bases: torch.nn.modules.module.Module

This model extracts X-vectors for speaker recognition and diarization.

Parameters
  • device (str) – Device used, e.g. “cpu” or “cuda”.

  • activation (torch class) – A class for constructing the activation layers.

  • tdnn_blocks (int) – Number of time-delay neural network (TDNN) layers. Each of the three lists below needs one entry per layer.

  • tdnn_channels (list of ints) – Number of output channels of each TDNN layer.

  • tdnn_kernel_sizes (list of ints) – Kernel size of each TDNN layer.

  • tdnn_dilations (list of ints) – Dilation of the kernel in each TDNN layer.

  • lin_neurons (int) – Number of neurons in the final linear layer, which sets the x-vector dimension.

  • in_channels (int) – Number of input feature channels, i.e. the size of the last input dimension (40 in the example).
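Together, tdnn_kernel_sizes and tdnn_dilations determine how many input frames contribute to each frame-level output of the TDNN stack. The sketch below shows that arithmetic, assuming every TDNN layer uses stride 1; tdnn_receptive_field is a hypothetical helper, not part of SpeechBrain.

```python
def tdnn_receptive_field(kernel_sizes, dilations):
    """Input frames seen by one output frame of stacked stride-1 dilated 1-D convolutions.

    Assumption: stride 1 in every layer (hypothetical helper, not a SpeechBrain API).
    """
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length")
    field = 1
    for k, d in zip(kernel_sizes, dilations):
        # A kernel of size k with dilation d widens the context by (k - 1) * d frames.
        field += (k - 1) * d
    return field


# Default Xvector configuration from the signature above.
print(tdnn_receptive_field([5, 3, 3, 1, 1], [1, 2, 3, 1, 1]))  # 15
```

With the defaults, only the first three layers widen the context (4 + 4 + 6 extra frames); the two kernel-size-1 layers act frame by frame.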

Example

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector
>>> compute_xvect = Xvector('cpu')
>>> input_feats = torch.rand([5, 10, 40])
>>> outputs = compute_xvect(input_feats)
>>> outputs.shape
torch.Size([5, 1, 512])

forward(x, lens=None)

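Returns the x-vectors, with shape (batch, 1, lin_neurons).

Parameters
  • x (torch.Tensor) – Batch of input features with shape (batch, time, in_channels).

  • lens (torch.Tensor, optional) – Relative length of each utterance in the batch, used to ignore padded frames when pooling over time.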
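To illustrate what a relative length means for pooling, the plain-Python sketch below pools one utterance. It assumes the standard x-vector statistics pooling (per-channel mean and standard deviation over the valid frames, concatenated) and assumes lens holds relative lengths in (0, 1]. stats_pool is a hypothetical helper, not a SpeechBrain function, and it uses the population standard deviation for simplicity.

```python
import math


def stats_pool(frames, rel_len=1.0):
    """Mean + standard deviation over the valid frames of one utterance.

    frames: per-frame feature vectors (time x channels), possibly zero-padded at the end.
    rel_len: fraction of frames that are real (assumed relative-length convention).
    Hypothetical helper for illustration only.
    """
    # Convert the relative length to a frame count, keeping at least one frame.
    n_valid = max(1, round(rel_len * len(frames)))
    valid = frames[:n_valid]
    n_channels = len(valid[0])
    means = [sum(f[c] for f in valid) / n_valid for c in range(n_channels)]
    # Population standard deviation; a real implementation may use the unbiased estimate.
    stds = [
        math.sqrt(sum((f[c] - means[c]) ** 2 for f in valid) / n_valid)
        for c in range(n_channels)
    ]
    # Concatenate, so the pooled vector has 2 * channels entries.
    return means + stds


# Two real frames followed by two padding frames (rel_len = 0.5).
utt = [[1.0, 2.0], [3.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
print(stats_pool(utt, rel_len=0.5))  # [2.0, 2.0, 1.0, 0.0]
```

Under this assumption, the 1500-channel output of the last default TDNN layer would pool to a 3000-entry vector before the final linear layer maps it to lin_neurons.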

class speechbrain.lobes.models.Xvector.Classifier(input_shape, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, lin_blocks=1, lin_neurons=512, out_neurons=1211)

Bases: speechbrain.nnet.containers.Sequential

This class implements the last MLP on top of x-vector features.

Parameters
  • input_shape (tuple) – Expected shape of an example input.

  • activation (torch class) – A class for constructing the activation layers.

  • lin_blocks (int) – Number of hidden linear layers before the output layer.

  • lin_neurons (int) – Number of neurons in each hidden linear layer.

  • out_neurons (int) – Number of output neurons.
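As a rough guide to how lin_blocks, lin_neurons and out_neurons affect model size, the sketch below counts the weights and biases of the linear layers. It assumes a layout of lin_blocks hidden layers of width lin_neurons followed by one output layer of out_neurons units, and it ignores normalization layers; mlp_linear_params is a hypothetical helper.

```python
def mlp_linear_params(in_dim, lin_blocks=1, lin_neurons=512, out_neurons=1211):
    """Weights + biases of the linear layers in an assumed MLP head.

    Layout assumed: lin_blocks hidden layers of width lin_neurons, then an output layer.
    Normalization layers are ignored. Hypothetical helper for illustration only.
    """
    total = 0
    prev = in_dim
    for _ in range(lin_blocks):
        total += prev * lin_neurons + lin_neurons  # weight matrix + bias
        prev = lin_neurons
    total += prev * out_neurons + out_neurons  # output layer
    return total


# 512-dimensional x-vectors with the default head sizes.
print(mlp_linear_params(512))  # 883899
```

Under this layout, the output layer accounts for most of the parameters, because out_neurons (1211 by default) is larger than lin_neurons.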

Example

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector, Classifier
>>> input_feats = torch.rand([5, 10, 40])
>>> compute_xvect = Xvector()
>>> xvects = compute_xvect(input_feats)
>>> classify = Classifier(input_shape=xvects.shape)
>>> output = classify(xvects)
>>> output.shape
torch.Size([5, 1, 1211])
class speechbrain.lobes.models.Xvector.Discriminator(input_shape, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, lin_blocks=1, lin_neurons=512, out_neurons=1)

Bases: speechbrain.nnet.containers.Sequential

This class implements a discriminator on top of x-vector features.

Parameters
  • input_shape (tuple) – Expected shape of an example input.

  • activation (torch class) – A class for constructing the activation layers.

  • lin_blocks (int) – Number of hidden linear layers before the output layer.

  • lin_neurons (int) – Number of neurons in each hidden linear layer.

  • out_neurons (int) – Number of output neurons.
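The sketch below traces how the shape of a batch changes through the discriminator's linear layers. It assumes the same layout as above (lin_blocks hidden layers, then an output layer) and that each layer changes only the last dimension; trace_shapes is a hypothetical helper. Its final shape agrees with the example output.

```python
def trace_shapes(input_shape, lin_blocks=1, lin_neurons=512, out_neurons=1):
    """Shapes after each linear layer, assuming only the last dimension changes.

    Hypothetical helper for illustration only.
    """
    *leading, _ = input_shape
    shapes = [tuple(input_shape)]
    for _ in range(lin_blocks):
        shapes.append((*leading, lin_neurons))  # hidden layer
    shapes.append((*leading, out_neurons))  # output layer
    return shapes


# Batch of 5 x-vectors of dimension 512, default discriminator sizes.
print(trace_shapes((5, 1, 512)))  # [(5, 1, 512), (5, 1, 512), (5, 1, 1)]
```

With out_neurons=1, each x-vector is reduced to a single score.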

Example

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector, Discriminator
>>> input_feats = torch.rand([5, 10, 40])
>>> compute_xvect = Xvector()
>>> xvects = compute_xvect(input_feats)
>>> discriminate = Discriminator(xvects.shape)
>>> output = discriminate(xvects)
>>> output.shape
torch.Size([5, 1, 1])