speechbrain.lobes.models.Xvector module

The x-vector model, widely used for speaker recognition and diarization.

Authors

Nauman Dawalatabad 2020
Mirco Ravanelli 2020

Summary

Classes:

`Classifier`	This class implements the final MLP on top of x-vector features.
`Discriminator`	This class implements a discriminator on top of x-vector features.
`Xvector`	This model extracts x-vectors for speaker recognition and diarization.

Reference

class speechbrain.lobes.models.Xvector.Xvector(device='cpu', activation=<class 'torch.nn.modules.activation.LeakyReLU'>, tdnn_blocks=5, tdnn_channels=[512, 512, 512, 512, 1500], tdnn_kernel_sizes=[5, 3, 3, 1, 1], tdnn_dilations=[1, 2, 3, 1, 1], lin_neurons=512, in_channels=40)[source]

Bases: Module

This model extracts x-vectors for speaker recognition and diarization.

Parameters:

device (str) – Device to use, e.g. “cpu” or “cuda”.
activation (torch class) – A class for constructing the activation layers.
tdnn_blocks (int) – Number of time-delay neural network (TDNN) layers.
tdnn_channels (list of ints) – List of output channels for each TDNN layer.
tdnn_kernel_sizes (list of ints) – List of kernel sizes for each TDNN layer.
tdnn_dilations (list of ints) – List of dilations for kernels in each TDNN layer.
lin_neurons (int) – Number of neurons in linear layers.
in_channels (int) – Number of input feature channels (the size of the last input dimension).
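
To make the interaction between tdnn_kernel_sizes and tdnn_dilations concrete, the following sketch computes how many input frames one output frame of the TDNN stack can see. It assumes each TDNN layer behaves like a stride-1 dilated 1-D convolution; the helper name tdnn_receptive_field is hypothetical and not part of SpeechBrain.

```python
def tdnn_receptive_field(kernel_sizes, dilations):
    """Frames of input context seen by one output frame.

    Assumes a stack of stride-1 dilated 1-D convolutions
    (hypothetical helper, not a SpeechBrain function).
    """
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length")
    # Each layer widens the context by (kernel_size - 1) * dilation frames.
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


# Default values taken from the Xvector signature above.
default_kernel_sizes = [5, 3, 3, 1, 1]
default_dilations = [1, 2, 3, 1, 1]

context = tdnn_receptive_field(default_kernel_sizes, default_dilations)
print(context)  # 15
```

Under that assumption, the default configuration gives each frame-level output a 15-frame context. The last two layers have kernel size 1 and add no further context.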

Example

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector
>>> compute_xvect = Xvector('cpu')
>>> input_feats = torch.rand([5, 10, 40])
>>> outputs = compute_xvect(input_feats)
>>> outputs.shape
torch.Size([5, 1, 512])
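
In the example above, the 10 input frames collapse to a single output step. X-vector architectures conventionally do this with statistics pooling, which concatenates the per-channel mean and standard deviation over time. This page does not describe the pooling step, so the sketch below is an illustration of the general technique in plain Python, not the library's implementation; statistics_pooling is a hypothetical name.

```python
import statistics


def statistics_pooling(frames):
    """Collapse a [time][channels] sequence into one [2 * channels] vector.

    Hypothetical illustration of statistics pooling, not SpeechBrain code.
    """
    # Transpose so each entry holds one channel's values across all frames.
    per_channel = list(zip(*frames))
    means = [statistics.fmean(values) for values in per_channel]
    # Population standard deviation is an assumption made here;
    # a real implementation may use a different estimator.
    stds = [statistics.pstdev(values) for values in per_channel]
    return means + stds


# Three frames with two channels each.
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
pooled = statistics_pooling(frames)
print(len(pooled))  # 4
```

The pooled vector has a fixed size regardless of how many frames are supplied, which is why utterances of different durations can map to embeddings of the same dimension.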

forward(x, lens=None)[source]

Returns the x-vectors.

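Parameters:

x (torch.Tensor) – Input features, e.g. of shape [batch, time, in_channels].
lens (torch.Tensor, optional) – Relative lengths of the input sequences.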

class speechbrain.lobes.models.Xvector.Classifier(input_shape, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, lin_blocks=1, lin_neurons=512, out_neurons=1211)[source]

Bases: Sequential

This class implements the final MLP on top of x-vector features.

Parameters:

input_shape (tuple) – Expected shape of an example input.
activation (torch class) – A class for constructing the activation layers.
lin_blocks (int) – Number of linear layers.
lin_neurons (int) – Number of neurons in linear layers.
out_neurons (int) – Number of output neurons.

Example

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector, Classifier
>>> input_feats = torch.rand([5, 10, 40])
>>> compute_xvect = Xvector()
>>> xvects = compute_xvect(input_feats)
>>> classify = Classifier(input_shape=xvects.shape)
>>> output = classify(xvects)
>>> output.shape
torch.Size([5, 1, 1211])
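
The classifier output has one score per training speaker along its last dimension. A minimal sketch of turning such scores into predicted speaker indices follows. It uses nested lists in place of tensors, assumes that a higher score means a more likely speaker, and uses a hypothetical helper name, predict_speakers.

```python
def predict_speakers(scores):
    """Return the index of the highest score for each utterance.

    `scores` mimics a [batch, 1, out_neurons] output as nested lists
    (hypothetical helper, assuming higher score = more likely speaker).
    """
    predictions = []
    for utterance in scores:
        class_scores = utterance[0]  # the single pooled time step
        best = max(range(len(class_scores)), key=class_scores.__getitem__)
        predictions.append(best)
    return predictions


# Toy scores for 2 utterances and 3 speakers.
toy_scores = [[[0.1, 2.3, -0.5]], [[1.7, 0.2, 0.9]]]
predicted = predict_speakers(toy_scores)
print(predicted)  # [1, 0]
```

Mapping an index back to a speaker identity depends on the label encoding used during training, which is outside the scope of this module.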

class speechbrain.lobes.models.Xvector.Discriminator(input_shape, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, lin_blocks=1, lin_neurons=512, out_neurons=1)[source]

Bases: Sequential

This class implements a discriminator on top of x-vector features.

Parameters:

input_shape (tuple) – Expected shape of an example input.
activation (torch class) – A class for constructing the activation layers.
lin_blocks (int) – Number of linear layers.
lin_neurons (int) – Number of neurons in linear layers.
out_neurons (int) – Number of output neurons.

Example

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector, Discriminator
>>> input_feats = torch.rand([5, 10, 40])
>>> compute_xvect = Xvector()
>>> xvects = compute_xvect(input_feats)
>>> discriminate = Discriminator(xvects.shape)
>>> output = discriminate(xvects)
>>> output.shape
torch.Size([5, 1, 1])