speechbrain.processing.PLDA_LDA module

A popular speaker recognition/diarization model (LDA and PLDA).

Authors

Anthony Larcher 2020
Nauman Dawalatabad 2020

Relevant Papers

This implementation of PLDA is based on the following papers.

PLDA model training
- Ye Jiang et al., “PLDA Modeling in I-Vector and Supervector Space for Speaker Verification,” in Interspeech, 2012.
- Patrick Kenny et al., “PLDA for speaker verification with utterances of arbitrary duration,” in ICASSP, 2013.

PLDA scoring (fast scoring)
- Daniel Garcia-Romero et al., “Analysis of i-vector length normalization in speaker recognition systems,” in Interspeech, 2011.
- Weiwei Lin et al., “Fast Scoring for PLDA with Uncertainty Propagation,” in Odyssey, 2016.
- Kong Aik Lee et al., “Multi-session PLDA Scoring of I-vector for Partially Open-Set Speaker Detection,” in Interspeech, 2013.

Credits

This code is adapted from: https://projets-lium.univ-lemans.fr/sidekit/

Summary

Classes:

`LDA`	A class to perform Linear Discriminant Analysis.
`Ndx`	A class that encodes trial index information.
`PLDA`	A class to train PLDA model from embeddings.
`Scores`	A class for storing scores for trials.
`StatObject_SB`	A utility class for PLDA class used for statistics calculations.

Functions:

`diff`	Difference between lists.
`fa_model_loop`	A function for PLDA estimation.
`fast_PLDA_scoring`	Compute the PLDA scores between two sets of vectors.
`ismember`	Checks if the elements of list1 are contained in list2.

Reference

class speechbrain.processing.PLDA_LDA.StatObject_SB(modelset=None, segset=None, start=None, stop=None, stat0=None, stat1=None)[source]

Bases: object

A utility class for PLDA class used for statistics calculations.

This is also used to pack deep embeddings and meta-information in one object.

Parameters:

modelset (list) – List of model IDs for each session as an array of strings.
segset (list) – List of session IDs as an array of strings.
start (int) – Index of the first frame of the segment.
stop (int) – Index of the last frame of the segment.
stat0 (tensor) – An ndarray of float64. Each row contains the zero-order statistics of the corresponding session.
stat1 (tensor) – An ndarray of float64. Each row contains the first-order statistics of the corresponding session.

save_stat_object(filename)[source]

Saves stats in pickle format.

Parameters: filename (path) – Path where the pickle file will be stored.

get_model_segsets(mod_id)[source]

Return segments of a given model.

Parameters: mod_id (str) – ID of the model for which segments will be returned.

get_model_start(mod_id)[source]

Return start of segment of a given model.

Parameters: mod_id (str) – ID of the model for which start will be returned.

get_model_stop(mod_id)[source]

Return stop of segment of a given model.

Parameters: mod_id (str) – ID of the model for which stop will be returned.

get_mean_stat1()[source]: Return the mean of the first-order statistics.

get_total_covariance_stat1()[source]: Compute and return the total covariance matrix of the first-order statistics.
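
As a rough illustration of what these two quantities are, the sketch below computes a mean vector and a total covariance matrix from a small matrix of first-order statistics using plain NumPy. The toy data and the choice to divide by the number of sessions N (rather than N - 1) are assumptions made for this example, not a statement about the library's internals.

```python
import numpy as np

# Toy first-order statistics: 4 sessions, 2 dimensions (hypothetical data).
stat1 = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 2.0]])

# Mean over sessions, one value per dimension.
mean = stat1.mean(axis=0)

# Total covariance: scatter of the centered vectors.
# Assumption for this sketch: normalise by N, not N - 1.
centered = stat1 - mean
total_cov = centered.T @ centered / stat1.shape[0]

print(mean)
print(total_cov)
```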

get_model_stat0(mod_id)[source]

Return zero-order statistics of a given model.

Parameters: mod_id (str) – ID of the model for which stat0 will be returned.

get_model_stat1(mod_id)[source]

Return first-order statistics of a given model.

Parameters: mod_id (str) – ID of the model for which stat1 will be returned.

sum_stat_per_model()[source]: Sum the zero- and first-order statistics per model and store them in a new StatObject_SB. Returns a StatObject_SB object with the statistics summed per model, and a NumPy array (session_per_model) holding the number of sessions per model.
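
The per-model summation can be pictured with the following NumPy sketch. It works on bare arrays rather than on a StatObject_SB, and all variable names and data are hypothetical; it only shows the grouping idea, assuming that sessions sharing a model ID are accumulated into one row.

```python
import numpy as np

# Hypothetical session-level data: 5 sessions belonging to 3 models.
modelset = np.array(["spk1", "spk2", "spk1", "spk3", "spk2"])
stat0 = np.ones((5, 1))
stat1 = np.arange(10, dtype=float).reshape(5, 2)

# Map every session to the index of its (unique, sorted) model ID.
unique_models, inverse = np.unique(modelset, return_inverse=True)

# Accumulate the statistics of all sessions of a model into one row.
summed_stat0 = np.zeros((len(unique_models), stat0.shape[1]))
summed_stat1 = np.zeros((len(unique_models), stat1.shape[1]))
np.add.at(summed_stat0, inverse, stat0)
np.add.at(summed_stat1, inverse, stat1)

# Number of sessions that contributed to each model.
session_per_model = np.bincount(inverse)
```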

center_stat1(mu)[source]

Center first-order statistics.

Parameters: mu (array) – Array to center on.

norm_stat1()[source]: Divide all first-order statistics by their Euclidean norm.
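
Centering followed by norm_stat1 amounts to the length normalization commonly applied to embeddings before PLDA. A minimal NumPy sketch of the two steps on a bare array follows; the data and the centering vector mu are made up for illustration.

```python
import numpy as np

# Hypothetical embeddings (one row per session) and centering vector.
stat1 = np.array([[3.0, 4.0], [1.0, 0.0], [5.0, 8.0]])
mu = np.array([1.0, 1.0])

# Step 1: center every row on mu.
centered = stat1 - mu

# Step 2: divide each row by its Euclidean norm so it has unit length.
norms = np.linalg.norm(centered, axis=1, keepdims=True)
normalized = centered / norms
```

Rows with zero norm would need special handling; this sketch assumes none occur.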

rotate_stat1(R)[source]

Rotate first-order statistics by a right-product.

Parameters: R (ndarray) – Matrix to use for the right product on the first-order statistics.

whiten_stat1(mu, sigma, isSqrInvSigma=False)[source]

Whiten first-order statistics. The case handled depends on the number of dimensions of sigma:
- sigma.ndim == 1: diagonal covariance.
- sigma.ndim == 2: a single Gaussian with full covariance.
- sigma.ndim == 3: a full-covariance UBM.

Parameters:

mu (array) – Mean vector to be subtracted from the statistics.
sigma (ndarray) – Covariance matrix or covariance super-vector.
isSqrInvSigma (bool) – True if the input Sigma matrix is the inverse of the square root of a covariance matrix.
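
For the full-covariance case (sigma.ndim == 2), whitening can be sketched as centering followed by a right product with an inverse square root of the covariance. The NumPy example below is an illustration under that assumption and uses synthetic data; it is not taken from the library's source. When isSqrInvSigma is True, the eigendecomposition step would presumably be skipped and sigma used directly as the rotation.

```python
import numpy as np

# Synthetic, correlated 3-dimensional vectors (hypothetical data).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 3.0]])
stat1 = rng.normal(size=(200, 3)) @ mixing

# Mean and full covariance estimated from the data itself.
mu = stat1.mean(axis=0)
centered = stat1 - mu
sigma = centered.T @ centered / stat1.shape[0]

# Inverse square root of sigma from its eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(sigma)
sqr_inv_sigma = eigvecs @ np.diag(1.0 / np.sqrt(eigvals))

# Whitening: center, then rotate/scale by a right product.
whitened = (stat1 - mu) @ sqr_inv_sigma

# The whitened vectors have (numerically) identity covariance.
whitened_cov = whitened.T @ whitened / whitened.shape[0]
```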

align_models(model_list)[source]

Align models of the current StatObject_SB to match a list of models provided as input parameter. The size of the StatObject_SB might be reduced to match the input list of models.

Parameters: model_list (ndarray of strings) – List of models to match.

align_segments(segment_list)[source]

Align segments of the current StatObject_SB to match a list of segments provided as input parameter. The size of the StatObject_SB might be reduced to match the input list of segments.

Parameters: segment_list (ndarray of strings) – List of segments to match.

get_lda_matrix_stat1(rank)[source]

Compute and return the Linear Discriminant Analysis matrix on the first-order statistics. Columns of the LDA matrix are ordered according to the corresponding eigenvalues in descending order.

Parameters: rank (int) – Rank of the LDA matrix to return.
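
One standard way to obtain such a matrix is to solve the generalized eigenproblem between the between-class and within-class scatter matrices and keep the leading eigenvectors. The NumPy sketch below follows that textbook recipe on synthetic data; the scatter definitions, the Cholesky-based solver and all names are assumptions of this example and may differ from what get_lda_matrix_stat1 does internally.

```python
import numpy as np

# Synthetic data: 3 classes (models), 50 vectors each, 4 dimensions.
rng = np.random.default_rng(0)
dim, per_class = 4, 50
labels = np.repeat(np.arange(3), per_class)
class_means = rng.normal(scale=3.0, size=(3, dim))
X = class_means[labels] + rng.normal(size=(labels.size, dim))

# Within-class (W) and between-class (B) scatter matrices.
global_mean = X.mean(axis=0)
W = np.zeros((dim, dim))
B = np.zeros((dim, dim))
for c in np.unique(labels):
    Xc = X[labels == c]
    mc = Xc.mean(axis=0)
    W += (Xc - mc).T @ (Xc - mc)
    d = (mc - global_mean)[:, None]
    B += Xc.shape[0] * (d @ d.T)

# Solve B v = lambda W v by reducing it to a symmetric problem via W = L L^T.
L = np.linalg.cholesky(W)
L_inv = np.linalg.inv(L)
eigvals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)

# Keep the `rank` eigenvectors with the largest eigenvalues, in descending order.
rank = 2
order = np.argsort(eigvals)[::-1]
sorted_eigvals = eigvals[order]
lda_matrix = (L_inv.T @ U)[:, order[:rank]]

# Project the centered vectors onto the reduced space.
projected = (X - global_mean) @ lda_matrix
```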

speechbrain.processing.PLDA_LDA.diff(list1, list2)[source]: Difference between lists.

speechbrain.processing.PLDA_LDA.ismember(list1, list2)[source]: Checks if the elements of list1 are contained in list2.
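
The behaviour described for these two helpers can be pictured with the pure-Python stand-ins below. The names list_diff and list_ismember are hypothetical, and details such as the ordering of the returned difference are assumptions of this sketch rather than documented properties of diff and ismember.

```python
def list_diff(list1, list2):
    """Items of list1 that do not appear in list2 (order of list1 kept)."""
    return [item for item in list1 if item not in list2]


def list_ismember(list1, list2):
    """One boolean per item of list1: True if the item appears in list2."""
    return [item in list2 for item in list1]


remaining = list_diff(["a", "b", "c"], ["b"])
membership = list_ismember(["a", "b"], ["b", "c"])
```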

class speechbrain.processing.PLDA_LDA.Ndx(ndx_file_name='', models=array([], dtype=float64), testsegs=array([], dtype=float64))[source]

Bases: object

A class that encodes trial index information. It has a list of model names and a list of test segment names and a matrix indicating which combinations of model and test segment are trials of interest.

Parameters:

modelset (list) – List of unique models in an ndarray.
segset (list) – List of unique test segments in an ndarray.
trialmask (2D ndarray of bool) – Rows correspond to the models and columns to the test segments. True if the trial is of interest.

__init__(ndx_file_name='', models=array([], dtype=float64), testsegs=array([], dtype=float64))[source]

Initialize an Ndx object by loading information from a file.

Parameters: ndx_file_name (str) – Name of the file to load.

save_ndx_object(output_file_name)[source]: Saves the object in pickle format.

filter(modlist, seglist, keep)[source]

Removes some of the information in an Ndx. Useful for creating a gender-specific Ndx from a pooled-gender Ndx. Depending on the value of ‘keep’, the two input lists indicate the strings to retain or the strings to discard.

Parameters:

modlist (array) – An array of strings which will be compared with the modelset of this Ndx.
seglist (array) – An array of strings which will be compared with the segset of this Ndx.
keep (bool) – Indicating whether modlist and seglist are the models to keep or discard.
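
The effect of keep on the model list, the segment list and the trial mask can be sketched with NumPy as follows. The function filter_trials is a hypothetical stand-in operating on bare arrays, written only to illustrate the retain/discard semantics described above.

```python
import numpy as np

# Hypothetical trial index: 3 models x 4 test segments, all trials enabled.
modelset = np.array(["m1", "m2", "m3"])
segset = np.array(["s1", "s2", "s3", "s4"])
trialmask = np.ones((3, 4), dtype=bool)


def filter_trials(modelset, segset, trialmask, modlist, seglist, keep):
    """Keep (or discard) the rows/columns whose names appear in the lists."""
    mod_hit = np.isin(modelset, modlist)
    seg_hit = np.isin(segset, seglist)
    if not keep:
        # Discard mode: retain everything that was NOT listed.
        mod_hit, seg_hit = ~mod_hit, ~seg_hit
    return modelset[mod_hit], segset[seg_hit], trialmask[np.ix_(mod_hit, seg_hit)]


kept_models, kept_segs, kept_mask = filter_trials(
    modelset, segset, trialmask, ["m1", "m3"], ["s2"], keep=True
)
rest_models, rest_segs, rest_mask = filter_trials(
    modelset, segset, trialmask, ["m1", "m3"], ["s2"], keep=False
)
```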

validate()[source]: Checks that an object of type Ndx obeys certain rules that must always be true. Returns a boolean value indicating whether the object is valid.

class speechbrain.processing.PLDA_LDA.Scores(scores_file_name='')[source]

Bases: object

A class for storing scores for trials. The modelset and segset fields are lists of model and test segment names respectively. The element i,j of scoremat and scoremask corresponds to the trial involving model i and test segment j.

Parameters:

modelset (list) – List of unique models in an ndarray.
segset (list) – List of unique test segments in an ndarray.
scoremask (2D ndarray of bool) – Indicates the trials of interest, i.e., the entry i,j in scoremat should be ignored if scoremask[i,j] is False.
scoremat (2D ndarray) – Scores matrix.
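
To show how scoremask and scoremat relate, the sketch below lists only the trials of interest from a small, made-up score matrix. It uses bare NumPy arrays with the same field names, not an actual Scores object.

```python
import numpy as np

# Hypothetical scores for 2 models x 3 test segments.
modelset = np.array(["m1", "m2"])
segset = np.array(["s1", "s2", "s3"])
scoremat = np.array([[1.5, -0.3, 0.8],
                     [-2.0, 0.4, 2.2]])
scoremask = np.array([[True, False, True],
                      [False, True, True]])

# Entries where scoremask is False are ignored.
rows, cols = np.nonzero(scoremask)
trials = [
    (str(modelset[i]), str(segset[j]), float(scoremat[i, j]))
    for i, j in zip(rows, cols)
]
```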

__init__(scores_file_name='')[source]

Initialize a Scores object by loading information from a file in HDF5 format.

Parameters: scores_file_name (str) – Name of the file to load.

speechbrain.processing.PLDA_LDA.fa_model_loop(batch_start, mini_batch_indices, factor_analyser, stat0, stat1, e_h, e_hh)[source]

A function for PLDA estimation.

Parameters:

batch_start (int) – Index to start at in the list.
mini_batch_indices (list) – Indices of the elements in the list (should start at zero).
factor_analyser (instance of PLDA class) – PLDA class object.
stat0 (tensor) – Matrix of zero-order statistics.
stat1 (tensor) – Matrix of first-order statistics.
e_h (tensor) – An accumulator matrix.
e_hh (tensor) – An accumulator matrix.

speechbrain.processing.PLDA_LDA.fast_PLDA_scoring(enroll, test, ndx, mu, F, Sigma, test_uncertainty=None, Vtrans=None, p_known=0.0, scaling_factor=1.0, check_missing=True)[source]

Compute the PLDA scores between two sets of vectors. The list of trials to perform is given in an Ndx object. PLDA matrices have to be pre-computed. i-vectors/x-vectors are assumed to have been whitened beforehand.

Parameters:

enroll (speechbrain.processing.PLDA_LDA.StatObject_SB) – A StatObject_SB in which stat1 holds the x-vectors.
test (speechbrain.processing.PLDA_LDA.StatObject_SB) – A StatObject_SB in which stat1 holds the x-vectors.
ndx (speechbrain.processing.PLDA_LDA.Ndx) – An Ndx object defining the list of trials to perform.
mu (double) – The mean vector of the PLDA Gaussian.
F (tensor) – The between-class covariance matrix of the PLDA.
Sigma (tensor) – The residual covariance matrix.
p_known (float) – Probability of having a known speaker for open-set identification case (=1 for the verification task and =0 for the closed-set case).
check_missing (bool) – If True, check that all models and segments exist.
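
For intuition about what a PLDA score is, the sketch below computes the log-likelihood ratio for a single enrollment/test pair directly from the two Gaussian hypotheses (same speaker vs. different speakers). It assumes the simplified model in which F F^T plays the role of the between-class covariance and Sigma the residual covariance. This is a direct, unoptimized computation written for illustration; it is not the fast scoring performed by fast_PLDA_scoring and it ignores ndx, p_known and scaling_factor. The function names and toy values are hypothetical.

```python
import numpy as np


def gaussian_logpdf_zero_mean(x, cov):
    """Log-density of a zero-mean multivariate Gaussian at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)


def plda_llr(enroll_vec, test_vec, mu, F, Sigma):
    """log p(pair | same speaker) - log p(pair | different speakers)."""
    B = F @ F.T              # between-class covariance (assumption: F F^T)
    T = B + Sigma            # total covariance of one vector
    x = np.concatenate([enroll_vec - mu, test_vec - mu])
    zeros = np.zeros_like(B)
    cov_same = np.block([[T, B], [B, T]])          # shared speaker factor
    cov_diff = np.block([[T, zeros], [zeros, T]])  # independent vectors
    return gaussian_logpdf_zero_mean(x, cov_same) - gaussian_logpdf_zero_mean(x, cov_diff)


# Toy 2-dimensional model with a single speaker direction.
mu = np.zeros(2)
F = np.array([[2.0], [0.0]])
Sigma = 0.5 * np.eye(2)

a = np.array([2.0, 0.0])
llr_same_side = plda_llr(a, a, mu, F, Sigma)       # vectors agree along F
llr_opposite = plda_llr(a, -a, mu, F, Sigma)       # vectors disagree along F
```

In this toy setting a pair that agrees along the speaker direction receives a positive score, while a pair on opposite sides receives a negative one.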

class speechbrain.processing.PLDA_LDA.LDA[source]

Bases: object

A class to perform Linear Discriminant Analysis.

It returns the low dimensional representation as per LDA.

Parameters: reduced_dim (int) – The dimension of the output representation.

do_lda(stat_server=None, reduced_dim=2, transform_mat=None)[source]

Performs LDA and projects the vectors onto a lower-dimensional space.

Parameters:

stat_server (object of speechbrain.processing.PLDA_LDA.StatObject_SB) – Contains vectors and meta-information to perform LDA.
reduced_dim (int) – Dimension of the reduced space.

class speechbrain.processing.PLDA_LDA.PLDA(mean=None, F=None, Sigma=None, rank_f=100, nb_iter=10, scaling_factor=1.0)[source]

Bases: object

A class to train PLDA model from embeddings.

The input is in speechbrain.processing.PLDA_LDA.StatObject_SB format. Trains a simplified PLDA model with no within-class covariance matrix but a full residual covariance matrix.
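
A common way to write such a simplified PLDA model, following the formulation used in the papers listed above, is given below. Here x_{ij} denotes the j-th embedding of speaker i and h_i a latent speaker factor; this notation, and the mapping of \mu, F and \Sigma to the mean, F and Sigma parameters, are assumptions of this sketch rather than something stated in the docstring.

```latex
\begin{aligned}
x_{ij} &= \mu + F\,h_i + \varepsilon_{ij}, \\
h_i &\sim \mathcal{N}(0, I), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \Sigma)
\end{aligned}
```

Under this model the total covariance of an embedding is F F^{\top} + \Sigma, with F F^{\top} capturing between-speaker variability and the full matrix \Sigma absorbing everything else.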

Parameters:

mean (tensor) – Mean of the vectors.
F (tensor) – Eigenvoice matrix.
Sigma (tensor) – Residual matrix.

Example

>>> from speechbrain.processing.PLDA_LDA import StatObject_SB, PLDA, Ndx, fast_PLDA_scoring
>>> import random, numpy
>>> dim, N = 10, 100
>>> n_spkrs = 10
>>> train_xv = numpy.random.rand(N, dim)
>>> md = ['md'+str(random.randrange(1,n_spkrs,1)) for i in range(N)]
>>> modelset = numpy.array(md, dtype="|O")
>>> sg = ['sg'+str(i) for i in range(N)]
>>> segset = numpy.array(sg, dtype="|O")
>>> s = numpy.array([None] * N)
>>> stat0 = numpy.array([[1.0]]* N)
>>> xvectors_stat = StatObject_SB(modelset=modelset, segset=segset, start=s, stop=s, stat0=stat0, stat1=train_xv)
>>> # Training PLDA model: M ~ (mean, F, Sigma)
>>> plda = PLDA(rank_f=5)
>>> plda.plda(xvectors_stat)
>>> print(plda.mean.shape)
(10,)
>>> print(plda.F.shape)
(10, 5)
>>> print(plda.Sigma.shape)
(10, 10)
>>> # Enrollment (20 utts), Test (30 utts)
>>> en_N = 20
>>> en_xv = numpy.random.rand(en_N, dim)
>>> en_sgs = ['en'+str(i) for i in range(en_N)]
>>> en_sets = numpy.array(en_sgs, dtype="|O")
>>> en_s = numpy.array([None] * en_N)
>>> en_stat0 = numpy.array([[1.0]]* en_N)
>>> en_stat = StatObject_SB(modelset=en_sets, segset=en_sets, start=en_s, stop=en_s, stat0=en_stat0, stat1=en_xv)
>>> te_N = 30
>>> te_xv = numpy.random.rand(te_N, dim)
>>> te_sgs = ['te'+str(i) for i in range(te_N)]
>>> te_sets = numpy.array(te_sgs, dtype="|O")
>>> te_s = numpy.array([None] * te_N)
>>> te_stat0 = numpy.array([[1.0]]* te_N)
>>> te_stat = StatObject_SB(modelset=te_sets, segset=te_sets, start=te_s, stop=te_s, stat0=te_stat0, stat1=te_xv)
>>> ndx = Ndx(models=en_sets, testsegs=te_sets)
>>> # PLDA Scoring
>>> scores_plda = fast_PLDA_scoring(en_stat, te_stat, ndx, plda.mean, plda.F, plda.Sigma)
>>> print(scores_plda.scoremat.shape)
(20, 30)

plda(stat_server=None, output_file_name=None, whiten=False, w_stat_server=None)[source]

Trains a PLDA model with no within-class covariance matrix but a full residual covariance matrix.

Parameters:

stat_server (speechbrain.processing.PLDA_LDA.StatObject_SB) – Contains vectors and meta-information to perform PLDA.
rank_f (int) – Rank of the between-class covariance matrix.
nb_iter (int) – Number of iterations to run.
scaling_factor (float) – Scaling factor to downscale statistics (value between 0 and 1).
output_file_name (str) – Name of the output file in which to store the PLDA model.