speechbrain.processing.PLDA_LDA module
A popular speaker recognition/diarization model (LDA and PLDA).
- Authors
Anthony Larcher 2020
Nauman Dawalatabad 2020
- Relevant Papers
This implementation of PLDA is based on the following papers.
- PLDA model Training
Ye Jiang et al., “PLDA Modeling in I-Vector and Supervector Space for Speaker Verification,” in Interspeech, 2012.
Patrick Kenny et al., “PLDA for speaker verification with utterances of arbitrary duration,” in ICASSP, 2013.
- PLDA scoring (fast scoring)
Daniel Garcia-Romero et al., “Analysis of i-vector length normalization in speaker recognition systems,” in Interspeech, 2011.
Weiwei Lin et al., “Fast Scoring for PLDA with Uncertainty Propagation,” in Odyssey, 2016.
Kong Aik Lee et al., “Multi-session PLDA Scoring of I-vector for Partially Open-Set Speaker Detection,” in Interspeech, 2013.
- Credits
This code is adapted from: https://git-lium.univ-lemans.fr/Larcher/sidekit
Summary
Classes:
LDA | A class to perform Linear Discriminant Analysis.
Ndx | A class that encodes trial index information.
PLDA | A class to train PLDA model from embeddings.
Scores | A class for storing scores for trials.
StatObject_SB | A utility class for PLDA class used for statistics calculations.
Functions:
diff | Difference between lists.
fa_model_loop | A function for PLDA estimation.
fast_PLDA_scoring | Compute the PLDA scores between two sets of vectors.
ismember | Checks if the elements of list1 are contained in list2.
Reference
- class speechbrain.processing.PLDA_LDA.StatObject_SB(modelset=None, segset=None, start=None, stop=None, stat0=None, stat1=None)[source]
Bases:
object
A utility class for PLDA class used for statistics calculations.
This is also used to pack deep embeddings and meta-information in one object.
- Parameters:
modelset (list) – List of model IDs for each session as an array of strings.
segset (list) – List of session IDs as an array of strings.
start (int) – Index of the first frame of the segment.
stop (int) – Index of the last frame of the segment.
stat0 (tensor) – An ndarray of float64. Each row contains the zero-order statistics of the corresponding session.
stat1 (tensor) – An ndarray of float64. Each row contains the first-order statistics of the corresponding session.
- save_stat_object(filename)[source]
Saves stats in pickle format.
- Parameters:
filename (path) – Path where the pickle file will be stored.
- get_model_segsets(mod_id)[source]
Return segments of a given model.
- Parameters:
mod_id (str) – ID of the model for which segments will be returned.
- get_model_start(mod_id)[source]
Return start of segment of a given model.
- Parameters:
mod_id (str) – ID of the model for which start will be returned.
- get_model_stop(mod_id)[source]
Return stop of segment of a given model.
- Parameters:
mod_id (str) – ID of the model for which stop will be returned.
- get_total_covariance_stat1()[source]
Compute and return the total covariance matrix of the first-order statistics.
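As a rough illustration of what a total covariance over first-order statistics looks like, the NumPy sketch below centers the rows and averages their outer products. The helper name `total_covariance` and the normalization by the number of sessions are assumptions made for this sketch; they are not taken from the library's code.

```python
import numpy as np

def total_covariance(stat1):
    """Covariance of the rows of stat1 (hypothetical helper, biased 1/N estimate)."""
    centered = stat1 - stat1.mean(axis=0)
    return centered.T @ centered / stat1.shape[0]

rng = np.random.default_rng(0)
stat1 = rng.standard_normal((100, 4))   # 100 sessions, 4-dimensional vectors
cov = total_covariance(stat1)
```

The result is a symmetric matrix with one row and one column per embedding dimension.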
- get_model_stat0(mod_id)[source]
Return zero-order statistics of a given model.
- Parameters:
mod_id (str) – ID of the model for which stat0 will be returned.
- get_model_stat1(mod_id)[source]
Return first-order statistics of a given model.
- Parameters:
mod_id (str) – ID of the model for which stat1 will be returned.
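The per-model accessors above all amount to selecting the rows whose model ID matches `mod_id`. A minimal NumPy sketch of that selection follows; the helper `rows_for_model` and the toy arrays are hypothetical and only mirror the attribute names documented for StatObject_SB.

```python
import numpy as np

# Toy data laid out like the StatObject_SB fields (one row per session).
modelset = np.array(["spk1", "spk2", "spk1", "spk3"], dtype="|O")
segset = np.array(["utt0", "utt1", "utt2", "utt3"], dtype="|O")
stat1 = np.arange(8, dtype=np.float64).reshape(4, 2)

def rows_for_model(mod_id):
    """Return session IDs and first-order statistics of one model (hypothetical helper)."""
    mask = modelset == mod_id          # boolean mask over sessions
    return segset[mask], stat1[mask, :]

segs, vecs = rows_for_model("spk1")
```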
- sum_stat_per_model()[source]
Sum the zero- and first-order statistics per model and store them in a new StatObject_SB. Returns a StatObject_SB object with the statistics summed per model and a numpy array with session_per_model.
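The per-model summation can be pictured with plain NumPy, as in the sketch below. It is an assumption-laden illustration (toy arrays, `np.unique` ordering of the models), not the method's actual implementation.

```python
import numpy as np

modelset = np.array(["spk1", "spk2", "spk1", "spk2", "spk1"])
stat0 = np.ones((5, 1))                                  # one count per session
stat1 = np.arange(10, dtype=np.float64).reshape(5, 2)    # one vector per session

# Map every session to the index of its model, then accumulate row-wise.
unique_models, inverse = np.unique(modelset, return_inverse=True)
sum_stat0 = np.zeros((len(unique_models), stat0.shape[1]))
sum_stat1 = np.zeros((len(unique_models), stat1.shape[1]))
np.add.at(sum_stat0, inverse, stat0)
np.add.at(sum_stat1, inverse, stat1)
session_per_model = np.bincount(inverse)                 # sessions per model
```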
- center_stat1(mu)[source]
Center first-order statistics.
- Parameters:
mu (array) – Array to center on.
- rotate_stat1(R)[source]
Rotate first-order statistics by a right-product.
- Parameters:
R (ndarray) – Matrix to use for right product on the first order statistics.
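Centering and rotation are simple array operations. The sketch below assumes every session has a zero-order statistic of 1 (one embedding per session), so centering reduces to subtracting `mu` from each row; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
stat1 = rng.standard_normal((6, 3))     # 6 sessions, 3-dimensional vectors

mu = stat1.mean(axis=0)
centered = stat1 - mu                   # center on mu

R = rng.standard_normal((3, 2))         # any matrix with 3 rows
rotated = centered @ R                  # right-product: each row becomes row @ R
```

A right-product by a non-square `R` also changes the dimension of the statistics, here from 3 to 2.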
- whiten_stat1(mu, sigma, isSqrInvSigma=False)[source]
Whiten first-order statistics. The number of dimensions of sigma selects the case: if sigma.ndim == 1, a diagonal covariance; if sigma.ndim == 2, a single Gaussian with full covariance; if sigma.ndim == 3, a full-covariance UBM.
- Parameters:
mu (array) – Mean vector to be subtracted from the statistics.
sigma (ndarray) – Covariance matrix or covariance super-vector.
isSqrInvSigma (bool) – True if the input Sigma matrix is the inverse of the square root of a covariance matrix.
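For the full-covariance case (sigma.ndim == 2), whitening can be sketched as centering followed by a right-product with the inverse square root of the covariance. The symmetric inverse square root built from an eigendecomposition below is one common choice and is an assumption of this sketch; the library may construct its rotation differently.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.5]])
stat1 = rng.standard_normal((500, 3)) @ A + np.array([1.0, -2.0, 0.5])

mu = stat1.mean(axis=0)
centered = stat1 - mu
sigma = centered.T @ centered / stat1.shape[0]      # full covariance (ndim == 2)

# Inverse square root of sigma via its eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(sigma)
sqr_inv_sigma = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

whitened = centered @ sqr_inv_sigma
white_cov = whitened.T @ whitened / whitened.shape[0]
```

After whitening, the covariance of the statistics is the identity matrix up to numerical precision.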
- align_models(model_list)[source]
Align the models of the current StatObject_SB to match the list of models provided as input. The object may shrink to match the input list of models.
- Parameters:
model_list (ndarray of strings) – List of models to match.
- speechbrain.processing.PLDA_LDA.ismember(list1, list2)[source]
Checks if the elements of list1 are contained in list2.
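A membership check of this kind can be sketched in one line of plain Python. The name `ismember_sketch` and the choice to return one boolean per element of list1 are assumptions of this illustration.

```python
def ismember_sketch(list1, list2):
    """Return one boolean per element of list1: True if it also occurs in list2."""
    return [item in list2 for item in list1]

flags = ismember_sketch(["a", "b", "c"], ["c", "a"])
```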
- class speechbrain.processing.PLDA_LDA.Ndx(ndx_file_name='', models=array([], dtype=float64), testsegs=array([], dtype=float64))[source]
Bases:
object
A class that encodes trial index information. It has a list of model names and a list of test segment names and a matrix indicating which combinations of model and test segment are trials of interest.
- __init__(ndx_file_name='', models=array([], dtype=float64), testsegs=array([], dtype=float64))[source]
Initialize an Ndx object by loading information from a file.
- Parameters:
ndx_file_name (str) – Name of the file to load.
- filter(modlist, seglist, keep)[source]
Removes some of the information in an Ndx. Useful for creating a gender-specific Ndx from a pooled-gender Ndx. Depending on the value of ‘keep’, the two input lists indicate the strings to retain or the strings to discard.
- Parameters:
modlist (array) – An array of strings which will be compared with the modelset of this Ndx.
seglist (array) – An array of strings which will be compared with the segset of this Ndx.
keep (bool) – Whether modlist and seglist name the models and segments to keep (True) or to discard (False).
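The effect of filtering can be sketched with boolean masks over the model list, the segment list, and the trial matrix. The names `trialmask` and `filter_trials` below are hypothetical, and the toy arrays stand in for the contents of an Ndx.

```python
import numpy as np

modelset = np.array(["m1", "m2", "m3"])
segset = np.array(["s1", "s2", "s3", "s4"])
trialmask = np.ones((3, 4), dtype=bool)     # rows: models, columns: test segments

def filter_trials(modlist, seglist, keep):
    """Keep (keep=True) or discard (keep=False) the listed models and segments."""
    mod_sel = np.isin(modelset, modlist)
    seg_sel = np.isin(segset, seglist)
    if not keep:
        mod_sel, seg_sel = ~mod_sel, ~seg_sel
    return modelset[mod_sel], segset[seg_sel], trialmask[mod_sel][:, seg_sel]

kept_mods, kept_segs, kept_mask = filter_trials(["m1", "m3"], ["s2"], keep=True)
rest_mods, rest_segs, rest_mask = filter_trials(["m1"], ["s1", "s2"], keep=False)
```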
- class speechbrain.processing.PLDA_LDA.Scores(scores_file_name='')[source]
Bases:
object
A class for storing scores for trials. The modelset and segset fields are lists of model and test segment names respectively. The element i,j of scoremat and scoremask corresponds to the trial involving model i and test segment j.
- speechbrain.processing.PLDA_LDA.fa_model_loop(batch_start, mini_batch_indices, factor_analyser, stat0, stat1, e_h, e_hh)[source]
A function for PLDA estimation.
- Parameters:
batch_start (int) – Index to start at in the list.
mini_batch_indices (list) – Indices of the elements in the list (should start at zero).
factor_analyser (instance of PLDA class) – PLDA class object.
stat0 (tensor) – Matrix of zero-order statistics.
stat1 (tensor) – Matrix of first-order statistics.
e_h (tensor) – An accumulator matrix.
e_hh (tensor) – An accumulator matrix.
- speechbrain.processing.PLDA_LDA.fast_PLDA_scoring(enroll, test, ndx, mu, F, Sigma, test_uncertainty=None, Vtrans=None, p_known=0.0, scaling_factor=1.0, check_missing=True)[source]
Compute the PLDA scores between two sets of vectors. The list of trials to perform is given in an Ndx object. The PLDA matrices must be pre-computed, and the i-vectors/x-vectors are assumed to have been whitened beforehand.
- Parameters:
enroll (speechbrain.processing.PLDA_LDA.StatObject_SB) – A StatObject_SB in which stat1 holds the enrollment x-vectors.
test (speechbrain.processing.PLDA_LDA.StatObject_SB) – A StatObject_SB in which stat1 holds the test x-vectors.
ndx (speechbrain.processing.PLDA_LDA.Ndx) – An Ndx object defining the list of trials to perform.
mu (double) – The mean vector of the PLDA Gaussian.
F (tensor) – The eigenvoice matrix of the PLDA model; F Fᵀ gives the between-class covariance matrix.
Sigma (tensor) – The residual covariance matrix.
p_known (float) – Probability of having a known speaker for open-set identification case (=1 for the verification task and =0 for the closed-set case).
check_missing (bool) – If True, check that all models and segments exist.
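To make the roles of mu, F and Sigma concrete, the sketch below scores a single trial with the textbook PLDA log-likelihood ratio, assuming the model x = mu + F h + noise with noise covariance Sigma. It evaluates the two Gaussian hypotheses directly and is not the fast scoring routine documented here; the helper names are hypothetical.

```python
import numpy as np

def log_gauss_zero_mean(x, cov):
    """Log density of a zero-mean Gaussian with covariance cov, evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

def plda_llr(x1, x2, mu, F, Sigma):
    """Same-speaker versus different-speaker log-likelihood ratio (direct form)."""
    B = F @ F.T                           # between-speaker covariance
    T = B + Sigma                         # total covariance of a single vector
    a, b = x1 - mu, x2 - mu
    joint = np.block([[T, B], [B, T]])    # same speaker: shared latent factor
    same = log_gauss_zero_mean(np.concatenate([a, b]), joint)
    diff = log_gauss_zero_mean(a, T) + log_gauss_zero_mean(b, T)
    return same - diff

dim, rank = 4, 2
rng = np.random.default_rng(0)
mu = np.zeros(dim)
F = rng.standard_normal((dim, rank))
Sigma = np.eye(dim)

v = F[:, 0]                               # a vector along a speaker direction
llr_same = plda_llr(v, v, mu, F, Sigma)
llr_opposite = plda_llr(v, -v, mu, F, Sigma)
```

Two vectors pointing the same way along a speaker direction score higher than two vectors pointing in opposite directions, and the score is symmetric in its two inputs.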
- class speechbrain.processing.PLDA_LDA.LDA[source]
Bases:
object
A class to perform Linear Discriminant Analysis.
It returns the low-dimensional representation as per LDA.
- Parameters:
reduced_dim (int) – The dimension of the output representation.
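For readers unfamiliar with LDA, the sketch below shows a generic Fisher LDA projection in NumPy: it builds within-class and between-class scatter matrices and keeps the leading `reduced_dim` discriminant directions. The function `lda_project` is hypothetical and is not claimed to match how this class computes its transform.

```python
import numpy as np

def lda_project(X, labels, reduced_dim):
    """Project the rows of X onto the top reduced_dim Fisher discriminant directions."""
    mean = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))             # within-class scatter
    Sb = np.zeros((dim, dim))             # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean)[:, None]
        Sb += Xc.shape[0] * (d @ d.T)
    # Directions maximizing between-class over within-class scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:reduced_dim]
    W = eigvecs[:, order].real
    return X @ W

rng = np.random.default_rng(0)
X0 = rng.standard_normal((50, 3))
X1 = rng.standard_normal((50, 3)) + np.array([10.0, 0.0, 0.0])
X = np.vstack([X0, X1])
labels = np.array([0] * 50 + [1] * 50)
Z = lda_project(X, labels, reduced_dim=1)
```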
- class speechbrain.processing.PLDA_LDA.PLDA(mean=None, F=None, Sigma=None, rank_f=100, nb_iter=10, scaling_factor=1.0)[source]
Bases:
object
A class to train PLDA model from embeddings.
The input is in speechbrain.processing.PLDA_LDA.StatObject_SB format. Trains a simplified PLDA model with no within-class covariance matrix but a full residual covariance matrix.
- Parameters:
mean (tensor) – Mean of the vectors.
F (tensor) – Eigenvoice matrix.
Sigma (tensor) – Residual matrix.
Example
>>> from speechbrain.processing.PLDA_LDA import *
>>> import random, numpy
>>> dim, N = 10, 100
>>> n_spkrs = 10
>>> train_xv = numpy.random.rand(N, dim)
>>> md = ['md'+str(random.randrange(1,n_spkrs,1)) for i in range(N)]
>>> modelset = numpy.array(md, dtype="|O")
>>> sg = ['sg'+str(i) for i in range(N)]
>>> segset = numpy.array(sg, dtype="|O")
>>> s = numpy.array([None] * N)
>>> stat0 = numpy.array([[1.0]]* N)
>>> xvectors_stat = StatObject_SB(modelset=modelset, segset=segset, start=s, stop=s, stat0=stat0, stat1=train_xv)
>>> # Training PLDA model: M ~ (mean, F, Sigma)
>>> plda = PLDA(rank_f=5)
>>> plda.plda(xvectors_stat)
>>> print (plda.mean.shape)
(10,)
>>> print (plda.F.shape)
(10, 5)
>>> print (plda.Sigma.shape)
(10, 10)
>>> # Enrollment (20 utts), Test (30 utts)
>>> en_N = 20
>>> en_xv = numpy.random.rand(en_N, dim)
>>> en_sgs = ['en'+str(i) for i in range(en_N)]
>>> en_sets = numpy.array(en_sgs, dtype="|O")
>>> en_s = numpy.array([None] * en_N)
>>> en_stat0 = numpy.array([[1.0]]* en_N)
>>> en_stat = StatObject_SB(modelset=en_sets, segset=en_sets, start=en_s, stop=en_s, stat0=en_stat0, stat1=en_xv)
>>> te_N = 30
>>> te_xv = numpy.random.rand(te_N, dim)
>>> te_sgs = ['te'+str(i) for i in range(te_N)]
>>> te_sets = numpy.array(te_sgs, dtype="|O")
>>> te_s = numpy.array([None] * te_N)
>>> te_stat0 = numpy.array([[1.0]]* te_N)
>>> te_stat = StatObject_SB(modelset=te_sets, segset=te_sets, start=te_s, stop=te_s, stat0=te_stat0, stat1=te_xv)
>>> ndx = Ndx(models=en_sets, testsegs=te_sets)
>>> # PLDA Scoring
>>> scores_plda = fast_PLDA_scoring(en_stat, te_stat, ndx, plda.mean, plda.F, plda.Sigma)
>>> print (scores_plda.scoremat.shape)
(20, 30)
- plda(stat_server=None, output_file_name=None, whiten=False, w_stat_server=None)[source]
Trains a PLDA model with no within-class covariance matrix but a full residual covariance matrix.
- Parameters:
stat_server (speechbrain.processing.PLDA_LDA.StatObject_SB) – Contains vectors and meta-information to perform PLDA.
rank_f (int) – Rank of the between-class covariance matrix.
nb_iter (int) – Number of iterations to run.
scaling_factor (float) – Scaling factor to downscale statistics (value between 0 and 1).
output_file_name (str) – Name of the output file in which to store the PLDA model.