speechbrain.lobes.models.L2I module
This file implements the classes and functions needed for the Listen-to-Interpret (L2I) interpretation method from https://arxiv.org/abs/2202.11479v2
Authors: Cem Subakan (2022), Francesco Paissan (2022)
Summary
Classes:
CNN14PSI_stft | This class estimates a saliency map on the STFT domain, given classifier representations.
CNN14PSI_stft_2d | This class estimates the NMF activations to create a saliency map using the L2I framework.
NMFDecoderAudio | This class implements an NMF decoder.
NMFEncoder | This class implements an NMF encoder with a convolutional network.
Psi | Convolutional layers to estimate NMF activations from classifier representations.
PsiOptimized | Convolutional layers to estimate NMF activations from classifier representations, optimized for log-spectra.
Theta | This class implements a linear classifier on top of NMF activations.
Functions:
weights_init | Applies Xavier initialization to network weights.
Reference
- class speechbrain.lobes.models.L2I.Psi(n_comp=100, T=431, in_emb_dims=[2048, 1024, 512])
Bases: Module
Convolutional layers to estimate NMF activations from classifier representations.
- Parameters:
n_comp (int) – Number of NMF components (or, equivalently, the number of neurons at the output per timestep).
T (int) – The targeted length along the time dimension.
in_emb_dims (list of int) – A list of length 3 that contains the channel dimensions of the inputs. The list needs to match the number of channels in the input classifier representations. The last entry should be the smallest entry.
Example
>>> import torch
>>> from speechbrain.lobes.models.L2I import Psi
>>> inp = [
...     torch.ones(2, 150, 6, 2),
...     torch.ones(2, 100, 6, 2),
...     torch.ones(2, 50, 12, 5),
... ]
>>> psi = Psi(n_comp=100, T=120, in_emb_dims=[150, 100, 50])
>>> h = psi(inp)
>>> print(h.shape)
torch.Size([2, 100, 120])
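The constraints on in_emb_dims listed under Parameters can be checked before constructing Psi. The sketch below is a minimal illustration, not part of SpeechBrain: the helper name check_in_emb_dims and its input_shapes argument are hypothetical, and it assumes (as in the example above) that the channel count is the second dimension of each representation.

```python
def check_in_emb_dims(in_emb_dims, input_shapes):
    """Raise ValueError if in_emb_dims breaks the documented constraints.

    Hypothetical helper (not a SpeechBrain function).
    input_shapes: shapes (B, C, H, W) of the classifier representations.
    """
    # The documentation asks for a list of length 3.
    if len(in_emb_dims) != 3:
        raise ValueError("in_emb_dims must have length 3")
    # Assumption: channels live in dimension 1 of each representation.
    channels = [shape[1] for shape in input_shapes]
    if list(in_emb_dims) != channels:
        raise ValueError(
            f"in_emb_dims {list(in_emb_dims)} != input channels {channels}"
        )
    # The last entry should be the smallest entry.
    if in_emb_dims[-1] != min(in_emb_dims):
        raise ValueError("the last entry of in_emb_dims should be the smallest")


shapes = [(2, 150, 6, 2), (2, 100, 6, 2), (2, 50, 12, 5)]
check_in_emb_dims([150, 100, 50], shapes)  # passes silently
```

Failing early with a ValueError is a design choice of this sketch; the class itself may report mismatches differently.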
- class speechbrain.lobes.models.L2I.NMFDecoderAudio(n_comp=100, n_freq=513, device='cuda')
Bases: Module
This class implements an NMF decoder.
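- Parameters:
n_comp (int) – Number of NMF components.
n_freq (int) – Number of frequency bins in the NMF output.
device (str) – Device on which the decoder is placed.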
Example
>>> import torch
>>> from speechbrain.lobes.models.L2I import NMFDecoderAudio
>>> NMF_dec = NMFDecoderAudio(20, 210, device="cpu")
>>> H = torch.rand(1, 20, 150)
>>> Xhat = NMF_dec.forward(H)
>>> print(Xhat.shape)
torch.Size([1, 210, 150])
- forward(H)
The forward pass for NMF given the activations H
- Parameters:
H (torch.Tensor) – The activations tensor with shape B x n_comp x T, where B = batch size, n_comp = number of NMF components, and T = number of timepoints.
- Returns:
output – The NMF outputs
- Return type:
torch.Tensor
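To make the shapes concrete, here is a minimal plain-Python sketch of the standard NMF reconstruction X_hat = W H for a single batch item. It is an illustration only: the function name nmf_decode and the random dictionary W are assumptions, and whether the module applies any nonlinearity to W or H before the product is not stated on this page.

```python
import random


def nmf_decode(W, H):
    """Reconstruct X_hat = W @ H for one batch item.

    W: n_freq x n_comp dictionary (nested lists, nonnegative).
    H: n_comp x T activations (nested lists, nonnegative).
    Returns an n_freq x T nested list.
    """
    n_comp = len(H)
    T = len(H[0])
    return [
        [sum(W_row[k] * H[k][t] for k in range(n_comp)) for t in range(T)]
        for W_row in W
    ]


random.seed(0)
n_comp, n_freq, T = 20, 210, 150  # same sizes as the example above
W = [[random.random() for _ in range(n_comp)] for _ in range(n_freq)]
H = [[random.random() for _ in range(T)] for _ in range(n_comp)]
X_hat = nmf_decode(W, H)
print(len(X_hat), len(X_hat[0]))  # 210 150
```

Because W and H are both nonnegative here, every entry of X_hat is nonnegative as well, which is what lets the output be read as a magnitude spectrogram.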
- speechbrain.lobes.models.L2I.weights_init(m)
Applies Xavier initialization to network weights.
- Parameters:
m (nn.Module) – Module to initialize.
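For readers unfamiliar with Xavier (Glorot) initialization, the sketch below shows the uniform variant in plain Python: weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). This page does not say whether weights_init uses the uniform or the normal variant, or which layer types it touches, so the helper xavier_uniform is a hypothetical illustration of the formula rather than the module's code.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=random):
    """Return a fan_out x fan_in weight matrix drawn from U(-a, a),
    with a = sqrt(6 / (fan_in + fan_out)) (Glorot uniform)."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [rng.uniform(-bound, bound) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]


random.seed(0)
weights = xavier_uniform(fan_in=100, fan_out=50)
bound = math.sqrt(6.0 / 150)  # 0.2 for these layer sizes
print(all(-bound <= w <= bound for row in weights for w in row))  # True
```

In PyTorch code, a function with this role is typically passed to Module.apply so that it visits every submodule.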
- class speechbrain.lobes.models.L2I.PsiOptimized(dim=128, K=100, numclasses=50, use_adapter=False, adapter_reduce_dim=True)
Bases: Module
Convolutional layers to estimate NMF activations from classifier representations, optimized for log-spectra.
- Parameters:
dim (int) – Dimension of the hidden representations (input to the classifier).
K (int) – Number of NMF components (or, equivalently, the number of neurons at the output per timestep).
numclasses (int) – Number of possible classes.
use_adapter (bool) – True if you wish to learn an adapter for the latent representations.
adapter_reduce_dim (bool) – True if the adapter should compress the latent representations.
Example
>>> import torch
>>> from speechbrain.lobes.models.L2I import PsiOptimized
>>> inp = torch.randn(1, 256, 26, 32)
>>> psi = PsiOptimized(
...     dim=256, K=100, use_adapter=False, adapter_reduce_dim=False
... )
>>> h, inp_ad = psi(inp)
>>> print(h.shape, inp_ad.shape)
torch.Size([1, 1, 417, 100]) torch.Size([1, 256, 26, 32])
- forward(hs)
Computes forward step.
- Parameters:
hs (torch.Tensor) – Latent representations (input to the classifier). Expected shape torch.Size([B, C, H, W]).
- Returns:
NMF activations and adapted representations. The NMF activations have shape torch.Size([B, 1, T, 100]).
- Return type:
tuple
- class speechbrain.lobes.models.L2I.Theta(n_comp=100, T=431, num_classes=50)
Bases: Module
This class implements a linear classifier on top of NMF activations.
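- Parameters:
n_comp (int) – Number of NMF components.
T (int) – Number of timepoints in the NMF activations.
num_classes (int) – Number of classes.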
Example
>>> import torch
>>> from speechbrain.lobes.models.L2I import Theta
>>> theta = Theta(30, 120, 50)
>>> H = torch.rand(1, 30, 120)
>>> c_hat = theta.forward(H)
>>> print(c_hat.shape)
torch.Size([1, 50])
- forward(H)
We first collapse the time axis, and then pass through the linear layer
- Parameters:
H (torch.Tensor) – The activations tensor with shape B x n_comp x T, where B = batch size, n_comp = number of NMF components, and T = number of timepoints.
- Returns:
theta_out – Classifier output
- Return type:
torch.Tensor
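The two steps described for this forward pass (collapse the time axis, then apply a linear layer) can be sketched in plain Python for a single batch item. This page does not say how the time axis is collapsed, so mean pooling is used here purely as an assumption; the module may use a learned weighting instead. The name theta_forward and the toy weights are hypothetical.

```python
def theta_forward(H, weight, bias):
    """Classify one item from its NMF activations.

    H: n_comp x T activations; weight: num_classes x n_comp; bias: num_classes.
    """
    # Step 1: collapse the time axis (assumption: mean over T).
    pooled = [sum(row) / len(row) for row in H]
    # Step 2: linear layer, one output per class.
    return [
        sum(w * p for w, p in zip(w_row, pooled)) + b
        for w_row, b in zip(weight, bias)
    ]


H = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]      # n_comp = 3, T = 2
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # num_classes = 2
bias = [0.5, -1.0]
print(theta_forward(H, weight, bias))  # [2.5, 3.0]
```

Whatever pooling is used, the output has one score per class and no time dimension, which matches the torch.Size([1, 50]) shape in the example above.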
- class speechbrain.lobes.models.L2I.NMFEncoder(n_freq, n_comp)
Bases: Module
This class implements an NMF encoder with a convolutional network.
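- Parameters:
n_freq (int) – Number of frequency bins in the input spectrogram.
n_comp (int) – Number of NMF components.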
Example
>>> import torch
>>> from speechbrain.lobes.models.L2I import NMFEncoder
>>> nmfencoder = NMFEncoder(513, 100)
>>> X = torch.rand(1, 513, 240)
>>> Hhat = nmfencoder(X)
>>> print(Hhat.shape)
torch.Size([1, 100, 240])
- forward(X)
- Parameters:
X (torch.Tensor) – The input spectrogram tensor with shape B x n_freq x T, where B = batch size, n_freq = nfft for the input spectrogram, and T = number of timepoints.
- Returns:
NMF encoded outputs.
- class speechbrain.lobes.models.L2I.CNN14PSI_stft(dim=128, K=100)
Bases: Module
This class estimates a saliency map on the STFT domain, given classifier representations.
- Parameters:
dim (int) – Dimension of the hidden representations.
K (int) – Number of NMF components.
Example
>>> import torch
>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> from speechbrain.lobes.models.L2I import CNN14PSI_stft
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI_stft(2048, 20)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 20, 207])
- forward(hs, labels=None)
Forward step. Estimates NMF activations to be used to get the saliency mask.
- Parameters:
hs (torch.Tensor) – Classifier's representations.
labels (torch.Tensor) – Predicted labels for classifier's representations.
- Returns:
xhat – The estimated NMF activation coefficients
- Return type:
torch.Tensor
- class speechbrain.lobes.models.L2I.CNN14PSI_stft_2d(dim=128, K=100)
Bases: Module
This class estimates the NMF activations to create a saliency map using the L2I framework.
- Parameters:
dim (int) – Dimension of the hidden representations.
K (int) – Number of NMF components.
Example
>>> import torch
>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> from speechbrain.lobes.models.L2I import CNN14PSI_stft_2d
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI_stft_2d(2048, 20)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 20, 207])
- forward(hs, labels=None)
Forward step. Estimates NMF activations to be used to get the saliency mask.
- Parameters:
hs (torch.Tensor) – Classifier's representations.
labels (torch.Tensor) – Predicted labels for classifier's representations.
- Returns:
xhat – The estimated NMF activation coefficients
- Return type:
torch.Tensor