speechbrain.inference.interpretability module

Specifies the inference interfaces for interpretability modules.

Authors:
  • Aku Rouhe 2021

  • Peter Plantinga 2021

  • Loren Lugosch 2020

  • Mirco Ravanelli 2020

  • Titouan Parcollet 2021

  • Abdel Heba 2021

  • Andreas Nautsch 2022, 2023

  • Pooneh Mousavi 2023

  • Sylvain de Langen 2023

  • Adel Moumen 2023

  • Pradnya Kandarkar 2023

Summary

Classes:

PIQAudioInterpreter

This class implements the interface for the PIQ posthoc interpreter for an audio classifier.

Reference

class speechbrain.inference.interpretability.PIQAudioInterpreter(*args, **kwargs)[source]

Bases: Pretrained

This class implements the interface for the PIQ posthoc interpreter for an audio classifier.

Example

>>> import torch
>>> from speechbrain.inference.interpretability import PIQAudioInterpreter
>>> tmpdir = getfixture("tmpdir")
>>> interpreter = PIQAudioInterpreter.from_hparams(
...     source="speechbrain/PIQ-ESC50",
...     savedir=tmpdir,
... )
>>> signal = torch.randn(1, 16000)
>>> interpretation, _ = interpreter.interpret_batch(signal)
preprocess(wavs)[source]

Pre-processes wavs to compute STFTs.
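
A minimal standalone sketch of this kind of pre-processing, assuming the classifier consumes log-power STFT features; the `n_fft` and `hop_length` values here are illustrative and not necessarily the model's actual settings:

```python
import torch

# Sketch: compute log-power STFT features from a batch of waveforms.
# n_fft/hop_length are illustrative assumptions, not the model's settings.
def log_power_stft(wavs, n_fft=1024, hop_length=512, eps=1e-10):
    spec = torch.stft(
        wavs,
        n_fft=n_fft,
        hop_length=hop_length,
        window=torch.hann_window(n_fft),
        return_complex=True,
    )
    # Log-power features for the classifier; the phase is kept so the
    # interpretation can later be inverted back to the waveform domain.
    return torch.log(spec.abs() ** 2 + eps), spec.angle()

feats, phase = log_power_stft(torch.randn(1, 16000))
```

Keeping the phase alongside the log-power features is what makes the later `invert_stft_with_phase` step possible.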

classifier_forward(X_stft_logpower)[source]

The forward pass for the classifier.

invert_stft_with_phase(X_int, X_stft_phase)[source]

Inverts STFT spectra given phase.
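
A hedged sketch of the underlying idea: recombine a magnitude spectrogram with its phase into a complex spectrogram and apply the inverse STFT. The `n_fft` and `hop_length` values are illustrative assumptions, not the model's actual configuration:

```python
import torch

# Sketch: invert an STFT given separate magnitude and phase tensors.
# n_fft/hop_length are illustrative assumptions, not the model's settings.
def invert_stft_with_phase(magnitude, phase, n_fft=1024, hop_length=512):
    # Recombine magnitude and phase into a complex spectrogram, then
    # apply the inverse STFT to return to the waveform domain.
    complex_spec = magnitude * torch.exp(1j * phase)
    return torch.istft(
        complex_spec,
        n_fft=n_fft,
        hop_length=hop_length,
        window=torch.hann_window(n_fft),
    )

# Round trip: analyse a random signal, then resynthesise it.
signal = torch.randn(1, 16000)
spec = torch.stft(
    signal,
    n_fft=1024,
    hop_length=512,
    window=torch.hann_window(1024),
    return_complex=True,
)
wav = invert_stft_with_phase(spec.abs(), spec.angle())
```

With a Hann window at 50% overlap this round trip reconstructs the covered samples up to floating-point precision, which is why keeping the original phase lets the interpretation be rendered as audio.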

interpret_batch(wavs)[source]

Classifies the given audio into the given set of labels. It also provides the interpretation in the audio domain.

Parameters:

wavs (torch.Tensor) – Batch of waveforms [batch, time, channels] or [batch, time] depending on the model. Make sure the sample rate is fs=16000 Hz.

Returns:

  • x_int_sound_domain – The interpretation in the waveform domain

  • text_lab – The text label for the classification

  • fs_model – The sampling frequency of the model. Useful to save the audio.

interpret_file(path, savedir='audio_cache')[source]

Classifies the given audio file into the given set of labels. It also provides the interpretation in the audio domain.

Parameters:

  • path (str) – Path to audio file to classify.

  • savedir (str) – Directory used to cache the fetched audio file (default: 'audio_cache').

Returns:

  • x_int_sound_domain – The interpretation in the waveform domain

  • text_lab – The text label for the classification

  • fs_model – The sampling frequency of the model. Useful to save the audio.

forward(wavs, wav_lens=None)[source]

Runs the classification on the given waveforms.

training: bool