speechbrain.inference.interpretability module

Specifies the inference interfaces for interpretability modules.

Authors:
  • Aku Rouhe 2021

  • Peter Plantinga 2021

  • Loren Lugosch 2020

  • Mirco Ravanelli 2020

  • Titouan Parcollet 2021

  • Abdel Heba 2021

  • Andreas Nautsch 2022, 2023

  • Pooneh Mousavi 2023

  • Sylvain de Langen 2023

  • Adel Moumen 2023

  • Pradnya Kandarkar 2023

Summary

Classes:

PIQAudioInterpreter

This class implements the interface for the PIQ posthoc interpreter for an audio classifier.

Reference

class speechbrain.inference.interpretability.PIQAudioInterpreter(*args, **kwargs)[source]

Bases: Pretrained

This class implements the interface for the PIQ posthoc interpreter for an audio classifier.

Example

>>> import torch
>>> from speechbrain.inference.interpretability import PIQAudioInterpreter
>>> tmpdir = getfixture("tmpdir")
>>> interpreter = PIQAudioInterpreter.from_hparams(
...     source="speechbrain/PIQ-ESC50",
...     savedir=tmpdir,
... )
>>> signal = torch.randn(1, 16000)
>>> interpretation, _ = interpreter.interpret_batch(signal)
preprocess(wavs)[source]

Pre-processes wavs to compute STFTs.
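
A minimal standalone sketch of this kind of pre-processing, assuming the classifier consumes log-power STFT features; the `n_fft` and `hop_length` values here are illustrative and not necessarily the model's actual settings:

```python
import torch

# Sketch: compute log-power STFT features from a batch of waveforms.
# n_fft/hop_length are illustrative assumptions, not the model's settings.
def log_power_stft(wavs, n_fft=1024, hop_length=512, eps=1e-10):
    spec = torch.stft(
        wavs,
        n_fft=n_fft,
        hop_length=hop_length,
        window=torch.hann_window(n_fft),
        return_complex=True,
    )
    # Log-power features for the classifier; the phase is kept so the
    # interpretation can later be inverted back to the waveform domain.
    return torch.log(spec.abs() ** 2 + eps), spec.angle()

feats, phase = log_power_stft(torch.randn(1, 16000))
```

Keeping the phase alongside the log-power features is what makes the later `invert_stft_with_phase` step possible.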

classifier_forward(X_stft_logpower)[source]

The forward pass for the classifier.

invert_stft_with_phase(X_int, X_stft_phase)[source]

Inverts STFT spectra given phase.
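
A hedged sketch of the underlying idea: recombine a magnitude spectrogram with its phase into a complex spectrogram and apply the inverse STFT. The `n_fft` and `hop_length` values are illustrative assumptions, not the model's actual configuration:

```python
import torch

# Sketch: invert an STFT given separate magnitude and phase tensors.
# n_fft/hop_length are illustrative assumptions, not the model's settings.
def invert_stft_with_phase(magnitude, phase, n_fft=1024, hop_length=512):
    # Recombine magnitude and phase into a complex spectrogram, then
    # apply the inverse STFT to return to the waveform domain.
    complex_spec = magnitude * torch.exp(1j * phase)
    return torch.istft(
        complex_spec,
        n_fft=n_fft,
        hop_length=hop_length,
        window=torch.hann_window(n_fft),
    )

# Round trip: analyse a random signal, then resynthesise it.
signal = torch.randn(1, 16000)
spec = torch.stft(
    signal,
    n_fft=1024,
    hop_length=512,
    window=torch.hann_window(1024),
    return_complex=True,
)
wav = invert_stft_with_phase(spec.abs(), spec.angle())
```

With a Hann window at 50% overlap this round trip reconstructs the covered samples up to floating-point precision, which is why keeping the original phase lets the interpretation be rendered as audio.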

interpret_batch(wavs)[source]

Classifies the given audio into the given set of labels. It also provides the interpretation in the audio domain.

Parameters:

wavs (torch.Tensor) – Batch of waveforms [batch, time, channels] or [batch, time] depending on the model. Make sure the sample rate is fs=16000 Hz.

Returns:

  • x_int_sound_domain – The interpretation in the waveform domain

  • text_lab – The text label for the classification

  • fs_model – The sampling frequency of the model. Useful to save the audio.

interpret_file(path, savedir='audio_cache')[source]

Classifies the given audio file into the given set of labels. It also provides the interpretation in the audio domain.

Parameters:

  • path (str) – Path to audio file to classify.

  • savedir (str) – Directory used to cache the fetched audio file (default: 'audio_cache').

Returns:

  • x_int_sound_domain – The interpretation in the waveform domain

  • text_lab – The text label for the classification

  • fs_model – The sampling frequency of the model. Useful to save the audio.

forward(wavs, wav_lens=None)[source]

Runs the classification on the given waveforms.

training: bool