speechbrain.inference.SLU module

Specifies the inference interfaces for Spoken Language Understanding (SLU) modules.

Authors:
  • Aku Rouhe 2021

  • Peter Plantinga 2021

  • Loren Lugosch 2020

  • Mirco Ravanelli 2020

  • Titouan Parcollet 2021

  • Abdel Heba 2021

  • Andreas Nautsch 2022, 2023

  • Pooneh Mousavi 2023

  • Sylvain de Langen 2023

  • Adel Moumen 2023

  • Pradnya Kandarkar 2023

Summary

Classes:

EndToEndSLU – An end-to-end SLU model.

Reference

class speechbrain.inference.SLU.EndToEndSLU(*args, **kwargs)[source]

Bases: Pretrained

An end-to-end SLU model.

The class can be used either to run only the encoder (encode_batch()) to extract features, or to run the entire model (decode_batch()) to map speech to its semantics.

Example

>>> from speechbrain.inference.SLU import EndToEndSLU
>>> tmpdir = getfixture("tmpdir")
>>> slu_model = EndToEndSLU.from_hparams(
...     source="speechbrain/slu-timers-and-such-direct-librispeech-asr",
...     savedir=tmpdir,
... )  
>>> slu_model.decode_file("tests/samples/single-mic/example6.wav") 
"{'intent': 'SimpleMath', 'slots': {'number1': 37.67, 'number2': 75.7, 'op': ' minus '}}"
HPARAMS_NEEDED = ['tokenizer', 'asr_model_source']

MODULES_NEEDED = ['slu_enc', 'beam_searcher']

decode_file(path, **kwargs)[source]

Maps the given audio file to a string representing the semantic dictionary for the utterance.

Parameters:

path (str) – Path to audio file to decode.

Returns:

The predicted semantics.

Return type:

str
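
Since the returned value is a string rendering of a Python dictionary, it can be parsed back into a structured object. A minimal sketch, assuming the slu_model instance and audio file from the class example above, and assuming the output string is a valid Python literal (as it is in that example):

>>> import ast
>>> semantics_str = slu_model.decode_file("tests/samples/single-mic/example6.wav")
>>> semantics = ast.literal_eval(semantics_str)  # parse the string back into a dict
>>> semantics["intent"]
'SimpleMath'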

encode_batch(wavs, wav_lens)[source]

Encodes the input audio into a sequence of hidden states.

Parameters:
  • wavs (torch.Tensor) – Batch of waveforms [batch, time, channels] or [batch, time] depending on the model.

  • wav_lens (torch.Tensor) – Lengths of the waveforms relative to the longest one in the batch; a tensor of shape [batch]. The longest waveform should have a relative length of 1.0, and the others len(waveform) / max_length. Used for ignoring padding.

Returns:

The encoded batch

Return type:

torch.Tensor
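
A minimal sketch of feature extraction, assuming the slu_model instance from the class example above; load_audio is inherited from Pretrained, and a relative length of 1.0 marks an unpadded single-utterance batch:

>>> import torch
>>> signal = slu_model.load_audio("tests/samples/single-mic/example6.wav")
>>> wavs = signal.unsqueeze(0)  # [batch, time]
>>> wav_lens = torch.ones(1)    # no padding, so relative length is 1.0
>>> encoded = slu_model.encode_batch(wavs, wav_lens)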

decode_batch(wavs, wav_lens)[source]

Maps the input audio to its semantics.

Parameters:
  • wavs (torch.Tensor) – Batch of waveforms [batch, time, channels] or [batch, time] depending on the model.

  • wav_lens (torch.Tensor) – Lengths of the waveforms relative to the longest one in the batch; a tensor of shape [batch]. The longest waveform should have a relative length of 1.0, and the others len(waveform) / max_length. Used for ignoring padding.

Returns:

  • list – The decoded semantics for each waveform in the batch.

  • tensor – The predicted token IDs.
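
A minimal sketch of batched decoding, assuming the slu_model instance from the class example above (the same file is loaded twice purely for illustration); waveforms are zero-padded to the longest and wav_lens holds their relative lengths:

>>> import torch
>>> sig1 = slu_model.load_audio("tests/samples/single-mic/example6.wav")
>>> sig2 = slu_model.load_audio("tests/samples/single-mic/example6.wav")
>>> wavs = torch.nn.utils.rnn.pad_sequence([sig1, sig2], batch_first=True)
>>> lens = torch.tensor([sig1.shape[0], sig2.shape[0]], dtype=torch.float)
>>> wav_lens = lens / lens.max()  # relative lengths in (0, 1]
>>> semantics, tokens = slu_model.decode_batch(wavs, wav_lens)
>>> len(semantics)
2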

forward(wavs, wav_lens)[source]

Runs the full decoding pipeline. Note that no gradients flow through the decoding step.
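
Calling the model instance directly invokes forward(); a minimal sketch, reusing the inputs from the decode_batch() example above and assuming forward() delegates to decode_batch(), as the matching signature suggests:

>>> semantics, tokens = slu_model(wavs, wav_lens)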

training: bool