speechbrain.inference.encoders module
Specifies the inference interfaces for speech and audio encoders.
- Authors:
Aku Rouhe 2021
Peter Plantinga 2021
Loren Lugosch 2020
Mirco Ravanelli 2020
Titouan Parcollet 2021
Abdel Heba 2021
Andreas Nautsch 2022, 2023
Pooneh Mousavi 2023
Sylvain de Langen 2023
Adel Moumen 2023
Pradnya Kandarkar 2023
Summary
Classes:
MelSpectrogramEncoder: A MelSpectrogramEncoder class created for the Zero-Shot Multi-Speaker TTS models.
WaveformEncoder: A ready-to-use waveformEncoder model.
Reference
- class speechbrain.inference.encoders.WaveformEncoder(modules=None, hparams=None, run_opts=None, freeze_params=True)[source]
Bases:
Pretrained
A ready-to-use waveformEncoder model
It can be used to wrap different embedding models such as SSL ones (wav2vec2) or speaker ones (Xvector), etc. Two functions are available: encode_batch and encode_file. They can be used to obtain embeddings from a batch of audio tensors or directly from an audio file, respectively.
The given YAML must contain the fields specified in the *_NEEDED[] lists.
- Parameters:
See Pretrained.
Example
>>> from speechbrain.inference.encoders import WaveformEncoder
>>> tmpdir = getfixture("tmpdir")
>>> ssl_model = WaveformEncoder.from_hparams(
...     source="speechbrain/ssl-wav2vec2-base-libri",
...     savedir=tmpdir,
... )
>>> ssl_model.encode_file("samples/audio_samples/example_fr.wav")
- MODULES_NEEDED = ['encoder']
- encode_batch(wavs, wav_lens)[source]
Encodes the input audio into a sequence of hidden states
The waveforms should already be in the model's desired format.
- Parameters:
wavs (torch.Tensor) – Batch of waveforms [batch, time, channels] or [batch, time] depending on the model.
wav_lens (torch.Tensor) – Lengths of the waveforms relative to the longest one in the batch, tensor of shape [batch]. The longest one should have relative length 1.0 and others len(waveform) / max_length. Used for ignoring padding.
- Returns:
The encoded batch
- Return type:
torch.Tensor
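The relative-length convention for wav_lens can be sketched with plain PyTorch. This is a minimal sketch: the waveform names and the commented ssl_model call are hypothetical, only the padding and length computation are the part being illustrated.

```python
import torch

# Two hypothetical waveforms of different durations, sampled at 16 kHz.
wav_a = torch.randn(16000)  # 1.0 s
wav_b = torch.randn(8000)   # 0.5 s

# Zero-pad to a common length so they can be batched: [batch, time].
wavs = torch.nn.utils.rnn.pad_sequence([wav_a, wav_b], batch_first=True)

# Relative lengths: the longest waveform gets 1.0, the others
# len(waveform) / max_length, so padding can be ignored downstream.
abs_lens = torch.tensor([wav_a.shape[0], wav_b.shape[0]], dtype=torch.float32)
wav_lens = abs_lens / abs_lens.max()

# embeddings = ssl_model.encode_batch(wavs, wav_lens)  # hypothetical loaded model
```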
- class speechbrain.inference.encoders.MelSpectrogramEncoder(modules=None, hparams=None, run_opts=None, freeze_params=True)[source]
Bases:
Pretrained
A MelSpectrogramEncoder class created for the Zero-Shot Multi-Speaker TTS models.
This is for speaker encoder models using the PyTorch MelSpectrogram transform for compatibility with the current TTS pipeline.
This class can be used to encode a single waveform, a single mel-spectrogram, or a batch of mel-spectrograms.
- Parameters:
See Pretrained.
Example
>>> import torchaudio
>>> from speechbrain.inference.encoders import MelSpectrogramEncoder
>>> # Model is downloaded from the speechbrain HuggingFace repo
>>> tmpdir = getfixture("tmpdir")
>>> encoder = MelSpectrogramEncoder.from_hparams(
...     source="speechbrain/tts-ecapa-voxceleb",
...     savedir=tmpdir,
... )
>>> # Compute embedding from a waveform (sample_rate must match the sample rate of the encoder)
>>> signal, fs = torchaudio.load("tests/samples/single-mic/example1.wav")
>>> spk_emb = encoder.encode_waveform(signal)
>>> # Compute embedding from a mel-spectrogram (sample_rate must match the sample rate of the encoder)
>>> mel_spec = encoder.mel_spectogram(audio=signal)
>>> spk_emb = encoder.encode_mel_spectrogram(mel_spec)
>>> # Compute embeddings for a batch of mel-spectrograms
>>> spk_embs = encoder.encode_mel_spectrogram_batch(mel_spec)
- MODULES_NEEDED = ['normalizer', 'embedding_model']
- dynamic_range_compression(x, C=1, clip_val=1e-05)[source]
Dynamic range compression for audio signals
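The usual form of this transform, as a sketch: this assumes the common log-compression used in Tacotron-style mel pipelines, matching the C and clip_val defaults above; the actual implementation may differ in detail.

```python
import torch

def dynamic_range_compression(x, C=1, clip_val=1e-05):
    # Clamping avoids log(0) on silent frames; C scales the input before the log.
    return torch.log(torch.clamp(x, min=clip_val) * C)

# Silence (0.0) is clamped to clip_val before the log; 1.0 maps to 0.0.
spec = torch.tensor([0.0, 1.0, 10.0])
compressed = dynamic_range_compression(spec)
```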
- mel_spectogram(audio)[source]
Calculates the mel-spectrogram for a raw audio signal
- Parameters:
audio (torch.Tensor) – The input audio signal
- Returns:
mel – Mel-spectrogram
- Return type:
torch.Tensor
- encode_waveform(wav)[source]
Encodes a single waveform
- Parameters:
wav (torch.Tensor) – The input waveform
- Returns:
encoder_out – Speaker embedding for the input waveform
- Return type:
torch.Tensor
- encode_mel_spectrogram(mel_spec)[source]
Encodes a single mel-spectrogram
- Parameters:
mel_spec (torch.Tensor) – Mel-spectrogram
- Returns:
encoder_out – Speaker embedding for the input mel-spectrogram
- Return type:
torch.Tensor
- encode_mel_spectrogram_batch(mel_specs, lens=None)[source]
Encodes a batch of mel-spectrograms
- Parameters:
mel_specs (torch.Tensor) – Batch of mel-spectrograms
lens (torch.Tensor) – Relative lengths of the mel-spectrograms
- Returns:
encoder_out – Speaker embedding for the input mel-spectrogram batch
- Return type:
torch.Tensor
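Batching variable-length mel-spectrograms follows the same relative-length convention as encode_batch. A minimal sketch with plain PyTorch, where the spectrogram shapes and the commented encoder call are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical mel-spectrograms: [n_mels, time], differing in frame count.
mel_a = torch.randn(80, 100)
mel_b = torch.randn(80, 60)

# Right-pad along the time axis to a common length: [batch, n_mels, time].
max_t = max(mel_a.shape[1], mel_b.shape[1])
mel_specs = torch.stack(
    [F.pad(m, (0, max_t - m.shape[1])) for m in (mel_a, mel_b)]
)

# Relative lengths: the longest spectrogram gets 1.0.
lens = torch.tensor([mel_a.shape[1], mel_b.shape[1]], dtype=torch.float32) / max_t

# spk_embs = encoder.encode_mel_spectrogram_batch(mel_specs, lens)  # hypothetical
```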