speechbrain.inference.vocoders module
Specifies the inference interfaces for Text-To-Speech (TTS) modules.
- Authors:
Aku Rouhe 2021
Peter Plantinga 2021
Loren Lugosch 2020
Mirco Ravanelli 2020
Titouan Parcollet 2021
Abdel Heba 2021
Andreas Nautsch 2022, 2023
Pooneh Mousavi 2023
Sylvain de Langen 2023
Adel Moumen 2023
Pradnya Kandarkar 2023
Summary
Classes:
DiffWaveVocoder
A ready-to-use inference wrapper for DiffWave as vocoder. The wrapper allows performing generative tasks: locally-conditional generation: mel_spec -> waveform.

HIFIGAN
A ready-to-use wrapper for HiFiGAN (mel_spec -> waveform).

UnitHIFIGAN
A ready-to-use wrapper for Unit HiFiGAN (discrete units -> waveform).
Reference
- class speechbrain.inference.vocoders.HIFIGAN(*args, **kwargs)[source]
Bases:
Pretrained
A ready-to-use wrapper for HiFiGAN (mel_spec -> waveform).
- Parameters:
hparams – Hyperparameters (from HyperPyYAML)
Example
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> mel_specs = torch.rand(2, 80, 298)
>>> waveforms = hifi_gan.decode_batch(mel_specs)
>>> # You can use the vocoder coupled with a TTS system
>>> # Initialize TTS (tacotron2)
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> from speechbrain.inference.TTS import Tacotron2
>>> tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir=tmpdir_tts)
>>> # Running the TTS
>>> mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_output)
- HPARAMS_NEEDED = ['generator']
- decode_batch(spectrogram, mel_lens=None, hop_len=None)[source]
Computes waveforms from a batch of mel-spectrograms
- Parameters:
spectrogram (torch.Tensor) – Batch of mel-spectrograms [batch, mels, time]
mel_lens (torch.Tensor) – A list of lengths of mel-spectrograms for the batch. Can be obtained from the output of Tacotron/FastSpeech
hop_len (int) – hop length used for mel-spectrogram extraction; should be the same value as in the .yaml file
- Returns:
waveforms – Batch of waveforms [batch, 1, time]
- Return type:
torch.Tensor
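Before calling decode_batch on mel-spectrograms of different lengths, the tensors must share a common time dimension. A minimal sketch (not part of the SpeechBrain API) of building the `spectrogram` and `mel_lens` arguments by padding:

```python
# Pad variable-length mel-spectrograms to a common time length and record
# each item's true frame count, so padded noise can be masked afterwards.
import torch
from torch.nn.utils.rnn import pad_sequence

mels = [torch.rand(80, 298), torch.rand(80, 150)]    # each [mels, time]
mel_lens = torch.tensor([m.shape[1] for m in mels])  # true frame counts
# pad_sequence pads along the first dimension, so transpose to [time, mels]
batch = pad_sequence([m.t() for m in mels], batch_first=True)
batch = batch.transpose(1, 2)                        # [batch, mels, time]
```

The resulting `batch` and `mel_lens` can then be passed as the `spectrogram` and `mel_lens` arguments.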
- mask_noise(waveform, mel_lens, hop_len)[source]
Mask the noise caused by padding during batch inference
- Parameters:
waveform (torch.Tensor) – Batch of generated waveforms [batch, 1, time]
mel_lens (torch.Tensor) – A list of lengths of mel-spectrograms for the batch. Can be obtained from the output of Tacotron/FastSpeech
hop_len (int) – hop length used for mel-spectrogram extraction; same value as in the .yaml file
- Returns:
waveform – Batch of waveforms without padded noise [batch, 1, time]
- Return type:
torch.Tensor
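A minimal sketch (not SpeechBrain's actual implementation) of the idea behind this masking: each waveform's valid length is its mel frame count times the hop length, and samples past that point are padding noise that can be zeroed:

```python
import torch

def mask_padded_noise(waveforms, mel_lens, hop_len):
    """waveforms: [batch, 1, time]; mel_lens: per-item mel frame counts."""
    masked = waveforms.clone()
    for i, mel_len in enumerate(mel_lens):
        valid = int(mel_len) * hop_len  # number of valid samples for item i
        masked[i, :, valid:] = 0.0      # zero the padded tail
    return masked

# Two items padded to 100 frames; the second is only 60 frames long.
hop_len = 256
batch = torch.rand(2, 1, 100 * hop_len)
masked = mask_padded_noise(batch, torch.tensor([100, 60]), hop_len)
```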
- decode_spectrogram(spectrogram)[source]
Computes waveforms from a single mel-spectrogram
- Parameters:
spectrogram (torch.Tensor) – mel-spectrogram [mels, time]
- Returns:
waveform – waveform [1, time]
- Return type:
torch.Tensor
Audio can be saved by:
>>> import torchaudio
>>> waveform = torch.rand(1, 666666)
>>> sample_rate = 22050
>>> torchaudio.save(str(getfixture('tmpdir') / "test.wav"), waveform, sample_rate)
- class speechbrain.inference.vocoders.DiffWaveVocoder(*args, **kwargs)[source]
Bases:
Pretrained
A ready-to-use inference wrapper for DiffWave as vocoder. The wrapper allows performing generative tasks:
locally-conditional generation: mel_spec -> waveform
- Parameters:
hparams – Hyperparameters (from HyperPyYAML)
- HPARAMS_NEEDED = ['diffusion']
- decode_batch(mel, hop_len, mel_lens=None, fast_sampling=False, fast_sampling_noise_schedule=None)[source]
Generate waveforms from spectrograms
- Parameters:
mel (torch.Tensor) – spectrogram [batch, mels, time]
hop_len (int) – Hop length during mel-spectrogram extraction. Should be the same value as in the .yaml file. Used to determine the output wave length and to mask the noise for the vocoding task
mel_lens (torch.Tensor) – A list of lengths of mel-spectrograms for the batch, used to mask the noise caused by padding. Can be obtained from the output of Tacotron/FastSpeech
fast_sampling (bool) – whether to do fast sampling
fast_sampling_noise_schedule (list) – the noise schedules used for fast sampling
- Returns:
waveforms – Batch of waveforms [batch, 1, time]
- Return type:
torch.Tensor
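Illustrative values only: DiffWave-style fast sampling replaces the long training noise schedule with a handful of inference steps. The 6-step schedule below follows the values proposed in the DiffWave paper, but the right values depend on the trained model's configuration. The sketch also shows how `hop_len` ties the output length to the input spectrogram:

```python
# A 6-step noise schedule in the style of the DiffWave paper (illustrative;
# must match the schedule the model was trained to support).
fast_schedule = [0.0001, 0.001, 0.01, 0.05, 0.2, 0.5]

# The output waveform length is determined by the hop length used during
# mel extraction: n_frames frames yield roughly n_frames * hop_len samples.
hop_len, n_frames = 256, 298
expected_samples = n_frames * hop_len
```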
- mask_noise(waveform, mel_lens, hop_len)[source]
Mask the noise caused by padding during batch inference
- Parameters:
waveform (torch.Tensor) – Batch of generated waveforms [batch, 1, time]
mel_lens (torch.Tensor) – A list of lengths of mel-spectrograms for the batch. Can be obtained from the output of Tacotron/FastSpeech
hop_len (int) – hop length used for mel-spectrogram extraction; same value as in the .yaml file
- Returns:
waveform – Batch of waveforms without padded noise [batch, 1, time]
- Return type:
torch.Tensor
- decode_spectrogram(spectrogram, hop_len, fast_sampling=False, fast_sampling_noise_schedule=None)[source]
Computes waveforms from a single mel-spectrogram
- Parameters:
spectrogram (torch.Tensor) – mel-spectrogram [mels, time]
hop_len (int) – hop length used for mel-spectrogram extraction; same value as in the .yaml file
fast_sampling (bool) – whether to do fast sampling
fast_sampling_noise_schedule (list) – the noise schedules used for fast sampling
- Returns:
waveform – waveform [1, time]
- Return type:
torch.Tensor
Audio can be saved by:
>>> import torchaudio
>>> waveform = torch.rand(1, 666666)
>>> sample_rate = 22050
>>> torchaudio.save(str(getfixture('tmpdir') / "test.wav"), waveform, sample_rate)
- class speechbrain.inference.vocoders.UnitHIFIGAN(*args, **kwargs)[source]
Bases:
Pretrained
A ready-to-use wrapper for Unit HiFiGAN (discrete units -> waveform).
- Parameters:
hparams – Hyperparameters (from HyperPyYAML)
Example
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> hifi_gan = UnitHIFIGAN.from_hparams(source="speechbrain/tts-hifigan-unit-hubert-l6-k100-ljspeech", savedir=tmpdir_vocoder)
>>> codes = torch.randint(0, 99, (100,))
>>> waveform = hifi_gan.decode_unit(codes)
- HPARAMS_NEEDED = ['generator']
- decode_batch(units)[source]
Computes waveforms from a batch of discrete units
- Parameters:
units (torch.Tensor) – Batch of discrete units [batch, codes]
- Returns:
waveforms – Batch of waveforms [batch, 1, time]
- Return type:
torch.Tensor
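A minimal sketch (not part of the SpeechBrain API) of assembling the `units` argument from variable-length sequences of discrete unit codes. Note that padding with 0 is a simplification here, since 0 may itself be a valid unit code; real pipelines typically batch equal-length sequences or use a dedicated padding index:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

codes_a = torch.randint(0, 100, (120,))  # e.g. k-means cluster ids
codes_b = torch.randint(0, 100, (80,))
# Pad the shorter sequence so both stack into one [batch, codes] tensor
units = pad_sequence([codes_a, codes_b], batch_first=True)
```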