speechbrain.integrations.audio_tokenizers.speechtokenizer_interface module
This lobe enables the integration of pretrained SpeechTokenizer.
- Please, install speechtokenizer:
pip install speechtokenizer
Reference: https://arxiv.org/abs/2308.16692
The Transformers library from HuggingFace also needs to be installed: https://huggingface.co/transformers/installation.html
- Author
Pooneh Mousavi 2023
Summary
Classes:
SpeechTokenizer | This lobe enables the integration of HuggingFace and SpeechBrain pretrained SpeechTokenizer.
Reference
- class speechbrain.integrations.audio_tokenizers.speechtokenizer_interface.SpeechTokenizer(source, save_path, sample_rate=16000)[source]
Bases: Module
This lobe enables the integration of HuggingFace and SpeechBrain pretrained SpeechTokenizer.
Please, install speechtokenizer: pip install speechtokenizer
Source paper: https://arxiv.org/abs/2308.16692
The model can be used as a fixed discrete feature extractor or can be fine-tuned. It automatically downloads the model from HuggingFace, or loads it from a local path.
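- Parameters:
source (str) – HuggingFace hub name or local path of the pretrained model, e.g. "fnlp/SpeechTokenizer".
save_path (str) – Directory where the downloaded model is saved.
sample_rate (int) – The audio sampling rate (default: 16000).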
Example
>>> import torch
>>> from speechbrain.integrations.audio_tokenizers.speechtokenizer_interface import SpeechTokenizer
>>> inputs = torch.rand([10, 600])
>>> model_hub = "fnlp/SpeechTokenizer"
>>> save_path = "savedir"
>>> model = SpeechTokenizer(model_hub, save_path)
>>> tokens = model.encode(inputs)
>>> tokens.shape
torch.Size([8, 10, 2])
>>> wav = model.decode(tokens)
>>> wav.shape
torch.Size([10, 640])
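The shapes in the example above can be checked with a little arithmetic. The sketch below is only an illustration: the hop size of 320 samples per token frame is an assumption inferred from the example output (2 frames decode to 640 samples), as is the rounding-up of partial frames. Neither is stated in this documentation, and `expected_shapes` is a hypothetical helper, not part of SpeechBrain.

```python
import math

# Assumption (inferred from the example shapes, not documented here):
# one token frame per 320 input samples, partial frames rounded up.
ASSUMED_HOP = 320
N_Q = 8  # number of quantizer layers, taken from the example output


def expected_shapes(batch, num_samples, hop=ASSUMED_HOP, n_q=N_Q):
    """Return the (tokens, wav) shapes implied by the assumed hop size."""
    frames = math.ceil(num_samples / hop)
    return (n_q, batch, frames), (batch, frames * hop)


token_shape, wav_shape = expected_shapes(batch=10, num_samples=600)
print(token_shape)  # (8, 10, 2)
print(wav_shape)    # (10, 640)
```

Under these assumptions, the decoded waveform is longer than the 600-sample input because the last frame is only partially filled by real audio.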
- forward(wav, wav_lens=None)[source]
Takes an input waveform and returns its corresponding discrete SpeechTokenizer tokens.
- Parameters:
wav (torch.Tensor (signal)) – A batch of audio signals to transform to features.
wav_lens (torch.Tensor) – The relative length of the wav given in SpeechBrain format.
- Returns:
tokens – A (N_q, Batch, Seq) tensor of audio tokens.
- Return type:
torch.Tensor
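In SpeechBrain, relative lengths express each signal's length as a fraction of the longest signal in the padded batch. A minimal sketch of how `wav_lens` values could be derived from absolute lengths follows; `relative_lengths` is a hypothetical helper written for illustration, not a SpeechBrain function.

```python
def relative_lengths(num_samples):
    """Convert absolute lengths (in samples) to fractions of the longest one."""
    longest = max(num_samples)
    return [n / longest for n in num_samples]


# Three utterances of 1.0 s, 0.75 s and 0.5 s at 16 kHz, padded to 16000 samples.
lens = relative_lengths([16000, 12000, 8000])
print(lens)  # [1.0, 0.75, 0.5]
```

The resulting list would typically be wrapped with `torch.tensor(lens)` before being passed as `wav_lens`.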
- encode(wav, wav_lens=None)[source]
Takes an input waveform and returns its corresponding discrete SpeechTokenizer tokens.
- Parameters:
wav (torch.Tensor (signal)) – A batch of audio signals to transform to features.
wav_lens (torch.Tensor) – The relative length of the wav given in SpeechBrain format.
- Returns:
tokens – A (N_q, Batch, Seq) tensor of audio tokens.
- Return type:
torch.Tensor
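Because the quantizer dimension comes first in the returned tokens, downstream code that works per utterance may need to reorder the axes. The sketch below illustrates the layout with plain nested lists standing in for the tensor; `per_utterance` is a hypothetical helper, and the toy values are made up for illustration.

```python
def per_utterance(tokens):
    """Rearrange (N_q, Batch, Seq) nested lists into (Batch, N_q, Seq)."""
    batch = len(tokens[0])
    return [[layer[b] for layer in tokens] for b in range(batch)]


# Toy tokens: N_q = 2 layers, Batch = 3 utterances, Seq = 2 frames.
tokens = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
by_utt = per_utterance(tokens)
print(by_utt[0])  # [[1, 2], [7, 8]]
```

On an actual `torch.Tensor`, the same reordering can be done with `tokens.permute(1, 0, 2)`.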
- decode(codes)[source]
Takes a tensor of audio tokens and returns the corresponding reconstructed waveform.
- Parameters:
codes (torch.Tensor) – A (N_q, Batch, Seq) tensor of audio tokens.
- Returns:
wav – A batch of reconstructed audio signals.
- Return type:
torch.Tensor (signal)