speechbrain.lobes.models.huggingface_transformers.huggingface module
This lobe is the interface for HuggingFace transformers models. It enables loading the config and model via AutoConfig & AutoModel.
The HuggingFace transformers package needs to be installed: https://huggingface.co/transformers/installation.html
- Authors
Titouan Parcollet 2021, 2022, 2023
Mirco Ravanelli 2021
Boumadane Abdelmoumene 2021
Ju-Chieh Chou 2021
Artem Ploujnikov 2021, 2022
Abdel Heba 2021
Aku Rouhe 2022
Arseniy Gorin 2022
Ali Safaya 2022
Benoit Wang 2022
Adel Moumen 2022, 2023
Andreas Nautsch 2022, 2023
Luca Della Libera 2022
Heitor Guimarães 2022
Ha Nguyen 2023
Summary
Classes:
HFTransformersInterface – This lobe provides an interface for integrating any HuggingFace transformer model within SpeechBrain.
Functions:
make_padding_masks – This method generates the padding masks.
Reference
- class speechbrain.lobes.models.huggingface_transformers.huggingface.HFTransformersInterface(source, save_path='', for_pretraining=False, with_lm_head=False, with_casual_lm=False, seq2seqlm=False, quantization_config=None, freeze=False, cache_dir='pretrained_models', device=None, **kwargs)[source]
Bases:
Module
This lobe provides an interface for integrating any HuggingFace transformer model within SpeechBrain.
We use AutoClasses for loading any model from the hub together with its necessary components. For example, we build a Wav2Vec2 class that inherits from HFTransformersInterface to work with HuggingFace's wav2vec2 models. Wav2Vec2 can thereby reuse already-built features such as model loading, pretrained-weights loading, full weight freezing, feature_extractor loading, etc. Users are expected to override the essential forward() function to fit their specific needs; a minimal subclassing sketch follows the example below. Depending on the HuggingFace transformer model in question, one can also modify the state_dict by overriding the _modify_state_dict() method, or adapt the config by overriding the override_config() method, etc. See: https://huggingface.co/docs/transformers/model_doc/auto https://huggingface.co/docs/transformers/autoclass_tutorial
- Parameters:
source (str) – HuggingFace hub name, e.g. "facebook/wav2vec2-large-lv60".
save_path (str) – Save directory for the downloaded model.
for_pretraining (bool (default: False)) – If True, build the model for pretraining
with_lm_head (bool (default: False)) – If True, build the model with lm_head
with_casual_lm (bool (default: False)) – If True, build a causal LM model.
seq2seqlm (bool (default: False)) – If True, build a sequence-to-sequence model with lm_head
quantization_config (dict (default: None)) – Quantization config, extremely useful for dealing with LLMs.
freeze (bool (default: False)) – If True, the model is frozen. If False, the model will be trained along with the rest of the pipeline.
cache_dir (str or Path (default: 'pretrained_models')) – Location of the HuggingFace cache for storing pre-trained models, to which symlinks are created.
device (any, optional) – Device to migrate the model to.
**kwargs – Extra keyword arguments passed to the from_pretrained function.
Example
>>> model_hub = "facebook/wav2vec2-base-960h"
>>> save_path = "tmp"
>>> model = HFTransformersInterface(model_hub, save_path=save_path)
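As a rough illustration of the subclassing pattern described above, the sketch below wraps a wav2vec2 checkpoint and overrides forward() to return the last hidden states. It is not the built-in Wav2Vec2 lobe; the class name is hypothetical, and it assumes the interface exposes the wrapped HuggingFace model as self.model and the freeze flag as self.freeze.

import torch

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    HFTransformersInterface,
)


class MyWav2Vec2Encoder(HFTransformersInterface):
    """Hypothetical subclass returning the encoder's last hidden states."""

    def forward(self, wav):
        # Disable gradients when the whole model is frozen
        # (assumes the interface stores the flag as self.freeze).
        if self.freeze:
            with torch.no_grad():
                return self.model(wav).last_hidden_state
        return self.model(wav).last_hidden_state


encoder = MyWav2Vec2Encoder(
    source="facebook/wav2vec2-base-960h", save_path="tmp", freeze=True
)
features = encoder(torch.rand(2, 16000))  # (batch, frames, hidden_size)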
- decode(**kwargs)[source]
Might be useful for models like mbart, which can exploit SB’s beamsearch for inference. Users should modify this function according to their own tasks.
- encode(**kwargs)[source]
Custom encoding for inference. Users should modify this function according to their own tasks.
- freeze_model(model)[source]
Freezes the parameters of a model. This should be overridden too, depending on users’ needs, for example when using adapters (see the sketch below).
- Parameters:
model (from AutoModel.from_config) – Valid HuggingFace transformers model object.
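The sketch below shows one way to override freeze_model() so that only part of the model is frozen, e.g. to keep the transformer layers trainable while adding adapters. The class name is hypothetical, and the feature_extractor attribute is specific to wav2vec2-style HuggingFace models; both are assumptions, not part of the library.

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    HFTransformersInterface,
)


class PartiallyFrozenInterface(HFTransformersInterface):
    """Hypothetical subclass freezing only the convolutional front-end."""

    def freeze_model(self, model):
        # Freeze only the low-level feature extractor (wav2vec2-style
        # attribute name, an assumption); transformer layers stay trainable.
        for param in model.feature_extractor.parameters():
            param.requires_grad = False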
- override_config(config)[source]
Users should modify this function according to their own tasks (see the sketch below).
- Parameters:
config (HuggingFace config object) – The original config.
- Returns:
config – Overridden config.
- Return type:
HuggingFace config object
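As a small, hedged example of this pattern, the subclass below adjusts the HuggingFace config before the model is instantiated. output_hidden_states is a standard HuggingFace config field used here purely for illustration, and the class name is hypothetical.

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    HFTransformersInterface,
)


class HiddenStatesInterface(HFTransformersInterface):
    """Hypothetical subclass that requests all hidden states."""

    def override_config(self, config):
        # Any other config attribute could be adjusted the same way.
        config.output_hidden_states = True
        return config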
- speechbrain.lobes.models.huggingface_transformers.huggingface.make_padding_masks(src, wav_len=None, pad_idx=0)[source]
This method generates the padding masks (see the usage sketch below).
- Parameters:
src (tensor) – The sequence to the encoder (required).
wav_len (tensor) – The relative length of the wav given in SpeechBrain format.
pad_idx (int) – The index for <pad> token (default=0).
- Returns:
src_key_padding_mask – The padding mask.
- Return type:
tensor
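A short usage sketch, assuming SpeechBrain-style relative lengths in [0, 1]; the exact boolean convention of the returned mask (True for valid, non-padded frames) follows the usual SpeechBrain/HuggingFace attention-mask convention and should be checked against the implementation.

import torch

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    make_padding_masks,
)

src = torch.rand(2, 16000)          # (batch, time): second item is half padding
wav_len = torch.tensor([1.0, 0.5])  # relative lengths in SpeechBrain format
mask = make_padding_masks(src, wav_len=wav_len)
print(mask.shape)                   # expected: torch.Size([2, 16000])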