speechbrain.lobes.models.huggingface_transformers.huggingface module
This lobe is the interface for HuggingFace transformers models. It enables loading the config and model via AutoConfig & AutoModel.
The HuggingFace transformers package needs to be installed: https://huggingface.co/transformers/installation.html
- Authors
Titouan Parcollet 2021, 2022, 2023
Mirco Ravanelli 2021
Boumadane Abdelmoumene 2021
Ju-Chieh Chou 2021
Artem Ploujnikov 2021, 2022
Abdel Heba 2021
Aku Rouhe 2022
Arseniy Gorin 2022
Ali Safaya 2022
Benoit Wang 2022
Adel Moumen 2022, 2023
Andreas Nautsch 2022, 2023
Luca Della Libera 2022
Heitor Guimarães 2022
Ha Nguyen 2023
Summary
Classes:
HFTransformersInterface – This lobe provides an interface for integrating any HuggingFace transformer model within SpeechBrain.
Functions:
make_padding_masks – This method generates the padding masks.
Reference
- class speechbrain.lobes.models.huggingface_transformers.huggingface.HFTransformersInterface(source, save_path='', for_pretraining=False, with_lm_head=False, with_casual_lm=False, seq2seqlm=False, quantization_config=None, freeze=False, cache_dir='pretrained_models', device=None, **kwargs)[source]
Bases:
Module
This lobe provides an interface for integrating any HuggingFace transformer model within SpeechBrain.
We use AutoClasses for loading any model from the hub together with its necessary components. For example, we build a Wav2Vec2 class that inherits from HFTransformersInterface to work with HuggingFace's wav2vec2 models. Wav2Vec2 can thereby reuse already-built features such as model loading, pretrained-weights loading, full weight freezing, feature_extractor loading, etc. Users are expected to override the essential forward() function to fit their specific needs; a minimal subclassing sketch follows the example below. Depending on the HuggingFace transformer model in question, one can also modify the state_dict by overriding the _modify_state_dict() method, or adapt the config by overriding the override_config() method, etc. See: https://huggingface.co/docs/transformers/model_doc/auto https://huggingface.co/docs/transformers/autoclass_tutorial
- Parameters:
source (str) – HuggingFace hub name, e.g. "facebook/wav2vec2-large-lv60".
save_path (str) – Save directory for the downloaded model.
for_pretraining (bool (default: False)) – If True, build the model for pretraining
with_lm_head (bool (default: False)) – If True, build the model with lm_head
with_casual_lm (bool (default: False)) – If True, build a causal LM model.
seq2seqlm (bool (default: False)) – If True, build a sequence-to-sequence model with lm_head
quantization_config (dict (default: None)) – Quantization config, extremely useful for dealing with LLMs.
freeze (bool (default: False)) – If True, the model is frozen. If False, the model will be trained along with the rest of the pipeline.
cache_dir (str or Path (default: 'pretrained_models')) – Location of the HuggingFace cache for storing pre-trained models, to which symlinks are created.
device (any, optional) – Device to migrate the model to.
**kwargs – Extra keyword arguments passed to the from_pretrained function.
Example
>>> model_hub = "facebook/wav2vec2-base-960h"
>>> save_path = "tmp"
>>> model = HFTransformersInterface(model_hub, save_path=save_path)
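As a rough illustration of the subclassing pattern described above, the sketch below wraps a wav2vec2 checkpoint and overrides forward() to return the last hidden states. It is not the built-in Wav2Vec2 lobe; the class name is hypothetical, and it assumes the interface exposes the wrapped HuggingFace model as self.model and the freeze flag as self.freeze.

import torch

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    HFTransformersInterface,
)


class MyWav2Vec2Encoder(HFTransformersInterface):
    """Hypothetical subclass returning the encoder's last hidden states."""

    def forward(self, wav):
        # Disable gradients when the whole model is frozen
        # (assumes the interface stores the flag as self.freeze).
        if self.freeze:
            with torch.no_grad():
                return self.model(wav).last_hidden_state
        return self.model(wav).last_hidden_state


encoder = MyWav2Vec2Encoder(
    source="facebook/wav2vec2-base-960h", save_path="tmp", freeze=True
)
features = encoder(torch.rand(2, 16000))  # (batch, frames, hidden_size)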
- decode(**kwargs)[source]
Might be useful for models like mbart, which can exploit SB’s beamsearch for inference. Users should modify this function according to their own tasks.
- encode(**kwargs)[source]
Custom encoding for inference. Users should modify this function according to their own tasks.
- freeze_model(model)[source]
Freezes the parameters of a model. This should be overridden too, depending on users’ needs, for example when using adapters (see the sketch below).
- Parameters:
model (from AutoModel.from_config) – Valid HuggingFace transformers model object.
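The sketch below shows one way to override freeze_model() so that only part of the model is frozen, e.g. to keep the transformer layers trainable while adding adapters. The class name is hypothetical, and the feature_extractor attribute is specific to wav2vec2-style HuggingFace models; both are assumptions, not part of the library.

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    HFTransformersInterface,
)


class PartiallyFrozenInterface(HFTransformersInterface):
    """Hypothetical subclass freezing only the convolutional front-end."""

    def freeze_model(self, model):
        # Freeze only the low-level feature extractor (wav2vec2-style
        # attribute name, an assumption); transformer layers stay trainable.
        for param in model.feature_extractor.parameters():
            param.requires_grad = False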
- override_config(config)[source]
Users should modify this function according to their own tasks (see the sketch below).
- Parameters:
config (HuggingFace config object) – The original config.
- Returns:
config – Overridden config.
- Return type:
HuggingFace config object
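As a small, hedged example of this pattern, the subclass below adjusts the HuggingFace config before the model is instantiated. output_hidden_states is a standard HuggingFace config field used here purely for illustration, and the class name is hypothetical.

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    HFTransformersInterface,
)


class HiddenStatesInterface(HFTransformersInterface):
    """Hypothetical subclass that requests all hidden states."""

    def override_config(self, config):
        # Any other config attribute could be adjusted the same way.
        config.output_hidden_states = True
        return config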
- speechbrain.lobes.models.huggingface_transformers.huggingface.make_padding_masks(src, wav_len=None, pad_idx=0)[source]
This method generates the padding masks (see the usage sketch below).
- Parameters:
src (tensor) – The sequence to the encoder (required).
wav_len (tensor) – The relative length of the wav given in SpeechBrain format.
pad_idx (int) – The index for <pad> token (default=0).
- Returns:
src_key_padding_mask – The padding mask.
- Return type:
tensor
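A short usage sketch, assuming SpeechBrain-style relative lengths in [0, 1]; the exact boolean convention of the returned mask (True for valid, non-padded frames) follows the usual SpeechBrain/HuggingFace attention-mask convention and should be checked against the implementation.

import torch

from speechbrain.lobes.models.huggingface_transformers.huggingface import (
    make_padding_masks,
)

src = torch.rand(2, 16000)          # (batch, time): second item is half padding
wav_len = torch.tensor([1.0, 0.5])  # relative lengths in SpeechBrain format
mask = make_padding_masks(src, wav_len=wav_len)
print(mask.shape)                   # expected: torch.Size([2, 16000])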