speechbrain.lobes.models.huggingface_transformers.huggingface module

This lobe is the interface for HuggingFace transformers models. It enables loading the config and model via AutoConfig & AutoModel.

Transformers from HuggingFace needs to be installed: https://huggingface.co/transformers/installation.html

Authors
  • Titouan Parcollet 2021, 2022, 2023

  • Mirco Ravanelli 2021

  • Boumadane Abdelmoumene 2021

  • Ju-Chieh Chou 2021

  • Artem Ploujnikov 2021, 2022

  • Abdel Heba 2021

  • Aku Rouhe 2022

  • Arseniy Gorin 2022

  • Ali Safaya 2022

  • Benoit Wang 2022

  • Adel Moumen 2022, 2023

  • Andreas Nautsch 2022, 2023

  • Luca Della Libera 2022

  • Heitor Guimarães 2022

  • Ha Nguyen 2023

Summary

Classes:

HFTransformersInterface

This lobe provides an interface for integrating any HuggingFace transformer model within SpeechBrain.

Functions:

make_padding_masks

This method generates the padding masks.

Reference

class speechbrain.lobes.models.huggingface_transformers.huggingface.HFTransformersInterface(source, save_path='', for_pretraining=False, with_lm_head=False, with_casual_lm=False, seq2seqlm=False, quantization_config=None, freeze=False, cache_dir='pretrained_models', **kwarg)[source]

Bases: Module

This lobe provides an interface for integrating any HuggingFace transformer model within SpeechBrain.

We use AutoClasses to load any model from the hub together with its necessary components. For example, the Wav2Vec2 class inherits from HFTransformersInterface to work with HuggingFace’s wav2vec2 models. Wav2Vec2 thereby gets already-built features for free, such as model loading, pretrained-weight loading, freezing of all weights, feature_extractor loading, etc. Users are expected to override the essential forward() function to fit their specific needs. Depending on the HuggingFace transformer model in question, one can also modify the state_dict by overriding the _modify_state_dict() method, or adapt the config by overriding the override_config() method, etc. See: https://huggingface.co/docs/transformers/model_doc/auto https://huggingface.co/docs/transformers/autoclass_tutorial

Parameters:
  • source (str) – HuggingFace hub name, e.g. “facebook/wav2vec2-large-lv60”.

  • save_path (str) – save directory of the downloaded model.

  • for_pretraining (bool (default: False)) – If True, build the model for pretraining

  • with_lm_head (bool (default: False)) – If True, build the model with lm_head

  • with_casual_lm (bool (default: False)) – If True, build a causal language model.

  • seq2seqlm (bool (default: False)) – If True, build a sequence-to-sequence model with lm_head

  • quantization_config (dict (default: None)) – Quantization configuration, particularly useful when dealing with LLMs.

  • freeze (bool (default: False)) – If True, the model is frozen. If False, the model will be trained alongside the rest of the pipeline.

  • cache_dir (str or Path (default: 'pretrained_models')) – Location of the HuggingFace cache for storing pre-trained models, to which symlinks are created.

Example

>>> model_hub = "facebook/wav2vec2-base-960h"
>>> save_path = "tmp"
>>> model = HFTransformersInterface(model_hub, save_path=save_path)
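
In practice, a subclass wraps the loaded HuggingFace model and overrides forward(). The sketch below is hypothetical: it assumes the underlying HF model is exposed as self.model and uses make_padding_masks from this module to build the attention mask; it is not the library’s own Wav2Vec2 lobe.

>>> class MyWav2Vec2(HFTransformersInterface):
...     def forward(self, wav, wav_lens=None):
...         """Hypothetical override: encode raw audio with the wrapped HF model."""
...         mask = None
...         if wav_lens is not None:
...             # HF models expect an integer attention mask (1 = valid sample).
...             mask = make_padding_masks(wav, wav_len=wav_lens).long()
...         out = self.model(wav, attention_mask=mask)
...         return out.last_hidden_state
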
forward(**kwargs)[source]

Users should modify this function according to their own tasks.

forward_encoder(**kwargs)[source]

Users should modify this function according to their own tasks.

forward_decoder(**kwargs)[source]

Users should modify this function according to their own tasks.

decode(**kwargs)[source]

Might be useful for models like mBART, which can exploit SpeechBrain’s beamsearch for inference. Users should modify this function according to their own tasks.

encode(**kwargs)[source]

Custom encoding for inference. Users should modify this function according to their own tasks.

freeze_model(model)[source]

Freezes the parameters of a model. This should also be overridden depending on the user’s needs, for example, when using adapters.

Parameters:

model (from AutoModel.from_config) – Valid HuggingFace transformers model object.
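
As a point of reference, a minimal override might simply put the model in eval mode and disable gradients; the adapter filtering in the sketch below is purely illustrative (the subclass name and the "adapter" substring are assumptions):

>>> class MyAdapterModel(HFTransformersInterface):
...     def freeze_model(self, model):
...         """Hypothetical override: freeze everything except adapter weights."""
...         model.eval()
...         for name, param in model.named_parameters():
...             if "adapter" not in name:
...                 param.requires_grad = False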

training: bool
override_config(config)[source]

Users should modify this function according to their own tasks.

Parameters:

config (HuggingFace config object) – The original config.

Returns:

config – Overridden config.

Return type:

HuggingFace config object
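
A hypothetical override could adjust config attributes before the model is instantiated (output_hidden_states is a standard HuggingFace config field; the subclass name is illustrative):

>>> class MyEncoder(HFTransformersInterface):
...     def override_config(self, config):
...         """Hypothetical override: expose the outputs of every hidden layer."""
...         config.output_hidden_states = True
...         return config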

load_feature_extractor(source, cache_dir, **kwarg)[source]

Load model’s feature_extractor from the hub.

Parameters:
  • source (str) – HuggingFace hub name, e.g. “facebook/wav2vec2-large-lv60”.

  • cache_dir (str) – Path (dir) in which a downloaded pretrained model configuration should be cached.

  • **kwarg – Keyword arguments to pass to the AutoFeatureExtractor.from_pretrained() method.
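
Continuing the example above, a hedged usage sketch (it assumes the loaded extractor becomes available as self.feature_extractor; raw_audio stands for a 1-D waveform array and is a placeholder):

>>> model.load_feature_extractor(model_hub, cache_dir=save_path)
>>> inputs = model.feature_extractor(raw_audio, sampling_rate=16000, return_tensors="pt")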

load_tokenizer(source, **kwarg)[source]

Load model’s tokenizer from the hub.

Parameters:
  • source (str) – HuggingFace hub name, e.g. “facebook/wav2vec2-large-lv60”.

  • **kwarg – Keyword arguments to pass to the AutoTokenizer.from_pretrained() method.
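
Likewise, a hedged usage sketch (it assumes the tokenizer is stored as self.tokenizer after loading):

>>> model.load_tokenizer(model_hub)
>>> token_ids = model.tokenizer("HELLO WORLD", return_tensors="pt")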

speechbrain.lobes.models.huggingface_transformers.huggingface.make_padding_masks(src, wav_len=None, pad_idx=0)[source]

This method generates the padding masks.

Parameters:
  • src (tensor) – The sequence to the encoder (required).

  • wav_len (tensor) – The relative lengths of the waveforms, given in the SpeechBrain format (each value in [0, 1], as a fraction of the padded length).

  • pad_idx (int) – The index for <pad> token (default=0).

Returns:

src_key_padding_mask – The padding mask.

Return type:

tensor
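
A short usage sketch, assuming the usual SpeechBrain convention where wav_len holds relative lengths in [0, 1]:

>>> import torch
>>> src = torch.rand(2, 16000)           # two waveforms, padded to 16000 samples
>>> wav_len = torch.tensor([1.0, 0.5])   # the second utterance is half as long
>>> mask = make_padding_masks(src, wav_len=wav_len)  # boolean mask; True marks valid samples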