speechbrain.lobes.models.flair.embeddings module

Wrappers for Flair embedding classes

Authors
  • Sylvain de Langen 2024

Summary

Classes:

FlairEmbeddings

Simple wrapper for generic Flair embeddings.

Reference

class speechbrain.lobes.models.flair.embeddings.FlairEmbeddings(embeddings: flair.embeddings.Embeddings)[source]

Bases: object

Simple wrapper for generic Flair embeddings.

Parameters:

embeddings (Embeddings) – The Flair embeddings object. If you do not have one initialized, use from_hf() instead.
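
Example (a minimal sketch, assuming the flair package is installed; "glove" is flair's standard key for its pretrained GloVe word embeddings):

>>> from flair.embeddings import WordEmbeddings
>>> from speechbrain.lobes.models.flair.embeddings import FlairEmbeddings
>>> glove = WordEmbeddings("glove")  # static word vectors from flair
>>> embeddings = FlairEmbeddings(glove)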

static from_hf(embeddings_class, source, save_path='./model_checkpoints', filename='model.bin', *args, **kwargs) FlairEmbeddings[source]

Fetches and loads Flair embeddings according to the speechbrain.utils.fetching.fetch() semantics. Embedding files are saved into a unique subdirectory in save_path.

Parameters:
  • embeddings_class (class) – The class to use to initialize the model, e.g. FastTextEmbeddings.

  • source (str) – The location of the model (a directory or HF repo, for instance).

  • save_path (str, optional) – The saving location for the model (i.e. the root for the download or symlink location).

  • filename (str, optional) – The filename of the model. The default is the usual filename for this kind of model.

  • *args – Extra positional arguments to pass to the Flair class constructor.

  • **kwargs – Extra keyword arguments to pass to the Flair class constructor.

Return type:

FlairEmbeddings
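
Example (a minimal sketch; the source below is a hypothetical repo name used for illustration, and FastTextEmbeddings is flair's fastText embedding class):

>>> from flair.embeddings import FastTextEmbeddings
>>> from speechbrain.lobes.models.flair.embeddings import FlairEmbeddings
>>> embeddings = FlairEmbeddings.from_hf(
...     embeddings_class=FastTextEmbeddings,
...     source="some-user/some-fasttext-model",  # hypothetical source
...     save_path="./model_checkpoints",
... )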

__call__(inputs: List[str] | List[List[str]], pad_tensor: Tensor = tensor([0.])) Tensor[source]

Extract embeddings for a batch of sentences.

Parameters:
  • inputs (list of sentences (str or list of tokens)) – Sentences to embed, as a batch of plain strings or of token lists (list of str). Token lists do not need to be tokenized specifically for this embedding class; however, each token is treated as a single word. Out-of-vocabulary handling likewise depends on the underlying embedding class.

  • pad_tensor (torch.Tensor, optional) – Embedding tensor (of shape [], living on the same device as the embeddings) to insert as padding for shorter sentences.

Returns:

Batch of shape [len(inputs), max_len, embed_size]

Return type:

torch.Tensor
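
Example (continuing the wrapper sketch above; shorter sentences are right-padded with pad_tensor up to the longest sentence in the batch):

>>> batch = [["hello", "world"], ["speechbrain"]]
>>> embedded = embeddings(batch)
>>> # embedded has shape [2, 2, embed_size]; position [1, 1] holds
>>> # the padding embedding, since the second sentence has one token.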

embed_word(word: str) Tensor[source]

Embeds a single word.

Parameters:

word (str) – Word to embed. Out-of-vocabulary handling depends on the underlying embedding class.

Returns:

Embedding for a single word, of shape [embed_size]

Return type:

torch.Tensor
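
Example (continuing the same sketch; the vector dimensionality depends on the wrapped embedding class):

>>> vec = embeddings.embed_word("hello")
>>> # vec has shape [embed_size]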