speechbrain.integrations.nlp.flair_embeddings module

Wrappers for Flair embedding classes

Authors * Sylvain de Langen 2024

Summary

Classes:

FlairEmbeddings

Simple wrapper for generic Flair embeddings.

Reference

class speechbrain.integrations.nlp.flair_embeddings.FlairEmbeddings(embeddings: flair.embeddings.Embeddings)[source]

Bases: object

Simple wrapper for generic Flair embeddings.

Parameters: embeddings (Embeddings) – The Flair embeddings object. If you do not have one initialized, use from_hf() instead.

Example

>>> import flair.embeddings
>>> from speechbrain.integrations.nlp.flair_embeddings import FlairEmbeddings
>>> from speechbrain.utils.metric_stats import EmbeddingErrorRateSimilarity
>>> from speechbrain.utils.metric_stats import WeightedErrorRateStats
>>> from speechbrain.utils.metric_stats import ErrorRateStats
>>> ember = FlairEmbeddings.from_hf(
...     embeddings_class=flair.embeddings.TransformerWordEmbeddings,
...     source="google-bert/bert-base-uncased",
... )
>>> ember_metric = EmbeddingErrorRateSimilarity(
...     embedding_function=lambda x: FlairEmbeddings.embed_word(ember, x),
...     low_similarity_weight=1.0,
...     high_similarity_weight=0.1,
...     threshold=0.4,
... )
>>> weighted_wer = WeightedErrorRateStats(
...     base_stats=ErrorRateStats(),
...     cost_function=ember_metric,
...     weight_name="ember",
... )
>>> weighted_wer.base_stats.append(["id"], ["hi friend"], ["hi buddy"])
>>> weighted_wer.summarize()
{'ember_wer': 16.6..., 'ember_insertions': 1.0, 'ember_substitutions': 0.5, 'ember_deletions': 0.0, 'ember_num_edits': 1.5}
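
To make the role of threshold, low_similarity_weight and high_similarity_weight in the example above more concrete, here is a minimal pure-Python sketch of one way a substitution could be weighted. It assumes the metric compares the two word embeddings by cosine similarity and switches weight at the threshold; that behaviour is an assumption, not something this page states. The helper names cosine_similarity and substitution_weight are hypothetical, and plain lists stand in for the tensors that embed_word() returns.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat a zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def substitution_weight(
    emb_ref,
    emb_hyp,
    threshold=0.4,
    low_similarity_weight=1.0,
    high_similarity_weight=0.1,
):
    """Hypothetical helper: cheap substitution if the words are similar.

    ASSUMPTION: similarity is cosine similarity and the cut-off is inclusive.
    """
    similarity = cosine_similarity(emb_ref, emb_hyp)
    if similarity >= threshold:
        return high_similarity_weight
    return low_similarity_weight


# Nearly parallel vectors count as "similar"; orthogonal ones do not.
close = substitution_weight([1.0, 0.0], [0.9, 0.1])
far = substitution_weight([1.0, 0.0], [0.0, 1.0])
```

Under these assumptions, a substitution between near-synonyms is penalised by high_similarity_weight, while an unrelated substitution keeps the full low_similarity_weight cost.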

static from_hf(embeddings_class, source, *args, **kwargs) → FlairEmbeddings[source]

Fetches and loads Flair embeddings.

Parameters:

embeddings_class (class) – The class to use to initialize the model, e.g. FastTextEmbeddings.
source (str) – The location of the model (a directory or HF repo, for instance).
*args – Extra positional arguments to pass to the Flair class constructor.
**kwargs – Extra keyword arguments to pass to the Flair class constructor.

Return type:

FlairEmbeddings
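
The way *args and **kwargs travel from a factory method to the embeddings class constructor can be illustrated with stand-in classes. The sketch below does not use Flair or SpeechBrain: ToyEmbeddings, ToyWrapper and from_source are hypothetical names, and the assumption that source is passed as the first constructor argument is mine. The real from_hf() may also fetch or resolve the model before constructing it, which is not modelled here.

```python
class ToyEmbeddings:
    """Hypothetical stand-in for a Flair embeddings class."""

    def __init__(self, model_path, layers="-1", fine_tune=False):
        self.model_path = model_path
        self.layers = layers
        self.fine_tune = fine_tune


class ToyWrapper:
    """Hypothetical stand-in for the wrapper class."""

    def __init__(self, embeddings):
        self.embeddings = embeddings

    @staticmethod
    def from_source(embeddings_class, source, *args, **kwargs):
        # ASSUMPTION: the source location is the first constructor argument,
        # and every extra argument is forwarded unchanged.
        return ToyWrapper(embeddings_class(source, *args, **kwargs))


# The keyword argument reaches the ToyEmbeddings constructor untouched.
wrapper = ToyWrapper.from_source(ToyEmbeddings, "some/model", layers="-1,-2")
```

With this pattern, any option accepted by the chosen embeddings class can be set at load time without the wrapper needing to know about it.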

__call__(inputs: List[str] | List[List[str]], pad_tensor: Tensor = tensor([0.])) → Tensor[source]

Extract embeddings for a batch of sentences.

Parameters:

inputs (list of sentences (str or list of tokens)) – Sentences to embed, given as a batch in which each sentence is either a str or a list of tokens (list of str). Token lists do not need to be tokenized specifically for the underlying model; however, each token may be treated as a single word. Out-of-vocabulary handling depends on the underlying embedding class.
pad_tensor (torch.Tensor, optional) – The embedding tensor to insert as padding (of shape [], living on the same device as the embeddings).

Returns:

Batch of shape [len(inputs), max_len, embed_size]

Return type:

torch.Tensor
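
The returned shape [len(inputs), max_len, embed_size] implies that shorter sentences are padded up to the longest one in the batch. The following is a minimal sketch of that padding step using nested Python lists in place of tensors; pad_batch is a hypothetical helper, a scalar pad_value stands in for pad_tensor, and the sketch assumes every sentence contains at least one token.

```python
def pad_batch(sentences, pad_value=0.0):
    """Pad per-sentence token embeddings to a common length.

    sentences: list of sentences, each a non-empty list of token vectors
    (lists of floats of identical size).
    """
    embed_size = len(sentences[0][0])
    max_len = max(len(sentence) for sentence in sentences)
    padded = []
    for sentence in sentences:
        missing = max_len - len(sentence)
        # Append one fresh padding vector per missing token position.
        padding = [[pad_value] * embed_size for _ in range(missing)]
        padded.append(list(sentence) + padding)
    return padded


batch = pad_batch(
    [
        [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # three tokens
        [[0.7, 0.8]],  # one token, padded with two vectors
    ]
)
shape = (len(batch), len(batch[0]), len(batch[0][0]))
```

Here shape is (2, 3, 2): two sentences, a maximum length of three tokens, and an embedding size of two.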

embed_word(word: str) → Tensor[source]

Embeds a single word.

Parameters: word (str) – Word to embed. Out-of-vocabulary handling depends on the underlying embedding class.
Returns: Embedding for a single word, of shape [embed_size]
Return type: torch.Tensor