speechbrain.integrations.nlp.flair_embeddings module
Wrappers for Flair embedding classes
Authors * Sylvain de Langen 2024
Summary
Classes:
FlairEmbeddings | Simple wrapper for generic Flair embeddings.
Reference
- class speechbrain.integrations.nlp.flair_embeddings.FlairEmbeddings(embeddings: flair.embeddings.Embeddings)
Bases: object
Simple wrapper for generic Flair embeddings.
- Parameters:
embeddings (Embeddings) – The Flair embeddings object. If you do not have one initialized, use from_hf() instead.
Example
>>> import flair.embeddings
>>> from speechbrain.integrations.nlp.flair_embeddings import FlairEmbeddings
>>> from speechbrain.utils.metric_stats import EmbeddingErrorRateSimilarity
>>> from speechbrain.utils.metric_stats import WeightedErrorRateStats
>>> from speechbrain.utils.metric_stats import ErrorRateStats
>>> ember = FlairEmbeddings.from_hf(
...     embeddings_class=flair.embeddings.TransformerWordEmbeddings,
...     source="google-bert/bert-base-uncased",
... )
>>> ember_metric = EmbeddingErrorRateSimilarity(
...     embedding_function=lambda x: FlairEmbeddings.embed_word(ember, x),
...     low_similarity_weight=1.0,
...     high_similarity_weight=0.1,
...     threshold=0.4,
... )
>>> weighted_wer = WeightedErrorRateStats(
...     base_stats=ErrorRateStats(),
...     cost_function=ember_metric,
...     weight_name="ember",
... )
>>> weighted_wer.base_stats.append(["id"], ["hi friend"], ["hi buddy"])
>>> weighted_wer.summarize()
{'ember_wer': 16.6..., 'ember_insertions': 1.0, 'ember_substitutions': 0.5, 'ember_deletions': 0.0, 'ember_num_edits': 1.5}
- static from_hf(embeddings_class, source, *args, **kwargs) → FlairEmbeddings
Fetches and loads Flair embeddings.
- Parameters:
embeddings_class (class) – The class to use to initialize the model, e.g. FastTextEmbeddings.
source (str) – The location of the model (a directory or HF repo, for instance).
*args – Extra positional arguments to pass to the Flair class constructor.
**kwargs – Extra keyword arguments to pass to the Flair class constructor.
- Return type:
FlairEmbeddings
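As a rough illustration of how a factory with this signature can forward its extra arguments, the sketch below uses two hypothetical stand-in classes, DummyEmbeddings and EmbeddingsWrapper, neither of which exists in Flair or SpeechBrain. It assumes that source is handed to the constructor as its first argument; the real from_hf() may additionally fetch or cache the model before constructing it.

```python
class DummyEmbeddings:
    """Hypothetical stand-in for a Flair embeddings class (not part of Flair)."""

    def __init__(self, source, *args, **kwargs):
        self.source = source
        self.args = args
        self.kwargs = kwargs


class EmbeddingsWrapper:
    """Hypothetical stand-in for the FlairEmbeddings wrapper."""

    def __init__(self, embeddings):
        self.embeddings = embeddings

    @staticmethod
    def from_source(embeddings_class, source, *args, **kwargs):
        # Assumption: the source is the first constructor argument and all
        # extra positional/keyword arguments are forwarded unchanged.
        return EmbeddingsWrapper(embeddings_class(source, *args, **kwargs))


# Extra keyword arguments end up on the underlying embeddings object.
wrapper = EmbeddingsWrapper.from_source(DummyEmbeddings, "some/repo", layers="-1")
print(type(wrapper.embeddings).__name__, wrapper.embeddings.kwargs)
```

With the real class, the equivalent call would pass an actual Flair embeddings class and model location, as in the Example above.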
- __call__(inputs: List[str] | List[List[str]], pad_tensor: Tensor = tensor([0.])) → Tensor
Extract embeddings for a batch of sentences.
- Parameters:
inputs (list of sentences (str or list of tokens)) – Sentences to embed, in the form of batches of lists of tokens (list of str) or a str. In the case of token lists, tokens do not need to be already tokenized for this specific sequence tagger. However, each token may be treated as a single word. Similarly, out-of-vocabulary handling depends on the underlying embedding class.
pad_tensor (torch.Tensor, optional) – The embedding tensor (of shape [], living on the same device as the embeddings) to insert as padding.
- Returns:
Batch of shape [len(inputs), max_len, embed_size]
- Return type:
torch.Tensor
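To make the output shape concrete, here is a minimal pure-Python sketch of padding variable-length sentences into a rectangular batch. The pad_batch helper and the embedding values are hypothetical and only illustrate the [len(inputs), max_len, embed_size] layout; the wrapper itself works on tensors, and here the padding vector is assumed to already have embed_size entries.

```python
def pad_batch(token_embeddings, pad_vector):
    """Pad per-sentence lists of token vectors to the longest sentence."""
    max_len = max(len(sentence) for sentence in token_embeddings)
    return [
        # Append one fresh copy of the padding vector per missing token.
        sentence + [list(pad_vector) for _ in range(max_len - len(sentence))]
        for sentence in token_embeddings
    ]


# Two "sentences" with 2 and 3 tokens, embed_size = 2 (made-up values).
batch = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],
]
padded = pad_batch(batch, pad_vector=[0.0, 0.0])

# (len(inputs), max_len, embed_size)
shape = (len(padded), len(padded[0]), len(padded[0][0]))
print(shape)  # (2, 3, 2)
```

The shorter sentence gains one padding vector at the end, so every sentence in the batch has max_len entries.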