speechbrain.utils.bertscore module
Provides a metrics class for the BERTScore metric.
Authors
* Sylvain de Langen 2024
Summary

Classes:

BERTScoreStats – Computes BERTScore with a provided HuggingFace Transformers text encoder, using the method described in the paper BERTScore: Evaluating Text Generation with BERT.

Functions:

get_bert_token_mask – Returns a token mask with special tokens masked.

get_bertscore_token_weights – Returns token weights for use with the BERTScore metric.
Reference
- class speechbrain.utils.bertscore.BERTScoreStats(lm: TextEncoder, batch_size: int = 64, use_idf: bool = True, sentence_level_averaging: bool = True, allow_matching_special_tokens: bool = False)[source]
Bases:
MetricStats
Computes BERTScore with a provided HuggingFace Transformers text encoder, using the method described in the paper BERTScore: Evaluating Text Generation with BERT.
BERTScore operates over contextualized tokens (e.g. the output of BERT, but many other models would work). Since cosine similarities are used, the output range would be between -1 and 1. See the linked resources for more details.
Special tokens (as queried from the tokenizer) are entirely ignored.
Authors' reference implementation of the metric can be found here. The linked page extensively describes the approach and compares how the BERTScore relates to human evaluation with many different models.
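The core computation can be pictured as follows. This is a minimal, illustrative sketch of the unweighted greedy matching over cosine similarities (IDF weighting, special-token masking and batching omitted); the function name and tensor shapes are purely hypothetical and not part of this module's API.

```python
import torch

def greedy_bertscore_sketch(ref_emb: torch.Tensor, cand_emb: torch.Tensor):
    """Illustrative recall/precision/F1 for a single sentence pair.

    ref_emb:  [num_ref_tokens, hidden]  contextualized reference embeddings
    cand_emb: [num_cand_tokens, hidden] contextualized candidate embeddings
    """
    # Cosine similarity between every reference token and every candidate token
    sim = torch.nn.functional.normalize(ref_emb, dim=-1) \
        @ torch.nn.functional.normalize(cand_emb, dim=-1).T  # values in [-1, 1]

    # Greedy matching: every token pairs with its most similar counterpart
    recall = sim.max(dim=1).values.mean()     # best candidate match per reference token
    precision = sim.max(dim=0).values.mean()  # best reference match per candidate token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```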
Warning
Out of the box, this implementation may not strictly match the results of the reference implementation. Please read the argument documentation to understand the differences.
- Parameters:
lm (speechbrain.lobes.models.huggingface_transformers.TextEncoder) – HF Transformers tokenizer and text encoder wrapper to use as an LM.
batch_size (int, optional) – How many pairs of utterances should be considered at once. Higher is faster but may result in OOM.
use_idf (bool, optional) – If enabled (default), tokens in the reference are weighted by Inverse Document Frequency, which down-weights the impact of common words that may carry less information. Every sentence appended is considered a document in the IDF calculation.
sentence_level_averaging (bool, optional) – When True, the final recall/precision metrics will be the average of recall/precision for each tested sentence, rather than for each tested token, e.g. a very long sentence will weigh as much as a very short sentence in the final metrics. The default is True, which matches the reference implementation.
allow_matching_special_tokens (bool, optional) – When True, non-special tokens may match against special tokens during greedy matching (e.g. [CLS]/[SEP]). Batch size must be 1 due to padding handling. The default is False, which is different behavior from the reference implementation (see bert_score#180).
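A minimal usage sketch, assuming a TextEncoder wrapper loaded from a HuggingFace checkpoint and the usual MetricStats append/summarize flow; the constructor arguments, checkpoint name, utterance IDs and sentences are placeholders, and the exact append signature and summary keys may differ.

```python
from speechbrain.lobes.models.huggingface_transformers import TextEncoder
from speechbrain.utils.bertscore import BERTScoreStats

# Any HF text encoder/tokenizer pair can be wrapped; the checkpoint is illustrative
lm = TextEncoder(source="roberta-base", save_path="pretrained_models")

stats = BERTScoreStats(lm=lm, batch_size=64, use_idf=True)

# Append hypothesis/reference sentence pairs; with use_idf=True, every appended
# reference also counts as one document for the IDF statistics
stats.append(
    ["utt1", "utt2"],
    ["the cat sat on a mat", "hello word"],
    ["the cat sat on the mat", "hello world"],
)

summary = stats.summarize()  # aggregated recall/precision/F1 over all appended pairs
```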
- speechbrain.utils.bertscore.get_bert_token_mask(tokenizer) → BoolTensor[source]
Returns a token mask with special tokens masked.
- Parameters:
tokenizer (transformers.PreTrainedTokenizer) – HuggingFace tokenizer for the BERT model.
- Returns:
A mask tensor that can be indexed by token ID (of shape [vocab_size]).
- Return type:
torch.BoolTensor
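A small illustrative example, assuming an arbitrary HuggingFace checkpoint; the returned mask is simply indexed by token ID.

```python
from transformers import AutoTokenizer
from speechbrain.utils.bertscore import get_bert_token_mask

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # checkpoint is illustrative

mask = get_bert_token_mask(tokenizer)  # torch.BoolTensor of shape [vocab_size]

# Index the mask by token ID, e.g. to compare a special token with a regular word
print(mask[tokenizer.cls_token_id], mask[tokenizer.convert_tokens_to_ids("hello")])
```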
- speechbrain.utils.bertscore.get_bertscore_token_weights(tokenizer, corpus: Iterable[str] | None = None) → Tensor[source]
Returns token weights for use with the BERTScore metric. When specifying corpus, the weights are the Inverse Document Frequency (IDF) of each token, extracted from the corpus.
The IDF formula is adapted from the BERTScore paper, where words missing from the reference corpus are weighted with +1 smoothing.
- Parameters:
tokenizer (transformers.PreTrainedTokenizer) – HuggingFace tokenizer for the BERT model.
corpus (Iterable[str], optional) – Iterable corpus to compute the IDF from. Each iterated value is considered a document in the corpus in the IDF calculation. If omitted, no IDF weighting is done.
- Returns:
A floating-point tensor that can be indexed by token ID, of shape [vocab_size], where each entry is by how much the impact of a given token should be multiplied.
- Return type:
torch.Tensor
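A small illustrative example of IDF weighting from a toy corpus; the checkpoint and sentences are placeholders.

```python
from transformers import AutoTokenizer
from speechbrain.utils.bertscore import get_bertscore_token_weights

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # checkpoint is illustrative

# Each iterated string counts as one document for the IDF statistics
reference_corpus = [
    "the cat sat on the mat",
    "a dog slept on the mat",
    "the bird flew away",
]

weights = get_bertscore_token_weights(tokenizer, corpus=reference_corpus)
print(weights.shape)  # torch.Size([vocab_size])

# Tokens appearing in many documents ("the") typically get a lower weight
# than rarer ones ("bird")
print(weights[tokenizer.convert_tokens_to_ids("the")],
      weights[tokenizer.convert_tokens_to_ids("bird")])
```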