speechbrain.utils.semdist module
Provides a metrics class for the SemDist metric.
Authors:
* Sylvain de Langen 2024
Summary
Classes:
BaseSemDistStats | Base class to implement the SemDist metric, for the variants that estimate a single cosine similarity per pair of target and predicted texts.
SemDistStats | Computes the SemDist metric with a provided HuggingFace Transformers text encoder.
Reference
- class speechbrain.utils.semdist.BaseSemDistStats(embed_function: Callable[[List[str]], Tensor], scale: float = 1000.0, batch_size: int = 64)
Bases: MetricStats
Base class to implement the SemDist metric, for the variants that estimate a single cosine similarity per pair of target and predicted texts. The SemDist metrics are described in the paper "Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric".
- Parameters:
embed_function (Callable[[List[str]], torch.Tensor]) – Given a list of sentences, return their summarized embeddings (one vector per sentence) using the method of your choice (e.g. mean pooling).
scale (float, optional) – The scale factor α applied to the cosine similarity result for clarity. The default is 1000, in order to match the authors' recommendation.
batch_size (int, optional) – How many pairs of utterances should be considered at once. Higher is faster but may result in OOM.
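Assuming, as in the SemDist paper referenced above, that each utterance's score is one minus the cosine similarity between its target and predicted embeddings, multiplied by α, the per-utterance and averaged metrics can be written as follows. Here e_i^ref and e_i^hyp stand for the outputs of embed_function on the i-th target and prediction, and N is the number of utterances; this notation is illustrative and not taken from the SpeechBrain source.

```latex
\mathrm{SemDist}_i = \alpha \left( 1 - \frac{\mathbf{e}^{\mathrm{ref}}_i \cdot \mathbf{e}^{\mathrm{hyp}}_i}{\lVert \mathbf{e}^{\mathrm{ref}}_i \rVert \, \lVert \mathbf{e}^{\mathrm{hyp}}_i \rVert} \right),
\qquad
\mathrm{SemDist} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{SemDist}_i
```

With α = 1000, a prediction whose embedding points in exactly the same direction as the target's scores 0, and scores grow as the embeddings diverge.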
- summarize(field=None)
Summarize the SemDist metric scores. Performs the actual embedding function call and SemDist calculation.
Full set of fields:
- semdist: The average SemDist over all utterances, multiplied by the scale optionally specified at initialization.
Additionally, this function populates a scores list with one entry per pair of sentences. Each entry of that list is a dict with the fields:
- key: the ID of the utterance.
- semdist: The SemDist of the utterance, multiplied by the scale.
- Parameters:
field (str, optional) – The field to return, if you are only interested in one of them. If specified, a single float is returned; otherwise, a dict is returned.
- Returns:
dict from str to float, if field is None – A dictionary of the fields documented above.
float, if field is not None – The single field selected by field.
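The behavior of summarize described above (chunked embedding, one scores entry per utterance, an averaged semdist field, and optional field selection) can be illustrated with the pure-Python sketch below. It is not SpeechBrain's torch-based implementation: summarize_semdist, cosine_similarity and toy_embed are hypothetical names, the per-utterance formula α(1 − cos) is an assumption, and toy_embed is a letter-count stand-in for a real text encoder. Unlike the class, which stores the per-utterance entries on the object, the sketch returns them alongside the result.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple, Union

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def summarize_semdist(
    ids: List[str],
    targets: List[str],
    predictions: List[str],
    embed_function: Callable[[List[str]], List[Vector]],
    scale: float = 1000.0,
    batch_size: int = 64,
    field: Optional[str] = None,
) -> Tuple[Union[Dict[str, float], float], List[Dict[str, object]]]:
    """Hypothetical sketch: returns (summary or single field, scores list)."""
    scores: List[Dict[str, object]] = []
    total = 0.0
    # Embed targets and predictions chunk by chunk to bound memory use.
    for start in range(0, len(ids), batch_size):
        stop = start + batch_size
        ref_emb = embed_function(targets[start:stop])
        hyp_emb = embed_function(predictions[start:stop])
        for utt_id, ref, hyp in zip(ids[start:stop], ref_emb, hyp_emb):
            # Assumed per-utterance SemDist: scaled cosine distance.
            semdist = (1.0 - cosine_similarity(ref, hyp)) * scale
            scores.append({"key": utt_id, "semdist": semdist})
            total += semdist
    summary = {"semdist": total / len(ids)}
    result = summary if field is None else summary[field]
    return result, scores


def toy_embed(sentences: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a text encoder: letter-count vectors."""
    return [[float(s.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"] for s in sentences]


# Usage: one exact match and one partial mismatch, one pair per chunk.
summary, scores = summarize_semdist(
    ["u1", "u2"],
    ["hello world", "good morning"],
    ["hello world", "good evening"],
    toy_embed,
    batch_size=1,
)
```

Passing field="semdist" returns the single averaged float instead of the dict, mirroring the documented return types.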
- class speechbrain.utils.semdist.SemDistStats(lm, method: Literal['meanpool', 'cls'] = 'meanpool', *args, **kwargs)
Bases: BaseSemDistStats
Computes the SemDist metric with a provided HuggingFace Transformers text encoder.
- Parameters:
lm (speechbrain.lobes.models.huggingface_transformers.TextEncoder) – HF Transformers tokenizer and text encoder wrapper to use as an LM.
method ("meanpool" or "cls") –
"meanpool" (default): Computes the mean of all contextualized embeddings, excluding padding tokens.
"cls": Exclusively uses the first contextualized embedding, which with BERT-like tokenizers is the [CLS] token, typically intended to capture classification information.
*args – Extra positional arguments passed to the base constructor.
**kwargs – Extra keyword arguments passed to the base constructor.
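The two method options differ only in how a sentence's token embeddings are reduced to one vector. The pure-Python sketch below illustrates this for a single sentence, assuming an attention mask where 1 marks a real token and 0 marks padding; meanpool and cls_pool are hypothetical helper names, not SpeechBrain functions, and a real encoder would operate on batched tensors.

```python
from typing import List

Vector = List[float]


def meanpool(hidden_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token embeddings whose mask entry is 1 (non-padding)."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    dim = len(hidden_states[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


def cls_pool(hidden_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Return the first token's embedding ([CLS] for BERT-like tokenizers).

    The mask is accepted only to keep the same signature as meanpool.
    """
    return list(hidden_states[0])


# Two real tokens followed by one padding position (mask = 0).
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled_mean = meanpool(hidden, mask)  # [2.0, 3.0]: the padding row is ignored
pooled_cls = cls_pool(hidden, mask)  # [1.0, 2.0]
```

Mean pooling uses every real token, so it tends to reflect the whole sentence, while "cls" relies on the encoder having been trained to summarize the input in its first position.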