{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "sb_auto_header", "tags": [ "sb_auto_header" ] }, "source": [ "\n", "\n", "\n", "[Open in Colab](https://colab.research.google.com/github/speechbrain/speechbrain/blob/develop/docs/tutorials/tasks/asr-metrics.ipynb)\n", "to execute or view/download this notebook on\n", "[GitHub](https://github.com/speechbrain/speechbrain/tree/develop/docs/tutorials/tasks/asr-metrics.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Metrics for Speech Recognition\n", "\n", "Estimating the accuracy of a speech recognition model is not a trivial problem. The Word Error Rate (WER) and Character Error Rate (CER) metrics are standard, but some research has been trying to develop alternatives that better correlate with human evaluation (such as SemDist).\n", "\n", "This tutorial introduces some alternative ASR metrics and their flexible integration into SpeechBrain, which can help you research, use or develop new metrics, with copy&paste-ready hyperparameters.\n", "\n", "SpeechBrain v1.0.1 via [PR #2451](https://github.com/speechbrain/speechbrain/pull/2451) introduced support and tooling for the metrics suggested by [Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition](https://www.isca-archive.org/interspeech_2022/roux22_interspeech.pdf). **We recommend that you read this, as some of the metrics won't be explained in detail here.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Installing SpeechBrain via pip\n", "BRANCH = 'develop'\n", "!python -m pip install git+https://github.com/speechbrain/speechbrain.git@$BRANCH\n", "%pip install spacy\n", "%pip install flair" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some boilerplate and test data downloading follows..." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from hyperpyyaml import load_hyperpyyaml\n", "from collections import defaultdict" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "!wget https://raw.githubusercontent.com/thibault-roux/hypereval/main/data/Exemple/refhyp.txt -O refhyp.txt" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bonsoir à tous bienvenue c' est bfm story en direct jusqu' à dix neuf heures à la une\tà tous bienvenue c' est bfm story en direct jusqu' à dix neuf heures\t_\n", "de bfm story ce soir la zone euro va t elle encore vivre un été meurtrier l' allemagne première économie européenne pourrait perdre son triple a la situation se détériore en espagne\tbfm story ce soir la zone euro va t elle encore vive été meurtrier allemagne première économie européenne pourrait perdre son triple a la situation se détériore en espagne\t_\n", "pourquoi ces nouvelles tensions nous serons avec un spécialiste de l' espagne et nous serons avec le député socialiste rapporteur du budget en direct de l' assemblée nationale christian eckert\tces nouvelles tensions sont avec un spécialiste de l' espagne et nous serons avec le député socialiste rapporteur du budget de l' assemblée nationale christian eckert\t_\n", "à la une également la syrie et les armes chimiques la russie demande au régime de bachar al assad de ne pas utiliser ces armes\tla une également la syrie et les armes chimiques la russie demande au régime de bachar el assad ne pas utiliser ses armes\t_\n", "de quel arsenal dispose l' armée syrienne\tquelle arsenal dispose l' armée syrienne\t_\n", "quels dégats pourraient provoquer ces armes chimiques\tdégâts pourraient provoquer ses armes chimiques\t_\n", "un spécialiste jean pierre daguzan nous répondra sur le plateau de bfm story et puis\tspécialistes ont bien 
accusant nous répondra sur le plateau de bfm story puis\t_\n", "après la droite populaire la droite humaniste voici la droite forte deux jeunes pousses de l' ump guillaume peltier et geoffroy didier lancent ce nouveau mouvement pourquoi faire ils sont mes invités ce soir\tla droite populaire la droite humaniste voici la droite forte deux jeunes pousses de l' ump guillaume peltier geoffroy didier migaud pour quoi faire ils sont mes invités ce soir\t_\n", "et puis c(ette) cette fois ci c' est vraiment la fin la fin de france soir liquidé par le tribunal de commerce nous en parlerons avec son tout dernier rédacteur en chef dominique de montvalon\tcette fois ci c' est vraiment la fin à la fin de france soir liquidé par le tribunal de commerce nous en parlerons avec tout dernier rédacteur en chef dominique de montvalon\t_\n", "damien gourlet bonsoir avec vous ce qu' il faut retenir ce soir dans l' actualité l' actualité ce sont encore les incendies en espagne\tdamien gourlet bonsoir olivier avec vous ce qu' il faut retenir ce soir dans l' actualité actualité se sont encore les incendies en espagne\t_\n" ] } ], "source": [ "!head refhyp.txt" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "refs = []\n", "hyps = []\n", "\n", "# some preprocessing for the example file + write the uPOSER mapping to a test file\n", "\n", "def split_norm_text(s: str):\n", "    # s = s.replace(\"' \", \"'\")\n", "\n", "    if s != \"\":\n", "        return s.split(\" \")\n", "\n", "    return s\n", "\n", "with open(\"refhyp.txt\") as f:\n", "    for refhyp in f.read().splitlines():\n", "        if len(refhyp) <= 1:\n", "            continue\n", "\n", "        refhyp = refhyp.split(\"\\t\")\n", "        refs.append(split_norm_text(refhyp[0]))\n", "        hyps.append(split_norm_text(refhyp[1]))\n", "\n", "with open(\"uposer.json\", \"w\") as wf:\n", "    wf.write(\"\"\"[\n", " [\"ADJ\", \"ADJFP\", \"ADJFS\", \"ADJMP\", \"ADJMS\"],\n", " [\"NUM\", \"CHIF\"],\n", " [\"CCONJ\", \"COCO\", \"COSUB\"],\n", " [\"DET\", 
\"DETFS\", \"DETMS\", \"DINTFS\", \"DINTMS\"],\n", " [\"X\", \"MOTINC\"],\n", " [\"NOUN\", \"NFP\", \"NFS\", \"NMP\", \"NMS\"],\n", " [\"PRON\", \"PDEMFP\", \"PDEMFS\", \"PDEMMP\", \"PDEMMS\", \"PINDFP\", \"PINDFS\",\n", " \"PINDMP\", \"PINDMS\", \"PPER1S\", \"PPER2S\", \"PPER3FP\", \"PPER3FS\", \"PPER3MP\",\n", " \"PPER3MS\", \"PPOBJFP\", \"PPOBJFS\", \"PPOBJMP\", \"PPOBJMS\", \"PREF\", \"PREFP\",\n", " \"PREFS\", \"PREL\", \"PRELFP\", \"PRELFS\", \"PRELMP\", \"PRELMS\"],\n", " [\"ADP\", \"PREP\"],\n", " [\"VERB\", \"VPPFP\", \"VPPFS\", \"VPPMP\", \"VPPMS\"],\n", " [\"PROPN\", \"XFAMIL\"],\n", " [\"PUNCT\", \"YPFOR\"]\n", "]\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Word Error Rate (WER)\n", "\n", "This is the usual WER metric, derived from the Levenshtein distance between the **words** of the reference and hypothesis (i.e. ground truth and prediction respectively). The output is often presented as a percentage, but it can actually exceed 100%, e.g. if you have a lot of insertions.\n", "\n", "Of course, what WER is achievable depends _very_ heavily on the dataset, and on the language to an extent. On some easy datasets, it can get as low as 1%, and good models on harder datasets can struggle to reach 15%, or even worse in challenging conditions.\n", "\n", "The WER is defined as follows (where `#` means \"number of\"):\n", "\n", "$\\dfrac{\\#insertions + \\#substitutions + \\#deletions}{\\#refwords}$\n", "\n", "To understand what exactly an insertion/substitution/deletion is, you should understand the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance), an edit distance. \n", "Roughly speaking, an insertion is a word your model has predicted but that does not exist in the reference, a substitution is a word your model has gotten wrong or spelled incorrectly, and a deletion is a word your model has incorrectly omitted.\n", "\n", "A limitation of the WER is that all errors are weighted equally. 
For example, a typo from \"processing\" to \"procesing\" does not meaningfully alter meaning, but an error from \"car\" to \"scar\" might drastically alter meaning, yet both are considered a single-word and single-character error. This can result in drastic discrepancies between the WER/CER and human evaluation." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "wer_hparams = load_hyperpyyaml(\"\"\"\n", "wer_stats: !new:speechbrain.utils.metric_stats.ErrorRateStats\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'WER': 15.451152223304122,\n", " 'SER': 90.83899394161924,\n", " 'num_edits': 19042,\n", " 'num_scored_tokens': 123240,\n", " 'num_erroneous_sents': 4948,\n", " 'num_scored_sents': 5447,\n", " 'num_absent_sents': 0,\n", " 'num_ref_sents': 5447,\n", " 'insertions': 1868,\n", " 'deletions': 7886,\n", " 'substitutions': 9288,\n", " 'error_rate': 15.451152223304122}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wer_hparams[\"wer_stats\"].clear()\n", "wer_hparams[\"wer_stats\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps,\n", " target=refs,\n", ")\n", "wer_hparams[\"wer_stats\"].summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Character Error Rate (CER)\n", "\n", "The typical CER measure, for reference. The CER works the same as the WER, but instead operates at character level (not word or token level). \n", "Ultimately, the CER penalizes various errors differently. Small typos (e.g. missed accents) would result in a full substitution error with the WER, but only result in one character substitution error with the CER. This isn't necessarily an upside since single-character errors can still alter meaning.\n", "\n", "This is slower to run as the edit distance needs to be computed over a comparatively much longer sequence." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "cer_hparams = load_hyperpyyaml(\"\"\"\n", "cer_stats: !new:speechbrain.utils.metric_stats.ErrorRateStats\n", " split_tokens: True\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'WER': 8.728781317403753,\n", " 'SER': 90.83899394161924,\n", " 'num_edits': 57587,\n", " 'num_scored_tokens': 659737,\n", " 'num_erroneous_sents': 4948,\n", " 'num_scored_sents': 5447,\n", " 'num_absent_sents': 0,\n", " 'num_ref_sents': 5447,\n", " 'insertions': 10426,\n", " 'deletions': 36910,\n", " 'substitutions': 10251,\n", " 'error_rate': 8.728781317403753}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cer_hparams[\"cer_stats\"].clear()\n", "cer_hparams[\"cer_stats\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps,\n", " target=refs,\n", ")\n", "cer_hparams[\"cer_stats\"].summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part-of-speech Error Rate (POSER)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-03-28 16:27:25.399507: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-03-28 16:27:25.399759: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-03-28 16:27:25.671596: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "2024-03-28 16:27:26.262645: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary 
is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2024-03-28 16:27:30.960021: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "2024-03-28 16:28:03,311 SequenceTagger predicts: Dictionary with 69 tags: <unk>, O, DET, NFP, ADJFP, AUX, VPPMS, ADV, PREP, PDEMMS, NMS, COSUB, PINDMS, PPOBJMS, VERB, DETFS, NFS, YPFOR, VPPFS, PUNCT, DETMS, PROPN, ADJMS, PPER3FS, ADJFS, COCO, NMP, PREL, PPER1S, ADJMP, VPPMP, DINTMS, PPER3MS, PPER3MP, PREF, ADJ, DINTFS, CHIF, XFAMIL, PRELFS, SYM, NOUN, MOTINC, PINDFS, PPOBJMP, NUM, PREFP, PDEMFS, VPPFP, PPER3FP\n" ] } ], "source": [ "poser_hparams = load_hyperpyyaml(\"\"\"\n", "wer_stats_dposer: !new:speechbrain.utils.metric_stats.ErrorRateStats\n", "\n", "uposer_dict: !apply:speechbrain.utils.dictionaries.SynonymDictionary.from_json_path\n", "    path: ./uposer.json\n", "wer_stats_uposer: !new:speechbrain.utils.metric_stats.ErrorRateStats\n", "    equality_comparator: !ref <uposer_dict>\n", "\n", "pos_tagger: !apply:speechbrain.integrations.nlp.FlairSequenceTagger.from_hf\n", "    source: \"qanastek/pos-french\"\n", "    save_path: ./pretrained_models/\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "refs_poser = poser_hparams[\"pos_tagger\"](refs)\n", "hyps_poser = poser_hparams[\"pos_tagger\"](hyps)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INTJ PREP DET NFS PDEMMS AUX PROPN XFAMIL PREP NMS PREP PREP CHIF CHIF NFP PREP DETFS NFS\n", "PREP DET NFS PDEMMS AUX PROPN XFAMIL PREP NMS PREP PREP CHIF CHIF NFP\n" ] } ], "source": [ "print(\" \".join(refs_poser[0]))\n", "print(\" \".join(hyps_poser[0]))" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "### dPOSER\n", "\n", "Instead of computing WER on input words, we extract (preferably all) the parts-of-speech of the input sentences. The WER is then computed over the sequence of labels." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'WER': 14.70402051648298,\n", " 'SER': 88.87460987699652,\n", " 'num_edits': 18118,\n", " 'num_scored_tokens': 123218,\n", " 'num_erroneous_sents': 4841,\n", " 'num_scored_sents': 5447,\n", " 'num_absent_sents': 0,\n", " 'num_ref_sents': 5447,\n", " 'insertions': 2064,\n", " 'deletions': 8076,\n", " 'substitutions': 7978,\n", " 'error_rate': 14.70402051648298}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poser_hparams[\"wer_stats_dposer\"].clear()\n", "poser_hparams[\"wer_stats_dposer\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps_poser,\n", " target=refs_poser,\n", ")\n", "poser_hparams[\"wer_stats_dposer\"].summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### uPOSER\n", "\n", "The cited paper proposes a variant (uPOSER) with broad POS categories, in case that the used POS model has very specific categories. This can simply be implemented by using a synonym dictionary that groups up equivalent labels easily." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'WER': 12.26687659270561,\n", " 'SER': 86.50633376170369,\n", " 'num_edits': 15115,\n", " 'num_scored_tokens': 123218,\n", " 'num_erroneous_sents': 4712,\n", " 'num_scored_sents': 5447,\n", " 'num_absent_sents': 0,\n", " 'num_ref_sents': 5447,\n", " 'insertions': 2089,\n", " 'deletions': 8101,\n", " 'substitutions': 4925,\n", " 'error_rate': 12.26687659270561}" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poser_hparams[\"wer_stats_uposer\"].clear()\n", "poser_hparams[\"wer_stats_uposer\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps_poser,\n", " target=refs_poser,\n", ")\n", "poser_hparams[\"wer_stats_uposer\"].summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lemma Error Rate (LER)\n", "\n", "Instead of computing the WER over words, we compute the WER over lemmatized words." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "!spacy download fr_core_news_md" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "ler_hparams = load_hyperpyyaml(\"\"\"\n", "ler_model: !apply:speechbrain.integrations.nlp.SpacyPipeline.from_name\n", " name: fr_core_news_md\n", " exclude: [\"tagger\", \"parser\", \"ner\", \"textcat\"]\n", "\n", "wer_stats_ler: !new:speechbrain.utils.metric_stats.ErrorRateStats\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "refs_ler = ler_hparams[\"ler_model\"].lemmatize(refs)\n", "hyps_ler = ler_hparams[\"ler_model\"].lemmatize(hyps)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bonsoir à tout bienvenue c ' être bfm story en direct jusqu ' à dix neuf heure à le un\n", "à tout bienvenue c ' être bfm story en 
direct jusqu ' à dix neuf heure\n" ] } ], "source": [ "print(\" \".join(refs_ler[0]))\n", "print(\" \".join(hyps_ler[0]))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'WER': 14.426271595988885,\n", " 'SER': 88.61758766293373,\n", " 'num_edits': 19105,\n", " 'num_scored_tokens': 132432,\n", " 'num_erroneous_sents': 4827,\n", " 'num_scored_sents': 5447,\n", " 'num_absent_sents': 0,\n", " 'num_ref_sents': 5447,\n", " 'insertions': 2160,\n", " 'deletions': 10219,\n", " 'substitutions': 6726,\n", " 'error_rate': 14.426271595988885}" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ler_hparams[\"wer_stats_ler\"].clear()\n", "ler_hparams[\"wer_stats_ler\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps_ler,\n", " target=refs_ler,\n", ")\n", "ler_hparams[\"wer_stats_ler\"].summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Embedding Error Rate (EmbER)\n", "\n", "This is the typical WER calculation, except that the penalty of a word substitution is down-weighted when the two words are deemed similar enough. This allows you to reduce the impact of e.g. minor spelling errors that do not alter the meaning much.\n", "\n", "Setup for this is slightly more involved, but the gist of it is that you need:\n", "- A regular `ErrorRateStats` object which you will `.append()` to,\n", "- The embeddings that you will be using, e.g. using the `FlairEmbeddings` wrapper,\n", "- The EmbER configuration, which will point to the embedding (here binding to `ember_embeddings.embed_word`),\n", "- The `WeightedErrorRateStats`, which piggybacks on the base `ErrorRateStats` and plugs into the EmbER similarity function defined just above." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "ember_hparams = load_hyperpyyaml(\"\"\"\n", "wer_stats: !new:speechbrain.utils.metric_stats.ErrorRateStats\n", "\n", "ember_embeddings: !apply:speechbrain.integrations.nlp.FlairEmbeddings.from_hf\n", "    embeddings_class: !name:flair.embeddings.FastTextEmbeddings\n", "    source: facebook/fasttext-fr-vectors\n", "    save_path: ./pretrained_models/\n", "\n", "ember_metric: !new:speechbrain.utils.metric_stats.EmbeddingErrorRateSimilarity\n", "    embedding_function: !name:speechbrain.integrations.nlp.FlairEmbeddings.embed_word\n", "        - !ref <ember_embeddings>\n", "    low_similarity_weight: 1.0\n", "    high_similarity_weight: 0.1\n", "    threshold: 0.4\n", "\n", "weighted_wer_stats: !new:speechbrain.utils.metric_stats.WeightedErrorRateStats\n", "    base_stats: !ref <wer_stats>\n", "    cost_function: !ref <ember_metric>\n", "    weight_name: ember\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:gensim.models.fasttext:could not extract any ngrams from '()', returning origin vector\n" ] }, { "data": { "text/plain": [ "{'ember_wer': 12.225677015059036,\n", " 'ember_insertions': 1868.0,\n", " 'ember_substitutions': 5541.300000000059,\n", " 'ember_deletions': 7886.0,\n", " 'ember_num_edits': 15295.30000000006}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ember_hparams[\"wer_stats\"].clear()\n", "ember_hparams[\"wer_stats\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps,\n", " target=refs,\n", ")\n", "ember_hparams[\"weighted_wer_stats\"].clear()\n", "ember_hparams[\"weighted_wer_stats\"].summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## BERTScore\n", "\n", "In a nutshell, BERTScore works by comparing the cosine similarity of *all* target and predicted embeddings, as obtained from a BERT-like LM encoder. 
This works rather well because the embeddings are trained to embed information from their context.\n", "\n", "This is best explained by the code and documentation of the metric itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bertscore_hparams = load_hyperpyyaml(\"\"\"\n", "bertscore_model_name: camembert/camembert-large\n", "bertscore_model_device: cuda\n", "\n", "bertscore_stats: !new:speechbrain.utils.bertscore.BERTScoreStats\n", "    lm: !new:speechbrain.integrations.huggingface.TextEncoder\n", "        source: !ref <bertscore_model_name>\n", "        save_path: pretrained_models/\n", "        device: !ref <bertscore_model_device>\n", "        num_layers: 8\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bertscore-recall': tensor(0.9033),\n", " 'bertscore-precision': tensor(0.9237),\n", " 'bertscore-f1': tensor(0.9134)}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bertscore_hparams[\"bertscore_stats\"].clear()\n", "bertscore_hparams[\"bertscore_stats\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps,\n", " target=refs,\n", ")\n", "bertscore_hparams[\"bertscore_stats\"].summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sentence Semantic Distance: SemDist\n", "\n", "SemDist is estimated using the cosine similarity between single per-sentence embeddings, e.g. obtained by averaging the LM embeddings over all tokens.\n", "\n", "Here, lower is better. The score is multiplied by 1000 by default for readability." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "semdist_hparams = load_hyperpyyaml(\"\"\"\n", "semdist_model_name: camembert/camembert-large\n", "semdist_model_device: cuda\n", "\n", "semdist_stats: !new:speechbrain.utils.semdist.SemDistStats\n", "    lm: !new:speechbrain.integrations.huggingface.TextEncoder\n", "        source: !ref <semdist_model_name>\n", "        save_path: pretrained_models/\n", "        device: !ref <semdist_model_device>\n", "    method: meanpool\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'semdist': 41.13104248046875}" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "semdist_hparams[\"semdist_stats\"].clear()\n", "semdist_hparams[\"semdist_stats\"].append(\n", " ids=list(range(len(refs))),\n", " predict=hyps,\n", " target=refs,\n", ")\n", "semdist_hparams[\"semdist_stats\"].summarize()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'key': 0, 'semdist': 11.317432403564453},\n", " {'key': 1, 'semdist': 14.37997817993164},\n", " {'key': 2, 'semdist': 8.182466506958008},\n", " {'key': 3, 'semdist': 7.842123508453369},\n", " {'key': 4, 'semdist': 13.874173164367676}]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "semdist_hparams[\"semdist_stats\"].scores[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some comparisons\n", "\n", "This was a bit thrown together; if you've run everything without running out of RAM, congratulations :)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== REF: bonsoir à tous bienvenue c' est bfm story en direct jusqu' à dix neuf heures à la une\n", "=== HYP: à tous bienvenue c' est bfm story en direct jusqu' à dix neuf heures\n", "WER: 22.222%\n", "CER: 20.000%\n", "dPOSER: 22.222%\n", "uPOSER: 
22.222%\n", "EmbER: 22.222%\n", "BERTScore recall: 0.87673\n", "BERTScore precision: 0.96040\n", "SemDist mean (x1000): 11.31743\n", "\n", "=== REF: de bfm story ce soir la zone euro va t elle encore vivre un été meurtrier l' allemagne première économie européenne pourrait perdre son triple a la situation se détériore en espagne\n", "=== HYP: bfm story ce soir la zone euro va t elle encore vive été meurtrier allemagne première économie européenne pourrait perdre son triple a la situation se détériore en espagne\n", "WER: 12.500%\n", "CER: 5.525%\n", "dPOSER: 15.625%\n", "uPOSER: 15.625%\n", "EmbER: 12.500%\n", "BERTScore recall: 0.91836\n", "BERTScore precision: 0.91983\n", "SemDist mean (x1000): 14.37998\n", "\n", "=== REF: pourquoi ces nouvelles tensions nous serons avec un spécialiste de l' espagne et nous serons avec le député socialiste rapporteur du budget en direct de l' assemblée nationale christian eckert\n", "=== HYP: ces nouvelles tensions sont avec un spécialiste de l' espagne et nous serons avec le député socialiste rapporteur du budget de l' assemblée nationale christian eckert\n", "WER: 16.667%\n", "CER: 14.062%\n", "dPOSER: 16.667%\n", "uPOSER: 16.667%\n", "EmbER: 13.667%\n", "BERTScore recall: 0.92581\n", "BERTScore precision: 0.96108\n", "SemDist mean (x1000): 8.18247\n", "\n", "=== REF: à la une également la syrie et les armes chimiques la russie demande au régime de bachar al assad de ne pas utiliser ces armes\n", "=== HYP: la une également la syrie et les armes chimiques la russie demande au régime de bachar el assad ne pas utiliser ses armes\n", "WER: 16.000%\n", "CER: 5.556%\n", "dPOSER: 12.000%\n", "uPOSER: 12.000%\n", "EmbER: 8.800%\n", "BERTScore recall: 0.95685\n", "BERTScore precision: 0.95836\n", "SemDist mean (x1000): 7.84212\n", "\n", "=== REF: de quel arsenal dispose l' armée syrienne\n", "=== HYP: quelle arsenal dispose l' armée syrienne\n", "WER: 28.571%\n", "CER: 12.195%\n", "dPOSER: 28.571%\n", "uPOSER: 14.286%\n", "EmbER: 
28.571%\n", "BERTScore recall: 0.93197\n", "BERTScore precision: 0.93909\n", "SemDist mean (x1000): 13.87417\n", "\n", "=== REF: quels dégats pourraient provoquer ces armes chimiques\n", "=== HYP: dégâts pourraient provoquer ses armes chimiques\n", "WER: 42.857%\n", "CER: 15.094%\n", "dPOSER: 14.286%\n", "uPOSER: 14.286%\n", "EmbER: 30.000%\n", "BERTScore recall: 0.76464\n", "BERTScore precision: 0.85932\n", "SemDist mean (x1000): 46.58437\n", "\n", "=== REF: un spécialiste jean pierre daguzan nous répondra sur le plateau de bfm story et puis\n", "=== HYP: spécialistes ont bien accusant nous répondra sur le plateau de bfm story puis\n", "WER: 40.000%\n", "CER: 23.810%\n", "dPOSER: 40.000%\n", "uPOSER: 33.333%\n", "EmbER: 40.000%\n", "BERTScore recall: 0.70336\n", "BERTScore precision: 0.73710\n", "SemDist mean (x1000): 48.69765\n", "\n", "=== REF: après la droite populaire la droite humaniste voici la droite forte deux jeunes pousses de l' ump guillaume peltier et geoffroy didier lancent ce nouveau mouvement pourquoi faire ils sont mes invités ce soir\n", "=== HYP: la droite populaire la droite humaniste voici la droite forte deux jeunes pousses de l' ump guillaume peltier geoffroy didier migaud pour quoi faire ils sont mes invités ce soir\n", "WER: 20.588%\n", "CER: 17.391%\n", "dPOSER: 23.529%\n", "uPOSER: 17.647%\n", "EmbER: 20.588%\n", "BERTScore recall: 0.88929\n", "BERTScore precision: 0.92400\n", "SemDist mean (x1000): 11.49768\n", "\n", "=== REF: et puis c(ette) cette fois ci c' est vraiment la fin la fin de france soir liquidé par le tribunal de commerce nous en parlerons avec son tout dernier rédacteur en chef dominique de montvalon\n", "=== HYP: cette fois ci c' est vraiment la fin à la fin de france soir liquidé par le tribunal de commerce nous en parlerons avec tout dernier rédacteur en chef dominique de montvalon\n", "WER: 14.286%\n", "CER: 11.518%\n", "dPOSER: 14.286%\n", "uPOSER: 14.286%\n", "EmbER: 13.889%\n", "BERTScore recall: 0.87325\n", 
"BERTScore precision: 0.95048\n", "SemDist mean (x1000): 8.85153\n", "\n", "=== REF: damien gourlet bonsoir avec vous ce qu' il faut retenir ce soir dans l' actualité l' actualité ce sont encore les incendies en espagne\n", "=== HYP: damien gourlet bonsoir olivier avec vous ce qu' il faut retenir ce soir dans l' actualité actualité se sont encore les incendies en espagne\n", "WER: 12.500%\n", "CER: 8.955%\n", "dPOSER: 12.500%\n", "uPOSER: 8.333%\n", "EmbER: 8.400%\n", "BERTScore recall: 0.97822\n", "BERTScore precision: 0.94830\n", "SemDist mean (x1000): 9.74524\n", "\n" ] } ], "source": [ "for i in range(10):\n", " ref = \" \".join(refs[i])\n", " hyp = \" \".join(hyps[i])\n", "\n", " print(f\"\"\"\\\n", "=== REF: {ref}\n", "=== HYP: {hyp}\n", "WER: {wer_hparams['wer_stats'].scores[i]['WER']:.3f}%\n", "CER: {cer_hparams['cer_stats'].scores[i]['WER']:.3f}%\n", "dPOSER: {poser_hparams['wer_stats_dposer'].scores[i]['WER']:.3f}%\n", "uPOSER: {poser_hparams['wer_stats_uposer'].scores[i]['WER']:.3f}%\n", "EmbER: {ember_hparams['weighted_wer_stats'].scores[i]['WER']:.3f}%\n", "BERTScore recall: {bertscore_hparams['bertscore_stats'].scores[i]['recall']:.5f}\n", "BERTScore precision: {bertscore_hparams['bertscore_stats'].scores[i]['precision']:.5f}\n", "SemDist mean (x1000): {semdist_hparams['semdist_stats'].scores[i]['semdist']:.5f}\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "sb_auto_footer", "tags": [ "sb_auto_footer" ] }, "source": [ "## Citing SpeechBrain\n", "\n", "If you use SpeechBrain in your research or business, please cite it using the following BibTeX entry:\n", "\n", "```bibtex\n", "@misc{speechbrainV1,\n", " title={Open-Source Conversational AI with {SpeechBrain} 1.0},\n", " author={Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao 
and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Gaelle Laperriere and Mickael Rouvier and Renato De Mori and Yannick Esteve},\n", " year={2024},\n", " eprint={2407.00463},\n", " archivePrefix={arXiv},\n", " primaryClass={cs.LG},\n", " url={https://arxiv.org/abs/2407.00463},\n", "}\n", "@misc{speechbrain,\n", " title={{SpeechBrain}: A General-Purpose Speech Toolkit},\n", " author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},\n", " year={2021},\n", " eprint={2106.04624},\n", " archivePrefix={arXiv},\n", " primaryClass={eess.AS},\n", " note={arXiv:2106.04624}\n", "}\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 4 }