{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "sb_auto_header",
"tags": [
"sb_auto_header"
]
},
"source": [
"\n",
"\n",
"\n",
"[Open in Colab](https://colab.research.google.com/github/speechbrain/speechbrain/blob/develop/docs/tutorials/tasks/speech-enhancement-from-scratch.ipynb)\n",
"to execute or view/download this notebook on\n",
"[GitHub](https://github.com/speechbrain/speechbrain/tree/develop/docs/tutorials/tasks/speech-enhancement-from-scratch.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uo0JP7a5uFp7"
},
"source": [
"# Speech Enhancement From Scratch\n",
"\n",
"So you want to do regression tasks with speech? Look no further, you're in the right place. This tutorial will walk you through a basic speech enhancement template with SpeechBrain to show all the components needed for making a new recipe.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "trcWOmpnxQ0v"
},
"source": [
"Before jumping into the code, let's briefly introduce the problem of speech enhancement. The goal of speech enhancement is to remove noise from an input recording:\n",
"\n",
"\n",
"\n",
"The problem is very hard because of the huge variety of disturbances that might corrupt speech signals.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E-WDlb5pytsF"
},
"source": [
"There are different ways to approach the problem. Nowadays, one of the most popular techniques is mask-based speech enhancement:\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BOCyudX6zHxO"
},
"source": [
"In masking approaches, rather than estimating the enhanced signal directly, we estimate a soft mask. We then estimate the enhanced signal by multiplying the noisy one by the soft mask.\n",
"\n",
"Depending on the type of input/output we can have:\n",
"- Waveform masking (depicted in the figure above)\n",
"- Spectral masking (depicted in the figure below)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IZ3RXdfZ1RqS"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3sdmUcLn1WJH"
},
"source": [
"In spectral masking, the system maps noisy spectrograms into clean ones. This mapping is generally considered easier than waveform-to-waveform mapping. However, retrieving the signal in the time domain requires adding the phase information. The common solution (reasonable, but not ideal) is to reuse the phase of the noisy signal. Waveform-masking approaches do not suffer from this limitation and are progressively gaining popularity within the community.\n",
"\n",
"It is worth mentioning that SpeechBrain currently supports more advanced solutions for speech enhancement as well, such as [MetricGAN+](https://arxiv.org/abs/2104.03538) (which learns the PESQ metric within an adversarial training framework) and [MimicLoss](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Voicebank/MTL/ASR_enhance) (which achieves better enhancement using information derived from a speech recognizer).\n",
"\n"
]
},
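{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the noisy-phase trick described above, the sketch below masks the magnitude of an STFT and recombines it with the phase of the noisy input before inverting. It uses plain `torch.stft` and `torch.istft` rather than the SpeechBrain objects used later in this tutorial. The function name `enhance_with_noisy_phase`, the `mask_fn` argument, and the STFT sizes are assumptions chosen for the example, and the constant masks stand in for a trained network.\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"\n",
"def enhance_with_noisy_phase(noisy_wav, mask_fn, n_fft=512, hop=128):\n",
"    # Analysis: complex STFT of the noisy waveform, shape [batch, bins, frames]\n",
"    window = torch.hann_window(n_fft)\n",
"    spec = torch.stft(\n",
"        noisy_wav, n_fft, hop_length=hop, window=window, return_complex=True\n",
"    )\n",
"    magnitude, noisy_phase = spec.abs(), spec.angle()\n",
"\n",
"    # Masking: scale each time-frequency bin of the magnitude\n",
"    enhanced_magnitude = mask_fn(magnitude) * magnitude\n",
"\n",
"    # Synthesis: pair the enhanced magnitude with the noisy phase and invert\n",
"    enhanced_spec = torch.polar(enhanced_magnitude, noisy_phase)\n",
"    return torch.istft(\n",
"        enhanced_spec, n_fft, hop_length=hop, window=window,\n",
"        length=noisy_wav.shape[-1],\n",
"    )\n",
"\n",
"\n",
"torch.manual_seed(0)\n",
"noisy = torch.randn(1, 16000)  # one second of stand-in audio at 16 kHz\n",
"\n",
"# An all-ones mask leaves the signal untouched; a constant 0.5 mask halves it.\n",
"unchanged = enhance_with_noisy_phase(noisy, torch.ones_like)\n",
"halved = enhance_with_noisy_phase(noisy, lambda m: torch.full_like(m, 0.5))\n",
"```\n",
"\n",
"With an all-ones mask the input should come back unchanged up to numerical error, which is a handy sanity check for the STFT settings."
]
},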
{
"cell_type": "markdown",
"metadata": {
"id": "pYKnhdQK8oGy"
},
"source": [
"In this tutorial, we will guide you through the creation of a simple speech enhancement system based on spectral masking.\n",
"\n",
"In particular, we will refer to the example reported here:\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CLBBM4rBxPsI"
},
"source": [
"The README provides a nice introduction, so it is reproduced here:\n",
"\n",
"==========================\n",
"\n",
"This folder provides a working, well-documented example for training a speech enhancement model from scratch, based on a few hours of data. The data we use is from Mini Librispeech + OpenRIR.\n",
"\n",
"There are four files here:\n",
"\n",
" * `train.py`: the main code file, outlines entire training process.\n",
" * `train.yaml`: the hyperparameters file, sets all parameters of execution.\n",
" * `custom_model.py`: A file containing the definition of a PyTorch module.\n",
" * `mini_librispeech_prepare.py`: If necessary, downloads and prepares data manifests.\n",
"\n",
"To train an enhancement model, just execute the following on the command-line:\n",
"\n",
"    python train.py train.yaml --data_folder /path/to/save/mini_librispeech\n",
"\n",
"This will automatically download and prepare the data manifest for mini librispeech, and then train a model with dynamically generated noisy samples, using noise, reverberation, and babble.\n",
"\n",
"=========================\n",
"\n",
"So to start, let's make sure we can just run the template without modifications."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LiREw1_tQUR5"
},
"outputs": [],
"source": [
"%%capture\n",
"# Installing SpeechBrain via pip\n",
"BRANCH = 'develop'\n",
"!python -m pip install git+https://github.com/speechbrain/speechbrain.git@$BRANCH\n",
"\n",
"# Clone SpeechBrain repository\n",
"!git clone https://github.com/speechbrain/speechbrain/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "g8rw3XzK2FmK"
},
"outputs": [],
"source": [
"import speechbrain as sb\n",
"import torch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sgtDshhF5M6G"
},
"outputs": [],
"source": [
"%cd speechbrain/templates/enhancement\n",
"!python train.py train.yaml --device='cpu' --debug"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DUIwtVMG0ozq"
},
"source": [
"## Recipe overview in train.py\n",
"\n",
"Let's start with the highest-level view of the recipe and work our way down. To do this, we should look at the bottom of the recipe, where the `if __name__ == \"__main__\":` block defines the recipe structure. The basic process is:\n",
"\n",
"1. Load hyperparameters and command line overrides.\n",
"2. Prepare data manifests and loading objects.\n",
"3. Instantiate `SEBrain` sub-class as `se_brain`.\n",
"4. Call `se_brain.fit()` to perform training.\n",
"5. Call `se_brain.evaluate()` to check final performance.\n",
"\n",
"And that's it! Before we go and actually run this code, let's manually define the `SEBrain` sub-class of the `Brain` class. If you want a more in-depth tutorial about how the `Brain` class works, check out the [Brain tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/brain-class.html).\n",
"\n",
"For simplicity, we'll first define the sub-class with only the first method override and then add the other overrides one by one. The first method is `compute_forward`, which defines how the data is used by the model to make predictions. The return values should include any predictions made by the model. In this case, the method computes the relevant features, predicts a mask, applies the mask, and then re-computes the time-domain signals."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xaokPtFO0lMo"
},
"outputs": [],
"source": [
"class SEBrain(sb.Brain):\n",
"    \"\"\"Class that manages the training loop. See speechbrain.core.Brain.\"\"\"\n",
"\n",
"    def compute_forward(self, batch, stage):\n",
"        \"\"\"Apply masking to convert from noisy waveforms to enhanced signals.\n",
"\n",
"        Arguments\n",
"        ---------\n",
"        batch : PaddedBatch\n",
"            This batch object contains all the relevant tensors for computation.\n",
"        stage : sb.Stage\n",
"            One of sb.Stage.TRAIN, sb.Stage.VALID, or sb.Stage.TEST.\n",
"\n",
"        Returns\n",
"        -------\n",
"        predictions : dict\n",
"            A dictionary with keys {\"spec\", \"wav\"} with predicted features.\n",
"        \"\"\"\n",
"\n",
"        # We first move the batch to the appropriate device, and\n",
"        # compute the features necessary for masking.\n",
"        batch = batch.to(self.device)\n",
"        self.clean_wavs, self.lens = batch.clean_sig\n",
"\n",
"        noisy_wavs, self.lens = self.hparams.wav_augment(\n",
"            self.clean_wavs, self.lens\n",
"        )\n",
"\n",
"        noisy_feats = self.compute_feats(noisy_wavs)\n",
"\n",
"        # Masking is done here with the \"signal approximation (SA)\" algorithm.\n",
"        # The masked input is compared directly with clean speech targets.\n",
"        mask = self.modules.model(noisy_feats)\n",
"        predict_spec = torch.mul(mask, noisy_feats)\n",
"\n",
"        # Also return predicted wav, for evaluation. Note that this could\n",
"        # also be used for a time-domain loss term.\n",
"        predict_wav = self.hparams.resynth(\n",
"            torch.expm1(predict_spec), noisy_wavs\n",
"        )\n",
"\n",
"        # Return a dictionary so we don't have to remember the order\n",
"        return {\"spec\": predict_spec, \"wav\": predict_wav}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sA89RfLe4fiy"
},
"source": [
"If you're wondering what the `self.modules` and `self.hparams` objects are, you're asking the right questions. These objects are constructed when the `SEBrain` class is instantiated, and come directly from the `dict` arguments to the initializer: `modules` and `hparams`. The keys of each dict provide the names that you use to reference the objects, e.g. passing `{\"model\": model}` for `modules` would allow you to access the model with `self.modules.model`.\n",
"\n",
"The other method that must be defined in a `Brain` sub-class is `compute_objectives`. We sub-class `SEBrain` itself only as a convenient way to split up the class definition; don't use this technique in production code!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4QlVFmDm3spK"
},
"outputs": [],
"source": [
"class SEBrain(SEBrain):\n",
"    def compute_objectives(self, predictions, batch, stage):\n",
"        \"\"\"Computes the loss given the predicted and targeted outputs.\n",
"\n",
"        Arguments\n",
"        ---------\n",
"        predictions : dict\n",
"            The output dict from `compute_forward`.\n",
"        batch : PaddedBatch\n",
"            This batch object contains all the relevant tensors for computation.\n",
"        stage : sb.Stage\n",
"            One of sb.Stage.TRAIN, sb.Stage.VALID, or sb.Stage.TEST.\n",
"\n",
"        Returns\n",
"        -------\n",
"        loss : torch.Tensor\n",
"            A one-element tensor used for backpropagating the gradient.\n",
"        \"\"\"\n",
"\n",
"        # Prepare clean targets for comparison\n",
"        clean_spec = self.compute_feats(self.clean_wavs)\n",
"\n",
"        # Directly compare the masked spectrograms with the clean targets\n",
"        loss = sb.nnet.losses.mse_loss(\n",
"            predictions[\"spec\"], clean_spec, self.lens\n",
"        )\n",
"\n",
"        # Append this batch of losses to the loss metric so that it is\n",
"        # easy to summarize at the end of the stage.\n",
"        self.loss_metric.append(\n",
"            batch.id,\n",
"            predictions[\"spec\"],\n",
"            clean_spec,\n",
"            self.lens,\n",
"            reduction=\"batch\",\n",
"        )\n",
"\n",
"        # Some evaluations are slower, and we only want to perform them\n",
"        # on the validation and test sets.\n",
"        if stage != sb.Stage.TRAIN:\n",
"\n",
"            # Evaluate speech intelligibility as an additional metric\n",
"            self.stoi_metric.append(\n",
"                batch.id,\n",
"                predictions[\"wav\"],\n",
"                self.clean_wavs,\n",
"                self.lens,\n",
"                reduction=\"batch\",\n",
"            )\n",
"\n",
"        return loss"
]
},
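{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `class SEBrain(SEBrain)` idiom used in this notebook may look odd at first. A minimal sketch with a toy, hypothetical `Greeter` class shows what happens: the name is rebound to a new subclass of the previous definition, so methods accumulate from one cell to the next.\n",
"\n",
"```python\n",
"class Greeter:\n",
"    def hello(self):\n",
"        return 'hello'\n",
"\n",
"\n",
"# Rebind the same name to a subclass of the previous definition.\n",
"# The base class expression is evaluated before the name is reassigned.\n",
"class Greeter(Greeter):\n",
"    def goodbye(self):\n",
"        return 'goodbye'\n",
"\n",
"\n",
"greeter = Greeter()\n",
"inherited = greeter.hello()    # defined in the first class\n",
"added = greeter.goodbye()      # defined in the second class\n",
"\n",
"# Two distinct classes share the name, plus `object` at the end of the chain.\n",
"chain_length = len(Greeter.__mro__)\n",
"```\n",
"\n",
"Each redefinition adds a level to the inheritance chain, and re-running a cell adds yet another one, which is one reason to keep this trick for notebooks only."
]
},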
{
"cell_type": "markdown",
"metadata": {
"id": "TU3wE6P6-nmo"
},
"source": [
"Both of these methods use a third method, `compute_feats`, which is not an override. We'll quickly define it here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3rpiY7PU-z-x"
},
"outputs": [],
"source": [
"class SEBrain(SEBrain):\n",
"    def compute_feats(self, wavs):\n",
"        \"\"\"Returns corresponding log-spectral features of the input waveforms.\n",
"\n",
"        Arguments\n",
"        ---------\n",
"        wavs : torch.Tensor\n",
"            The batch of waveforms to convert to log-spectral features.\n",
"        \"\"\"\n",
"\n",
"        # Log-spectral features (power=0.5 yields the magnitude spectrogram)\n",
"        feats = self.hparams.compute_STFT(wavs)\n",
"        feats = sb.processing.features.spectral_magnitude(feats, power=0.5)\n",
"\n",
"        # Log1p reduces the emphasis on small differences\n",
"        feats = torch.log1p(feats)\n",
"\n",
"        return feats"
]
},
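{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `torch.log1p` compression in `compute_feats` pairs with the `torch.expm1` call in `compute_forward`: `log1p(x) = log(1 + x)` maps zero to zero and squeezes large magnitudes, and `expm1` undoes it before resynthesis. The short sketch below checks that round trip on a random stand-in spectrogram; the tensor shape and value range are assumptions made for the example.\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"torch.manual_seed(0)\n",
"# Hypothetical magnitude spectrogram with a wide range: [batch, frames, bins]\n",
"magnitude = torch.rand(2, 10, 201) * 100.0\n",
"\n",
"# Compress, as compute_feats does\n",
"log_feats = torch.log1p(magnitude)\n",
"\n",
"# Invert, as compute_forward does before resynthesis\n",
"recovered = torch.expm1(log_feats)\n",
"\n",
"# Largest value before and after compression\n",
"max_before = magnitude.max().item()\n",
"max_after = log_feats.max().item()\n",
"```\n",
"\n",
"Because the magnitudes are non-negative, the compressed features are non-negative too, so a mask applied in the log domain still shrinks values toward zero."
]
},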
{
"cell_type": "markdown",
"metadata": {
"id": "N0Z6XX3h_UfO"
},
"source": [
"There are only two more methods to define, which are used to keep track of statistics and save checkpoints. These are the `on_stage_start` and `on_stage_end` methods, which are called by `fit()` before and after iterating over each dataset, respectively. Before each stage, we set up the metric trackers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8gUTO6rq__I2"
},
"outputs": [],
"source": [
"class SEBrain(SEBrain):\n",
"    def on_stage_start(self, stage, epoch=None):\n",
"        \"\"\"Gets called at the beginning of each stage.\n",
"\n",
"        Arguments\n",
"        ---------\n",
"        stage : sb.Stage\n",
"            One of sb.Stage.TRAIN, sb.Stage.VALID, or sb.Stage.TEST.\n",
"        epoch : int\n",
"            The currently-starting epoch. This is passed\n",
"            `None` during the test stage.\n",
"        \"\"\"\n",
"\n",
"        # Set up statistics trackers for this stage\n",
"        self.loss_metric = sb.utils.metric_stats.MetricStats(\n",
"            metric=sb.nnet.losses.mse_loss\n",
"        )\n",
"\n",
"        # Set up evaluation-only statistics trackers\n",
"        if stage != sb.Stage.TRAIN:\n",
"            self.stoi_metric = sb.utils.metric_stats.MetricStats(\n",
"                metric=sb.nnet.loss.stoi_loss.stoi_loss\n",
"            )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MH0-TsmFBKsi"
},
"source": [
"After the validation stage, we use the trackers to summarize the stats, and save a checkpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NzW4K0wjAnjz"
},
"outputs": [],
"source": [
"class SEBrain(SEBrain):\n",
"    def on_stage_end(self, stage, stage_loss, epoch=None):\n",
"        \"\"\"Gets called at the end of each stage.\n",
"\n",
"        Arguments\n",
"        ---------\n",
"        stage : sb.Stage\n",
"            One of sb.Stage.TRAIN, sb.Stage.VALID, sb.Stage.TEST\n",
"        stage_loss : float\n",
"            The average loss for all of the data processed in this stage.\n",
"        epoch : int\n",
"            The epoch that is ending. This is passed\n",
"            `None` during the test stage.\n",
"        \"\"\"\n",
"\n",
"        # Store the train loss until the validation stage.\n",
"        if stage == sb.Stage.TRAIN:\n",
"            self.train_loss = stage_loss\n",
"\n",
"        # Summarize the statistics from the stage for record-keeping.\n",
"        else:\n",
"            stats = {\n",
"                \"loss\": stage_loss,\n",
"                \"stoi\": -self.stoi_metric.summarize(\"average\"),\n",
"            }\n",
"\n",
"        # At the end of validation, we can write stats and checkpoints\n",
"        if stage == sb.Stage.VALID:\n",
"            # The train_logger writes a summary to stdout and to the logfile.\n",
"            self.hparams.train_logger.log_stats(\n",
"                {\"Epoch\": epoch},\n",
"                train_stats={\"loss\": self.train_loss},\n",
"                valid_stats=stats,\n",
"            )\n",
"\n",
"            # Save the current checkpoint and delete previous checkpoints,\n",
"            # unless they have the current best STOI score.\n",
"            self.checkpointer.save_and_keep_only(meta=stats, max_keys=[\"stoi\"])\n",
"\n",
"        # We also write statistics about test data to stdout and to the logfile.\n",
"        if stage == sb.Stage.TEST:\n",
"            self.hparams.train_logger.log_stats(\n",
"                {\"Epoch loaded\": self.hparams.epoch_counter.current},\n",
"                test_stats=stats,\n",
"            )"
]
},
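{
"cell_type": "markdown",
"metadata": {},
"source": [
"The trackers follow a simple pattern: collect one score per utterance during the stage, then average at the end. The minus sign applied to the `stoi` entry above suggests that the metric is stored in loss form (lower is better) and flipped for reporting. The sketch below imitates that pattern in plain Python. `SimpleMetricTracker` and `negative_similarity` are hypothetical names for this example and are not part of SpeechBrain.\n",
"\n",
"```python\n",
"class SimpleMetricTracker:\n",
"    '''Hypothetical, simplified stand-in for a per-utterance metric tracker.'''\n",
"\n",
"    def __init__(self, metric):\n",
"        self.metric = metric\n",
"        self.ids = []\n",
"        self.scores = []\n",
"\n",
"    def append(self, ids, predictions, targets):\n",
"        # The metric is expected to return one score per utterance.\n",
"        self.ids.extend(ids)\n",
"        self.scores.extend(self.metric(predictions, targets))\n",
"\n",
"    def summarize(self):\n",
"        # Average over every utterance seen so far in this stage.\n",
"        return sum(self.scores) / len(self.scores)\n",
"\n",
"\n",
"def negative_similarity(predictions, targets):\n",
"    # Toy loss-style metric: -1.0 is a perfect match, larger is worse.\n",
"    return [-1.0 + abs(p - t) for p, t in zip(predictions, targets)]\n",
"\n",
"\n",
"tracker = SimpleMetricTracker(negative_similarity)\n",
"tracker.append(['utt1', 'utt2'], [0.5, 1.0], [0.5, 0.5])\n",
"tracker.append(['utt3'], [0.0], [0.0])\n",
"\n",
"# Flip the sign so that higher means better when reporting.\n",
"score = -tracker.summarize()\n",
"```\n",
"\n",
"Creating a fresh tracker in `on_stage_start` keeps the statistics of one stage from leaking into the next."
]
},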
{
"cell_type": "markdown",
"metadata": {
"id": "VDGs6Va8Bg-M"
},
"source": [
"Okay, that's everything you need to define the `SEBrain` class! The only thing left before we can actually run this thing is the data loading functions. We'll use `DynamicItemDatasets` which you can learn more about in the [Tutorial on Data Loading](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html). We need only to define the function that loads audio data, and we can use that to create all our datasets!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5Tfh42jrBfGt"
},
"outputs": [],
"source": [
"def dataio_prep(hparams):\n",
"    \"\"\"This function prepares the datasets to be used in the brain class.\n",
"    It also defines the data processing pipeline through user-defined functions.\n",
"\n",
"    We expect `prepare_mini_librispeech` to have been called before this,\n",
"    so that the train, valid, and test manifest files are available.\n",
"\n",
"    Arguments\n",
"    ---------\n",
"    hparams : dict\n",
"        This dictionary is loaded from the `train.yaml` file, and it includes\n",
"        all the hyperparameters needed for dataset construction and loading.\n",
"\n",
"    Returns\n",
"    -------\n",
"    datasets : dict\n",
"        Contains three keys, \"train\", \"valid\", and \"test\", that correspond\n",
"        to the appropriate DynamicItemDataset objects.\n",
"    \"\"\"\n",
"\n",
"    # Define audio pipeline. It only loads the clean signal; the corruption\n",
"    # is applied on-the-fly later, by `wav_augment` in `compute_forward`.\n",
"    # Of course for a real enhancement dataset, you'd want a fixed valid set.\n",
"    @sb.utils.data_pipeline.takes(\"wav\")\n",
"    @sb.utils.data_pipeline.provides(\"clean_sig\")\n",
"    def audio_pipeline(wav):\n",
"        \"\"\"Load the clean signal from the path stored in the manifest.\"\"\"\n",
"        clean_sig = sb.dataio.dataio.read_audio(wav)\n",
"        return clean_sig\n",
"\n",
"    # Define datasets sorted by ascending lengths for efficiency\n",
"    datasets = {}\n",
"    data_info = {\n",
"        \"train\": hparams[\"train_annotation\"],\n",
"        \"valid\": hparams[\"valid_annotation\"],\n",
"        \"test\": hparams[\"test_annotation\"],\n",
"    }\n",
"    hparams[\"dataloader_options\"][\"shuffle\"] = False\n",
"    for dataset in data_info:\n",
"        datasets[dataset] = sb.dataio.dataset.DynamicItemDataset.from_json(\n",
"            json_path=data_info[dataset],\n",
"            replacements={\"data_root\": hparams[\"data_folder\"]},\n",
"            dynamic_items=[audio_pipeline],\n",
"            output_keys=[\"id\", \"clean_sig\"],\n",
"        ).filtered_sorted(sort_key=\"length\")\n",
"    return datasets\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GYcWWQnHDJis"
},
"source": [
"Now that we've defined all the code in `train.py` other than the `__main__` block, we can start running our recipe! This code is edited slightly to simplify the parts that don't necessarily apply to running the code in Colab. The first step is to load the hyperparameters. This creates a bunch of the needed objects automatically. You can find more info about how `HyperPyYAML` works in our [HyperPyYAML tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/hyperpyyaml.html). In addition, we'll create the folder for storing experimental data, checkpoints and statistics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "AgFzrDmQDel1"
},
"outputs": [],
"source": [
"from hyperpyyaml import load_hyperpyyaml\n",
"with open(\"train.yaml\") as fin:\n",
"    hparams = load_hyperpyyaml(fin)\n",
"sb.create_experiment_directory(hparams[\"output_folder\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JnlXGHTFE3fc"
},
"source": [
"As easily as that, we have access to our PyTorch model, among many other hyperparameters. You can explore the `hparams` object at your leisure, but here are a few examples:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-fJr6YByFGgW"
},
"outputs": [],
"source": [
"# Already-applied random seed\n",
"hparams[\"seed\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ASg3COH_FWK3"
},
"outputs": [],
"source": [
"# STFT function\n",
"hparams[\"compute_STFT\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pTexBbI_Fnyk"
},
"outputs": [],
"source": [
"# Masking model\n",
"hparams[\"model\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "n9P7qvRLH-ts"
},
"source": [
"Prepare the data manifests and create the dataset objects using them with the function we defined earlier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xcGLOe9KIQoz"
},
"outputs": [],
"source": [
"from mini_librispeech_prepare import prepare_mini_librispeech\n",
"prepare_mini_librispeech(\n",
"    data_folder=hparams[\"data_folder\"],\n",
"    save_json_train=hparams[\"train_annotation\"],\n",
"    save_json_valid=hparams[\"valid_annotation\"],\n",
"    save_json_test=hparams[\"test_annotation\"],\n",
")\n",
"datasets = dataio_prep(hparams)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Zv6pfpsARkr4"
},
"source": [
"We can check that the data is being loaded correctly by inspecting the first item of each dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yurn8bQWRrT6"
},
"outputs": [],
"source": [
"import torch\n",
"datasets[\"train\"][0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nJO6glavRun3"
},
"outputs": [],
"source": [
"datasets[\"valid\"][0]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L6SOW1oiRNyV"
},
"source": [
"Instantiate the SEBrain object to prepare for training:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c_F3CkDwRM9e"
},
"outputs": [],
"source": [
"se_brain = SEBrain(\n",
"    modules=hparams[\"modules\"],\n",
"    opt_class=hparams[\"opt_class\"],\n",
"    hparams=hparams,\n",
"    checkpointer=hparams[\"checkpointer\"],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Hotx8CmwSPPY"
},
"source": [
"And then call `fit()` to do the training! The `fit()` method iterates the training loop, calling the methods necessary to update the parameters of the model. Since all objects with changing state are managed by the Checkpointer, training can be stopped at any point and will resume from the last checkpoint on the next call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5AfAGLn-SVbV"
},
"outputs": [],
"source": [
"se_brain.fit(\n",
"    epoch_counter=se_brain.hparams.epoch_counter,\n",
"    train_set=datasets[\"train\"],\n",
"    valid_set=datasets[\"valid\"],\n",
"    train_loader_kwargs=hparams[\"dataloader_options\"],\n",
"    valid_loader_kwargs=hparams[\"dataloader_options\"],\n",
")"
]
},
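{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a mental model of what `fit()` does with the overrides defined earlier, here is a heavily simplified, hypothetical loop in plain PyTorch. `TinyBrain` is not SpeechBrain code: it skips checkpointing, device handling, data loaders, and `on_stage_start`, and keeps only the order in which the remaining hooks fire. The toy regression data is made up for the example.\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"\n",
"class TinyBrain:\n",
"    '''Hypothetical, heavily simplified stand-in for a Brain-style loop.'''\n",
"\n",
"    def __init__(self, model, lr=0.1):\n",
"        self.model = model\n",
"        self.optimizer = torch.optim.SGD(model.parameters(), lr=lr)\n",
"        self.history = []\n",
"\n",
"    def compute_forward(self, batch):\n",
"        inputs, _ = batch\n",
"        return self.model(inputs)\n",
"\n",
"    def compute_objectives(self, predictions, batch):\n",
"        _, targets = batch\n",
"        return torch.nn.functional.mse_loss(predictions, targets)\n",
"\n",
"    def on_stage_end(self, stage, stage_loss, epoch):\n",
"        self.history.append((stage, epoch, stage_loss))\n",
"\n",
"    def fit(self, epochs, train_set, valid_set):\n",
"        for epoch in range(1, epochs + 1):\n",
"            # Training stage: forward, loss, backward, parameter update\n",
"            self.model.train()\n",
"            total = 0.0\n",
"            for batch in train_set:\n",
"                predictions = self.compute_forward(batch)\n",
"                loss = self.compute_objectives(predictions, batch)\n",
"                self.optimizer.zero_grad()\n",
"                loss.backward()\n",
"                self.optimizer.step()\n",
"                total += loss.item()\n",
"            self.on_stage_end('train', total / len(train_set), epoch)\n",
"\n",
"            # Validation stage: same two methods, but no gradient updates\n",
"            self.model.eval()\n",
"            total = 0.0\n",
"            with torch.no_grad():\n",
"                for batch in valid_set:\n",
"                    predictions = self.compute_forward(batch)\n",
"                    total += self.compute_objectives(predictions, batch).item()\n",
"            self.on_stage_end('valid', total / len(valid_set), epoch)\n",
"\n",
"\n",
"torch.manual_seed(0)\n",
"inputs = torch.randn(64, 3)\n",
"targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]])\n",
"batches = [(inputs[i:i + 16], targets[i:i + 16]) for i in range(0, 64, 16)]\n",
"\n",
"brain = TinyBrain(torch.nn.Linear(3, 1))\n",
"brain.fit(epochs=20, train_set=batches, valid_set=batches)\n",
"```\n",
"\n",
"After the call, `brain.history` holds one `(stage, epoch, loss)` entry per stage, which is roughly the information that `on_stage_end` hands to the logger in the real recipe."
]
},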
{
"cell_type": "markdown",
"metadata": {
"id": "BIb_ie_fSo89"
},
"source": [
"Once training is complete, we can load the checkpoint that had the best performance on validation data (as measured by STOI) to evaluate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "diWyNY-hS3h8"
},
"outputs": [],
"source": [
"se_brain.evaluate(\n",
"    test_set=datasets[\"test\"],\n",
"    max_key=\"stoi\",\n",
"    test_loader_kwargs=hparams[\"dataloader_options\"],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sb_auto_footer",
"tags": [
"sb_auto_footer"
]
},
"source": [
"## Citing SpeechBrain\n",
"\n",
"If you use SpeechBrain in your research or business, please cite it using the following BibTeX entry:\n",
"\n",
"```bibtex\n",
"@misc{speechbrainV1,\n",
" title={Open-Source Conversational AI with {SpeechBrain} 1.0},\n",
" author={Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Gaelle Laperriere and Mickael Rouvier and Renato De Mori and Yannick Esteve},\n",
" year={2024},\n",
" eprint={2407.00463},\n",
" archivePrefix={arXiv},\n",
" primaryClass={cs.LG},\n",
" url={https://arxiv.org/abs/2407.00463},\n",
"}\n",
"@misc{speechbrain,\n",
" title={{SpeechBrain}: A General-Purpose Speech Toolkit},\n",
" author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},\n",
" year={2021},\n",
" eprint={2106.04624},\n",
" archivePrefix={arXiv},\n",
" primaryClass={eess.AS},\n",
" note={arXiv:2106.04624}\n",
"}\n",
"```"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}