Speech Processing Tasks

🔗 Speech Recognition From Scratch

Ravanelli M. & Parcollet T.

Apr. 2021

Difficulty: medium

Time: 45min

Do you want to figure out how to implement your own speech recognizer with SpeechBrain? Look no further, you’re in the right place. This self-contained tutorial walks you through all the steps needed to implement an offline, end-to-end, attention-based speech recognizer, and helps you connect the dots between them. We will address data preparation, tokenizer training, the language model, the ASR model, and inference, and we will explain how to train the model on your own data.

🔗 Metrics for Speech Recognition

de Langen S.

Sep. 2024

Difficulty: medium

Time: 30min

🔗 Google Colab

Estimating the accuracy of a speech recognition model is not a trivial problem. The Word Error Rate (WER) and Character Error Rate (CER) metrics are standard, but researchers have been developing alternatives, such as SemDist, that correlate better with human evaluation.

This tutorial introduces some alternative ASR metrics and their flexible integration into SpeechBrain, which can help you research, use or develop new metrics.
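
As a point of reference for the standard metrics mentioned above, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words; CER is the same ratio computed over characters. The following is a minimal, library-free sketch of that definition. It is not SpeechBrain's implementation, and the function names `edit_distance`, `wer`, and `cer` are hypothetical, chosen for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref token missing from hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
    print(f"WER = {wer('the cat sat on the mat', 'the cat sit on mat'):.3f}")
```

Because both metrics count every edit equally, a harmless spelling variant is penalized as much as an error that changes the meaning, which is one motivation for the semantic alternatives the tutorial covers.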

🔗 Source Separation

Subakan C.

Jan. 2021

Difficulty: medium

Time: 30min

🔗 Google Colab

In source separation, the goal is to separate out the individual sources from an observed mixture signal, which consists of a superposition of several sources. In this tutorial, we cover a few examples of performing source separation with SpeechBrain.
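
To make the mixture model concrete, here is a toy sketch in plain Python. It builds a mixture as the sample-wise sum of two synthetic sinusoids and recovers each one by projecting the mixture onto a sinusoid of known frequency. Everything here is an assumption made for illustration: the signals, the sample rate, and the helper names `sine`, `dot`, and `project` are hypothetical, and knowing the source frequencies in advance is exactly what real (blind) separation does not have. The neural separators used in the tutorial work very differently.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed for this toy example)
NUM_SAMPLES = 800   # 0.1 s, so both tones complete a whole number of cycles


def sine(freq_hz, amplitude=1.0):
    """Synthetic source: a pure tone sampled at SAMPLE_RATE."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(NUM_SAMPLES)]


def dot(x, y):
    """Inner product of two equal-length signals."""
    return sum(a * b for a, b in zip(x, y))


def project(signal, basis):
    """Least-squares estimate of the component of `signal` along `basis`."""
    coeff = dot(signal, basis) / dot(basis, basis)
    return [coeff * b for b in basis]


# Two sources and their observed mixture (superposition = sample-wise sum).
source_1 = sine(200, amplitude=0.8)
source_2 = sine(1200, amplitude=0.3)
mixture = [a + b for a, b in zip(source_1, source_2)]

# Toy "separation": the two tones are orthogonal over this window, so
# projecting the mixture onto each known tone isolates that source.
estimate_1 = project(mixture, sine(200))
estimate_2 = project(mixture, sine(1200))

max_err_1 = max(abs(e - s) for e, s in zip(estimate_1, source_1))
max_err_2 = max(abs(e - s) for e, s in zip(estimate_2, source_2))
print(f"max reconstruction error: {max_err_1:.2e}, {max_err_2:.2e}")
```

The projection only works because the sources are orthogonal and their frequencies are known; speech sources overlap in both time and frequency, which is why learned models are needed.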

🔗 Speech Enhancement From Scratch

Plantinga P.

Feb. 2021

Difficulty: medium

Time: 30min

🔗 Google Colab

So you want to do regression tasks with speech? Look no further, you’re in the right place. This tutorial will walk you through a basic speech enhancement template with SpeechBrain to show all the components needed for making a new recipe.

🔗 Speech Classification From Scratch

Ravanelli M.

Jan. 2021

Difficulty: medium

Time: 30min

🔗 Google Colab

In this tutorial, we show how to use SpeechBrain to implement an utterance-level speech classifier. It might help if you want to develop systems for speaker-id, language-id, emotion recognition, sound classification, keyword spotting, and many other tasks.

🔗 Voice Activity Detection

Ravanelli M.

Sep. 2021

Difficulty: easy

Time: 15min

🔗 Google Colab

In this tutorial, we show how to use SpeechBrain for voice activity detection. The tutorial will describe how to train a neural VAD and use it for inference on long audio recordings.

🔗 Forced Alignment

Plantinga P.

July 2025

Difficulty: easy

Time: 10min

🔗 Google Colab

In this tutorial, we show how to use SpeechBrain for forced alignment using k2 and a pretrained CTC-based ASR model.