Neural Architectures
Fine-tuning or using Whisper, wav2vec2, HuBERT and others with SpeechBrain and HuggingFace
Parcollet T. & Moumen A. |
Dec. 2022 |
Difficulty: medium |
Time: 20min |
This tutorial describes how to combine (use and fine-tune) pretrained models coming from HuggingFace. Any wav2vec 2.0 / HuBERT / WavLM or Whisper model integrated into the transformers interface of HuggingFace can then be plugged into SpeechBrain to tackle a speech-related task: automatic speech recognition, speaker recognition, spoken language understanding…
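As a minimal sketch of what "plugging in" such a model looks like, the snippet below builds a wav2vec 2.0 encoder through the HuggingFace transformers interface and extracts frame-level features from raw audio. The tiny config sizes are arbitrary illustration values so the example runs without downloading a checkpoint; in practice you would load a real model (e.g. via `from_pretrained`) or use SpeechBrain's HuggingFace wrappers as the tutorial shows.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Tiny randomly initialised wav2vec 2.0 (illustrative sizes, not a real
# checkpoint; a pretrained model would come from from_pretrained instead).
config = Wav2Vec2Config(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    conv_dim=(32, 32),       # two conv feature-extractor layers
    conv_stride=(5, 2),
    conv_kernel=(10, 3),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=2,
)
model = Wav2Vec2Model(config)
model.eval()

wav = torch.randn(1, 16000)  # one second of fake 16 kHz audio
with torch.no_grad():
    feats = model(wav).last_hidden_state  # (batch, frames, hidden_size)
```

The returned features can then feed any downstream SpeechBrain head (CTC, classifier, ...), which is the pattern the tutorial develops.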
Neural Network Adapters for faster low-memory fine-tuning
Plantinga P. |
Sept. 2024 |
Difficulty: easy |
Time: 20min |
This tutorial covers the SpeechBrain implementation of adapters such as LoRA, including how to integrate SpeechBrain's built-in adapters, custom adapters, or adapters from libraries such as PEFT into a pre-trained model.
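To make the idea concrete, here is a minimal LoRA sketch in plain PyTorch (not SpeechBrain's or PEFT's actual API): the wrapped linear layer is frozen, and only a low-rank update `B @ A` is trained, which is why adapters are fast and memory-light.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: freezes the base layer and learns a
    low-rank additive update (rank and alpha are hypothetical defaults)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        # A is small-random, B is zero, so training starts at the base output.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(16, 8))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
# Only A (4x16) and B (8x4) are trainable: 96 parameters vs 136 in the base.
```

Libraries like PEFT apply exactly this kind of wrapping automatically across a model's attention and projection layers.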
Complex and Quaternion Neural Networks
Parcollet T. |
Feb. 2021 |
Difficulty: medium |
Time: 30min |
This tutorial demonstrates how to use the SpeechBrain implementation of complex-valued and quaternion-valued neural networks for speech technologies. It covers the basics of high-dimensional representations and the associated neural layers: Linear, Convolution, Recurrent and Normalisation.
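The algebra underlying quaternion layers is the Hamilton product, which is what lets a quaternion Linear or Convolution share weights across its four components. As a refresher (plain Python, not SpeechBrain code), the product of two quaternions a + bi + cj + dk is:

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples,
    i.e. a + b*i + c*j + d*k. This is the core operation that quaternion
    neural layers apply between weights and inputs."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )
```

Because the product is non-commutative (i * j = k but j * i = -k), quaternion layers capture inter-component dependencies that four independent real-valued layers would miss.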
Recurrent Neural Networks
Ravanelli M. |
Feb. 2021 |
Difficulty: easy |
Time: 30min |
Recurrent Neural Networks (RNNs) offer a natural way to process sequences. This tutorial demonstrates how to use the SpeechBrain implementations of RNNs, including LSTM, GRU, vanilla RNN, and LiGRU, a recurrent cell designed specifically for speech-related tasks. RNNs are at the core of many sequence-to-sequence models.
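The basic usage pattern is the same across cell types; here is a small plain-PyTorch sketch (the shapes and the 40-band filterbank input are illustrative choices, and SpeechBrain's own RNN wrappers follow the same batch-time-feature convention):

```python
import torch
import torch.nn as nn

# A 2-layer bidirectional LSTM over fake filterbank features.
rnn = nn.LSTM(input_size=40, hidden_size=128, num_layers=2,
              bidirectional=True, batch_first=True)

feats = torch.randn(4, 100, 40)  # (batch, time, features)
out, (h, c) = rnn(feats)
# out stacks both directions per frame: (batch, time, 2 * hidden_size)
# h, c hold the final states: (num_layers * 2, batch, hidden_size)
```

Swapping `nn.LSTM` for `nn.GRU` or a LiGRU implementation changes only the cell's internal gating, not this interface.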
Streaming Speech Recognition with Conformers
de Langen S. |
Sep. 2024 |
Difficulty: medium |
Time: 60min+ |
Automatic Speech Recognition (ASR) models are often designed to transcribe an entire chunk of audio at once, making them unsuitable for use cases like live stream transcription, which requires low-latency, long-form transcription.
This tutorial introduces the Dynamic Chunk Training approach and the architectural changes you can apply to make the Conformer model streamable. It also introduces the tooling for training and inference that SpeechBrain provides. This is a good starting point if you're interested in training and understanding your own streaming models, or even if you want to explore improved streaming architectures.
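At the heart of chunked streaming attention is a mask that limits each frame to its own chunk plus a bounded number of past chunks. The sketch below illustrates that idea in plain PyTorch (function name and parameters are hypothetical, not SpeechBrain's API; the tutorial covers the real tooling):

```python
import torch

def dynamic_chunk_mask(seq_len: int, chunk_size: int, left_chunks: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask where True means attention is allowed.
    Each frame may attend to all frames in its own chunk and in up to
    `left_chunks` preceding chunks (illustrative sketch of the idea)."""
    chunk_id = torch.arange(seq_len) // chunk_size  # chunk index per frame
    q = chunk_id.unsqueeze(1)  # query chunk ids (rows)
    k = chunk_id.unsqueeze(0)  # key chunk ids (cols)
    return (k <= q) & (k >= q - left_chunks)

# 8 frames, chunks of 2, one chunk of left context:
mask = dynamic_chunk_mask(seq_len=8, chunk_size=2, left_chunks=1)
```

Because the mask never lets a frame see beyond its own chunk, latency is bounded by the chunk size, and varying the chunk size during training (the "dynamic" part) yields a single model usable at several latency settings.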