Audio loading troubleshooting

This page explains how to install audio backends and how to troubleshoot audio loading problems.

Introduction

SpeechBrain now uses soundfile as the sole supported audio I/O backend through the speechbrain.dataio.audio_io module.

The soundfile backend supports most common audio formats, including WAV, FLAC, and MP3. For other formats, or if loading fails, refer to the sections below.
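When a file fails to load, a useful first step is to confirm that soundfile itself imports cleanly and to see which formats the underlying libsndfile build reports. The sketch below is one possible check and not part of SpeechBrain: the helper name audio_backend_report is hypothetical, and it only relies on soundfile's available_formats() function and version attributes.

```python
def audio_backend_report():
    """Summarize whether soundfile is usable and which common formats it reports.

    Hypothetical helper for illustration; it is not a SpeechBrain function.
    """
    try:
        # Importing soundfile can fail if the package is missing (ImportError)
        # or if the libsndfile shared library cannot be located (OSError).
        import soundfile as sf
    except (ImportError, OSError) as exc:
        return {"available": False, "error": str(exc)}

    # available_formats() maps major format names (e.g. "WAV") to descriptions.
    formats = sf.available_formats()
    return {
        "available": True,
        "soundfile_version": sf.__version__,
        "libsndfile_version": sf.__libsndfile_version__,
        "formats": {name: name in formats for name in ("WAV", "FLAC", "MP3")},
    }


print(audio_backend_report())
```

If a format you need is reported as False, the installed libsndfile build likely lacks support for it, and upgrading soundfile or libsndfile is a reasonable next thing to try.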

Note

Legacy torchaudio backends: SpeechBrain previously used torchaudio for audio I/O, which supported three backends: ffmpeg, sox, and soundfile. However, torchaudio 2.9 deprecated all audio I/O support, so SpeechBrain now relies on soundfile directly.

SpeechBrain Audio I/O API

SpeechBrain provides its own audio I/O interface through the speechbrain.dataio.audio_io module. Usage example:

from speechbrain.dataio import audio_io

# Load audio file
audio, sample_rate = audio_io.load("path/to/audio.wav")

# Get audio metadata
info = audio_io.info("path/to/audio.wav")
print(info.sample_rate, info.duration, info.channels)

# Save audio file
audio_io.save("output.wav", audio, sample_rate)

This API is compatible with the previous torchaudio-based interface, making migration straightforward.
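If a call through audio_io fails, it can help to check whether soundfile on its own can write and read a file, which separates backend problems from SpeechBrain-level ones. The following is a minimal sketch under that assumption: roundtrip_check is a hypothetical helper, the 440 Hz tone and 16 kHz rate are arbitrary choices, and only soundfile's write, info, and read functions are used.

```python
import os
import tempfile

try:
    import numpy as np
    import soundfile as sf
except (ImportError, OSError):
    # Missing package or missing libsndfile library: nothing to test.
    np = sf = None


def roundtrip_check(sample_rate=16000, seconds=1.0):
    """Write a sine tone to a temporary WAV file and read it back.

    Hypothetical helper for illustration; returns None if soundfile
    (or numpy) is unavailable.
    """
    if sf is None:
        return None

    # Build a mono 440 Hz tone as float32 samples in [-0.5, 0.5].
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    tone = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tone.wav")
        sf.write(path, tone, sample_rate)       # write the WAV file
        meta = sf.info(path)                    # read header metadata only
        data, read_rate = sf.read(path, dtype="float32")  # read the samples

    return {
        "sample_rate": read_rate,
        "frames": meta.frames,
        "channels": meta.channels,
        # WAV is written as 16-bit PCM by default, so expect a small
        # quantization error rather than an exact match.
        "max_error": float(np.max(np.abs(data - tone))),
    }


print(roundtrip_check())
```

If this round trip succeeds but loading through audio_io still fails, the problem is more likely in the file itself or in how it is passed to SpeechBrain than in the soundfile installation.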