Audio loading troubleshooting
This page documents how to install audio backends and provides troubleshooting steps for audio loading problems.
Introduction
SpeechBrain now uses soundfile as the
sole supported audio I/O backend through the speechbrain.dataio.audio_io module.
The soundfile backend supports most common audio formats, including
wav, flac, and mp3. For other formats, or if you run into loading issues,
please refer to the sections below.
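As a rough way to check whether a given file is likely to be readable, the sketch below compares a file extension against the format names reported by soundfile.available_formats(). The helper name extension_supported is hypothetical (it is not part of SpeechBrain or soundfile), and the match is by name only, so it is a quick hint rather than a guarantee that decoding will succeed.

```python
from pathlib import Path


def extension_supported(path, formats):
    """Return True if the file extension matches a format name in `formats`.

    `formats` is expected to be the mapping returned by
    soundfile.available_formats(), e.g. {"WAV": "WAV (Microsoft)", ...}.
    This is a plain name match: extensions that differ from the format
    name (such as ".aif" for AIFF) are not recognised by this sketch.
    """
    ext = Path(path).suffix.lstrip(".").upper()
    return ext in {name.upper() for name in formats}


try:
    import soundfile as sf

    formats = sf.available_formats()
except (ImportError, OSError):
    # soundfile or the libsndfile library is missing; use an empty mapping
    formats = {}

for candidate in ("speech.wav", "speech.flac", "speech.mp3", "speech.m4a"):
    print(candidate, extension_supported(candidate, formats))
```

The result depends on the libsndfile version that soundfile finds on your system, so the same file name may be reported differently on different machines.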
Note
Legacy torchaudio backends: SpeechBrain previously used torchaudio for
audio I/O, which supported three backends: ffmpeg, sox and soundfile.
However, torchaudio 2.9 deprecated all audio I/O support, so SpeechBrain
now relies on soundfile directly for audio I/O.
Recommended install steps
The pip package soundfile is a dependency of SpeechBrain and should be automatically
installed when you install SpeechBrain.
Starting with soundfile 0.12.0, the pip package bundles a prebuilt libsndfile
for most platforms (Windows, macOS, Linux), so it typically works out of the box
when installed via pip.
If you encounter issues with audio loading:
Update soundfile: Try running pip install --upgrade soundfile to get the latest version with updated libsndfile binaries.
On Linux with superuser rights: Install libsndfile through your distribution's package manager (e.g., sudo apt install libsndfile1 on Ubuntu/Debian).
For advanced codec support: If you need to work with formats not supported by soundfile (e.g., AAC/M4A), you may need to convert your audio files to a supported format like WAV or FLAC using external tools such as ffmpeg.
Check installation: You can verify soundfile is working by running:
import soundfile as sf
print(sf.__version__)
print(sf.available_formats())
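For the codec conversion step mentioned above, one possible approach is to call ffmpeg from Python. This is a minimal sketch that assumes ffmpeg is installed and available on PATH. The function names build_ffmpeg_command and convert_audio and the file names are hypothetical, chosen for illustration only.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=None):
    """Build an ffmpeg command that converts `src` to `dst`.

    The output format is inferred by ffmpeg from the extension of `dst`.
    """
    cmd = ["ffmpeg", "-y", "-i", str(src)]  # -y: overwrite dst if it exists
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]  # -ar: output sample rate in Hz
    cmd.append(str(dst))
    return cmd


def convert_audio(src, dst, sample_rate=None):
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)


# Show the command that would be run for an M4A to WAV conversion
print(build_ffmpeg_command("input.m4a", "output.wav", sample_rate=16000))
```

Calling convert_audio("input.m4a", "output.wav") would then produce a WAV file that the soundfile backend can read. Keeping the command construction separate from execution makes it easy to inspect the command before running it.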
SpeechBrain Audio I/O API
SpeechBrain provides its own audio I/O interface through the
speechbrain.dataio.audio_io module. Usage example:
from speechbrain.dataio import audio_io
# Load audio file
audio, sample_rate = audio_io.load("path/to/audio.wav")
# Get audio metadata
info = audio_io.info("path/to/audio.wav")
print(info.sample_rate, info.duration, info.channels)
# Save audio file
audio_io.save("output.wav", audio, sample_rate)
This API is compatible with the previous torchaudio-based interface, making migration straightforward.