============================= Audio loading troubleshooting ============================= This page is intended to document how to install audio backends and provides troubleshooting steps for your audio loading troubles. Introduction ============ SpeechBrain now uses `soundfile `_ as the sole supported audio I/O backend through the :mod:`speechbrain.dataio.audio_io` module. The soundfile backend supports most common audio formats including: ``wav``, ``flac``, and ``mp3``. For advanced format support or issues, please refer to the sections below. .. note:: **Legacy torchaudio backends**: SpeechBrain previously used torchaudio for audio I/O, which supported three backends: ``ffmpeg``, ``sox`` and ``soundfile``. However, torchaudio 2.9 deprecated all audio I/O support so SpeechBrain now relies on ``soundfile`` directly for audio I/O. Recommended install steps ========================= The pip package `soundfile` is a dependency of SpeechBrain and should be automatically installed when you install SpeechBrain. Starting with SoundFile 0.12.0, the pip package bundles a prebuilt ``libsndfile`` for most platforms (Windows, macOS, Linux), so it typically works out of the box when installed via pip. If you encounter issues with audio loading: - **Update soundfile**: Try running ``pip install --upgrade soundfile`` to get the latest version with updated ``libsndfile`` binaries. - **On Linux with superuser rights**: Install ``libsndfile`` through your distribution's package manager (e.g., ``sudo apt install libsndfile1`` on Ubuntu/Debian). - **For advanced codec support**: If you need to work with formats not supported by soundfile (e.g., AAC/M4A), you may need to convert your audio files to a supported format like WAV or FLAC using external tools such as ``ffmpeg``. - **Check installation**: You can verify soundfile is working by running: .. code-block:: python import soundfile as sf print(sf.__version__) print(sf.available_formats()) SpeechBrain Audio I/O API ========================== SpeechBrain provides its own audio I/O interface through the :mod:`speechbrain.dataio.audio_io` module. Usage example: .. code-block:: python from speechbrain.dataio import audio_io # Load audio file audio, sample_rate = audio_io.load("path/to/audio.wav") # Get audio metadata info = audio_io.info("path/to/audio.wav") print(info.sample_rate, info.duration, info.channels) # Save audio file audio_io.save("output.wav", audio, sample_rate) This API is compatible with the previous torchaudio-based interface, making migration straightforward.