Audio loading troubleshooting

This page documents how to install torchaudio backends and provides troubleshooting steps for audio loading issues.

Introduction

SpeechBrain relies on torchaudio for loading audio files in most cases. If you encounter issues, first try updating torchaudio, and make sure that your installed PyTorch version is the one your torchaudio version expects.
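As a starting point, the installed versions can be read without importing either package, which helps when the import itself is what fails. This is a minimal sketch: the helper name installed_version is hypothetical, and the printed versions still need to be compared by hand against the compatibility information in the torchaudio installation docs.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print both versions side by side so a mismatch is easy to spot.
for name in ("torch", "torchaudio"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```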

As of torchaudio 2.2.0, three backends are supported: ffmpeg, sox and soundfile. The torchaudio documentation on optional dependencies describes how each backend is located.

You can determine which backends are available in your environment by running torchaudio.list_audio_backends().
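The check can be sketched as a short script that also reports which of the three backends named above are absent. The helper missing_backends and the constant EXPECTED_BACKENDS are hypothetical names used for illustration, and the script assumes a torchaudio release that still exposes list_audio_backends(); if it does not, or if torchaudio cannot be imported, the list is treated as empty.

```python
# Backend names as listed for torchaudio 2.2.0.
EXPECTED_BACKENDS = ("ffmpeg", "sox", "soundfile")


def missing_backends(available, expected=EXPECTED_BACKENDS):
    """Return the expected backend names that are not in `available`."""
    return [name for name in expected if name not in available]


try:
    import torchaudio

    available = list(torchaudio.list_audio_backends())
except (ImportError, AttributeError, OSError):
    # torchaudio is missing, broken, or does not provide this function.
    available = []

print("available:", available)
print("missing:  ", missing_backends(available))
```

A name showing up under "missing" only says the backend was not loaded. It does not say whether the underlying library is absent or failed to initialize.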

Warning

A backend whose initialization fails is *silently* omitted from this list.

Warning

Not every backend supports every codec. For instance, at the time of writing, the torchaudio SoX backend cannot handle MP3 and the SoundFile backend cannot handle AAC (usually .m4a), both of which are found in certain popular speech datasets. However, the most common formats (.wav, .ogg with Vorbis or Opus, .flac) are typically well supported by all backends.
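When a dataset mixes codecs, one possible workaround is to try each available backend in turn until one succeeds. The sketch below is illustrative only: load_with_fallback is a hypothetical helper, not part of torchaudio or SpeechBrain, and it takes the loading function as a parameter so that it makes no assumption beyond the backend keyword argument of torchaudio.load().

```python
def load_with_fallback(path, backends, load_fn):
    """Try `load_fn(path, backend=name)` for each backend name in order.

    Returns a (backend_name, result) pair for the first backend that works.
    Raises RuntimeError listing every failure if none of them can load the file.
    """
    errors = {}
    for name in backends:
        try:
            return name, load_fn(path, backend=name)
        except Exception as exc:  # backends raise different exception types
            errors[name] = exc
    details = "; ".join(f"{name}: {exc}" for name, exc in errors.items())
    raise RuntimeError(f"no backend could load {path!r} ({details})")


# Intended use, assuming torchaudio is installed and "example.mp3" exists:
#   import torchaudio
#   backend, (waveform, sample_rate) = load_with_fallback(
#       "example.mp3", torchaudio.list_audio_backends(), torchaudio.load
#   )
```

Collecting the per-backend errors rather than discarding them keeps the final message useful for deciding which backend to install or repair.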

Note for developers & breaking torchaudio 2.x changes

Before torchaudio 2.x, backends were selected through torchaudio.set_audio_backend. This function was deprecated and then removed in the 2.x branch of torchaudio and is no longer used in SpeechBrain. Since then, the backend is (optionally) selected per call through the backend argument of torchaudio.load() and torchaudio.info().
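The per-call selection can be illustrated with a small self-contained script. It writes one second of silence to a temporary WAV file using only the standard library, then reads it back through the soundfile backend if that backend is available. The helper write_silent_wav and the file name are made up for the example, and the torchaudio part assumes a 2.x release in which list_audio_backends(), info() and load() are all present.

```python
import os
import tempfile
import wave


def write_silent_wav(path, sample_rate=16000, seconds=1):
    """Write a mono 16-bit PCM WAV file of silence using only the stdlib."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


wav_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_silent_wav(wav_path)

try:
    import torchaudio

    backends = list(torchaudio.list_audio_backends())
except (ImportError, AttributeError, OSError):
    backends = []

if "soundfile" in backends:
    # Removed global switch: torchaudio.set_audio_backend("soundfile")
    # Current approach: name the backend on each call.
    file_info = torchaudio.info(wav_path, backend="soundfile")
    waveform, sample_rate = torchaudio.load(wav_path, backend="soundfile")
    print(file_info.sample_rate, tuple(waveform.shape))
else:
    print("soundfile backend not available; nothing loaded")
```

Omitting the backend argument leaves the choice to torchaudio, which is usually fine unless a specific file only loads with one backend.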

Installing/troubleshooting backends

ffmpeg

torchaudio compiles its ffmpeg backend against a specific range of ffmpeg versions.

ffmpeg is already installed on many Linux distributions. On Ubuntu, it can be installed through sudo apt install ffmpeg.

Depending on your OS version, your installed ffmpeg version may not be supported by torchaudio because it is either too recent or too old. If you believe this to be the case, you can try installing a specific version of the ffmpeg package as supplied by conda-forge.
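To see which ffmpeg version is visible in your environment, a rough check is to query the ffmpeg executable on PATH. This is only a sketch with hypothetical helper names, and it rests on an assumption: torchaudio uses the ffmpeg shared libraries rather than the command-line tool, so the reported version is a hint about what is installed, not proof of what torchaudio will load.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(banner):
    """Extract the version token from the output of `ffmpeg -version`."""
    match = re.search(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def system_ffmpeg_version():
    """Return the version of the ffmpeg executable on PATH, or None."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_version(result.stdout)


print("ffmpeg on PATH:", system_ffmpeg_version())
```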

See torchaudio documentation on optional dependencies for more details.

SoundFile

torchaudio can use soundfile as an audio backend, which depends on libsndfile.

Starting with SoundFile 0.12.0, this package bundles a prebuilt libsndfile for a number of platforms. Refer to the project page for more details.
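A quick way to confirm that the soundfile package can find libsndfile is to import it and print what it reports. The function name describe_soundfile is hypothetical; the attributes it reads (__version__, __libsndfile_version__ and available_formats()) are provided by the soundfile package itself.

```python
def describe_soundfile():
    """Report whether soundfile imports and which libsndfile it found."""
    try:
        import soundfile
    except (ImportError, OSError) as exc:
        # ImportError: package not installed.
        # OSError: package installed but libsndfile could not be loaded.
        return {"installed": False, "error": str(exc)}
    return {
        "installed": True,
        "soundfile": soundfile.__version__,
        "libsndfile": soundfile.__libsndfile_version__,
        "formats": sorted(soundfile.available_formats()),
    }


for key, value in describe_soundfile().items():
    print(f"{key}: {value}")
```

If the import succeeds here but soundfile is still absent from torchaudio.list_audio_backends(), the problem is more likely on the torchaudio side than with libsndfile.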

SoX

Starting with torchaudio 0.12.0, the SoX backend no longer supports mp3 files.

Starting with torchaudio 2.1.0, torchaudio no longer compiles and bundles SoX itself, and expects the system to provide it.

If you have upgraded from an earlier version and can no longer load audio files, this change may be the cause. In this case, you may need to install SoX or use a different backend.
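Whether SoX is visible on the system can be checked roughly as follows. The helper find_sox is a hypothetical name, and the lookup is an approximation: it uses the standard library search for a shared library named sox and for a sox executable, which may not match exactly how torchaudio locates the library.

```python
import ctypes.util
import shutil


def find_sox():
    """Look for the SoX shared library and command-line tool on this system."""
    return {
        # e.g. "libsox.so.3" on Linux when the library is installed, else None
        "library": ctypes.util.find_library("sox"),
        # full path to the `sox` executable if it is on PATH, else None
        "executable": shutil.which("sox"),
    }


for key, value in find_sox().items():
    print(f"sox {key}: {value or 'not found'}")
```

Having the executable without the library is possible on some systems, so the library entry is the more relevant of the two for torchaudio.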