Audio loading troubleshooting
This page documents how to install torchaudio backends and provides troubleshooting steps for audio loading issues.
Introduction
SpeechBrain relies on torchaudio for loading audio files in most cases. Please first try to update torchaudio if you are encountering issues. Please also ensure that you are using the correct PyTorch version for your installed torchaudio version.
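As a first check, the installed versions can be read without importing either package. The following is a minimal sketch; the helper name installed_versions is made up for illustration, and it only reports versions, it does not verify that they are compatible with each other.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchaudio")):
    """Return {package: version string or None} for the given distributions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


versions = installed_versions()
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Compare the two reported versions against the compatibility information published by PyTorch for your torchaudio release.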
As of torchaudio 2.2.0, three backends are supported: ffmpeg, sox and soundfile. torchaudio documents how their backends are found in their optional dependency docs.
You can determine which backends are available in your environment by running torchaudio.list_audio_backends().
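A minimal sketch of that check is shown here. The wrapper available_backends is a hypothetical helper that returns an empty list when torchaudio cannot be imported or when the installed release does not provide torchaudio.list_audio_backends().

```python
def available_backends():
    """Return the backend names torchaudio reports, or [] if they cannot be queried."""
    try:
        import torchaudio
    except (ImportError, OSError) as err:
        print(f"Could not import torchaudio: {err}")
        return []
    # Guard against releases that do not expose this function.
    lister = getattr(torchaudio, "list_audio_backends", None)
    return list(lister()) if lister is not None else []


backends = available_backends()
print("Available backends:", backends)
```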
Warning
A backend whose initialization fails can *silently* fail to load, in which case it is omitted from this list.
Warning
Not every backend supports every codec. For instance, at the time of writing, the torchaudio SoX backend cannot handle MP3 and the SoundFile backend cannot handle AAC (usually .m4a), both of which are found in certain popular speech datasets.
However, most common formats are typically well supported by all backends (.wav/.ogg vorbis/opus/.flac).
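To find out which backend can decode a particular file, one option is to try each available backend in turn. This is only a sketch: probe_backends is a hypothetical helper, example.m4a is a placeholder path, and the code assumes a torchaudio release in which torchaudio.load() accepts the backend argument.

```python
def probe_backends(path):
    """Map each available backend name to True/False: could it decode `path`?"""
    results = {}
    try:
        import torchaudio
        names = torchaudio.list_audio_backends()
    except Exception as err:
        print(f"torchaudio backends unavailable: {err}")
        return results
    for name in names:
        try:
            torchaudio.load(path, backend=name)
            results[name] = True
        except Exception as err:
            # A failure here may mean an unsupported codec or a missing file.
            results[name] = False
            print(f"{name}: failed ({type(err).__name__}: {err})")
    return results


# Placeholder path: replace with a file from your dataset.
results = probe_backends("example.m4a")
print(results)
```

A backend that reports False for one file may still work for other formats, so it helps to probe one file per format used in your dataset.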
Recommended install steps
Often, torchaudio will work out of the box. On certain systems, however, there might not be a working backend installed. We recommend checking whether any of the following steps fixes your issue:
On Linux, if you have superuser rights, install ffmpeg and/or libsndfile and/or SoX through your distribution’s package manager.
On Windows/Linux/macOS, you can try installing ffmpeg through Conda (see ffmpeg), which does not require superuser rights (provided Conda is available).
On macOS, alternatively, it appears to be possible to install ffmpeg through Homebrew. Make sure that you are installing a version compatible with torchaudio (see ffmpeg).
On Windows/Linux/macOS, SoundFile has started shipping with a prebuilt libsndfile, which does not require admin rights. Try installing or updating it. See the linked page for more details.
Note for developers & breaking torchaudio 2.x changes
With torchaudio <2.x, backends were selected through torchaudio.set_audio_backend. This function was deprecated and then removed in the 2.x branch of torchaudio and is no longer used in SpeechBrain.
Since then, the backend is (optionally) selected through the backend argument of torchaudio.load() and torchaudio.info().
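The per-call selection can be sketched as follows. The helpers write_test_wav and load_with_backend are made up for this example, the test tone is generated with the standard library so that no dataset file is needed, and "soundfile" is just one possible backend name. Loading failures are caught and reported rather than raised.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_wav(path, sample_rate=16000, seconds=0.1):
    """Write a short mono 16-bit sine tone using only the standard library."""
    n_frames = int(sample_rate * seconds)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / sample_rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def load_with_backend(path, backend=None):
    """Load audio, optionally forcing a backend; return None on any failure."""
    try:
        import torchaudio
        # torchaudio 2.x style: the backend is passed per call instead of
        # being set globally with the removed torchaudio.set_audio_backend().
        return torchaudio.load(path, backend=backend)
    except Exception as err:
        print(f"Loading with backend={backend!r} failed: {err}")
        return None


wav_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_test_wav(wav_path)
result = load_with_backend(wav_path, backend="soundfile")
if result is not None:
    waveform, sample_rate = result
    print(waveform.shape, sample_rate)
```

Passing backend=None leaves the choice to torchaudio, which is usually what you want unless a specific backend misbehaves.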
Installing/troubleshooting backends
ffmpeg
torchaudio compiles their ffmpeg backend for a specific range of ffmpeg versions.
ffmpeg is often already installed on common Linux distributions.
On Ubuntu, it can be installed through sudo apt install ffmpeg.
Depending on your OS version, it is possible that your installed ffmpeg version is not supported by torchaudio (if too recent or too old).
If you believe this to be the case, you can try installing a specific version of the ffmpeg package as supplied by conda-forge.
See torchaudio documentation on optional dependencies for more details.
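To see which ffmpeg version is installed, the command-line tool can be queried. This is a rough sketch with a hypothetical helper name, ffmpeg_cli_version. Note the assumption involved: torchaudio loads the ffmpeg shared libraries rather than the ffmpeg executable, so the version on PATH is only a hint about what torchaudio will find.

```python
import shutil
import subprocess


def ffmpeg_cli_version():
    """Return the first line of `ffmpeg -version`, or None if unavailable."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    try:
        out = subprocess.run(
            [exe, "-version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = out.stdout.splitlines()
    return lines[0] if lines else None


ffmpeg_version = ffmpeg_cli_version()
print(ffmpeg_version or "ffmpeg executable not found on PATH")
```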
SoundFile
torchaudio can use soundfile as an audio backend, which depends on libsndfile.
Starting with SoundFile 0.12.0, this package bundles a prebuilt libsndfile for a number of platforms. Refer to the project page for more details.
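A quick way to check the installed SoundFile package and the libsndfile it picked up is sketched here. The helper soundfile_versions is hypothetical; it returns None when the package is missing or when libsndfile cannot be loaded at import time.

```python
def soundfile_versions():
    """Return (soundfile version, libsndfile version) or None if unavailable."""
    try:
        import soundfile
    except (ImportError, OSError) as err:
        # OSError is typically raised when libsndfile itself cannot be found.
        print(f"soundfile is not usable: {err}")
        return None
    return (
        getattr(soundfile, "__version__", None),
        getattr(soundfile, "__libsndfile_version__", None),
    )


sf_versions = soundfile_versions()
print(sf_versions)
```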
SoX
Starting with torchaudio 0.12.0, the SoX backend no longer supports mp3 files.
Starting with torchaudio 2.1.0, torchaudio no longer compiles and bundles SoX by itself, and expects it to be provided by the system.
If you have upgraded from an earlier version and can no longer load audio files, it may be due to this. In this case, you may need to install SoX or use a different backend.
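Whether a system-wide SoX library is visible can be checked roughly as below. This sketch relies on the assumption that the standard library's lookup resembles how torchaudio locates SoX, which may not hold on every platform, so treat a negative result as a hint rather than a diagnosis.

```python
from ctypes.util import find_library

# Returns a library name or path (platform dependent), or None if no
# SoX shared library is found by the standard lookup.
sox_library = find_library("sox")
print(sox_library or "No system SoX library found")
```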