speechbrain.dataio.audio_io module

Lightweight soundfile-based audio I/O compatibility layer.

This module provides a minimal compatibility wrapper for audio I/O operations using the soundfile (pysoundfile) library, replacing torchaudio’s load, save, and info functions.

Example

>>> from speechbrain.dataio import audio_io
>>> import torch
>>> # Save audio file
>>> waveform = torch.randn(1, 16000)
>>> tmpdir = getfixture("tmpdir")
>>> audio_io.save(tmpdir / "example.wav", waveform, 16000)
>>> # Load audio file
>>> audio, sr = audio_io.load(tmpdir / "example.wav")
>>> # Get audio metadata
>>> info = audio_io.info(tmpdir / "example.wav")
>>> info.duration
1.0

Authors

  • Peter Plantinga, 2025

  • Adel Moumen, 2026

Summary

Classes:

AudioInfo

Container for audio file metadata, compatible with torchaudio.info output.

Functions:

info

Get audio file metadata using soundfile.

list_audio_backends

List available audio backends.

load

Load audio file using soundfile.

save

Save audio to file using soundfile.

Reference

class speechbrain.dataio.audio_io.AudioInfo(sample_rate: int, frames: int, channels: int, subtype: str, format: str)

Bases: object

Container for audio file metadata, compatible with torchaudio.info output.

sample_rate

Sample rate of the audio file.

Type:

int

frames

Total number of frames in the audio file.

Type:

int

channels

Number of audio channels.

Type:

int

subtype

Audio subtype/encoding (e.g., ‘PCM_16’, ‘PCM_24’).

Type:

str

format

Container format (e.g., ‘WAV’, ‘FLAC’).

Type:

str

sample_rate: int
frames: int
channels: int
subtype: str
format: str
property num_frames

Alias for frames for compatibility.

property num_channels

Alias for channels for compatibility.

property duration

Calculate duration in seconds.
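
The relationship between the stored fields and the derived properties can be sketched with a small dataclass. This is an illustrative stand-in, not the SpeechBrain class: the name AudioInfoSketch is hypothetical, and the duration formula (frames divided by sample_rate) is an assumption, chosen because it matches the module example above, where 16000 frames at 16000 Hz report a duration of 1.0.

```python
from dataclasses import dataclass


@dataclass
class AudioInfoSketch:
    """Hypothetical stand-in mirroring the documented AudioInfo fields."""

    sample_rate: int
    frames: int
    channels: int
    subtype: str
    format: str

    @property
    def num_frames(self) -> int:
        # Alias for frames.
        return self.frames

    @property
    def num_channels(self) -> int:
        # Alias for channels.
        return self.channels

    @property
    def duration(self) -> float:
        # Assumed formula: number of frames divided by the sample rate.
        return self.frames / self.sample_rate


info = AudioInfoSketch(
    sample_rate=16000, frames=16000, channels=1, subtype="PCM_16", format="WAV"
)
print(info.num_frames, info.num_channels, info.duration)
```

The aliases let code written against the num_frames and num_channels attribute names keep working without a second copy of the data.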

speechbrain.dataio.audio_io.load(path, *, channels_first=True, dtype=None, always_2d=True, frame_offset=0, num_frames=-1)

Load audio file using soundfile.

Parameters:
  • path (str) – Path to the audio file.

  • channels_first (bool) – If True, returns tensor with shape (channels, frames). If False, returns tensor with shape (frames, channels). Ignored if always_2d is False and input is mono. Default: True.

  • dtype (torch.dtype, optional) – Data type for the output tensor. If None, the default torch dtype is respected. If the requested dtype is not one that soundfile supports, the audio is loaded as float32 and then converted to the requested dtype. Default: None.

  • always_2d (bool) – If True, always return a 2D tensor even for mono audio. If False, mono audio returns a 1D tensor (frames,). Default: True.

  • frame_offset (int) – Number of frames to skip at the start of the file. Default: 0.

  • num_frames (int) – Number of frames to read. If -1, reads to the end of the file. Default: -1.

Returns:

  • tensor (torch.Tensor) – Audio waveform as a tensor.

  • sample_rate (int) – Sample rate of the audio file.
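
How the parameters above combine to determine the output shape can be sketched as a small pure-Python helper. The function expected_load_shape is hypothetical and is not part of audio_io. It also assumes that a request running past the end of the file is clamped to the frames that remain, which the description above does not state.

```python
def expected_load_shape(
    total_frames,
    channels,
    *,
    channels_first=True,
    always_2d=True,
    frame_offset=0,
    num_frames=-1,
):
    """Return the tensor shape implied by the documented load parameters."""
    # Frames left in the file after skipping frame_offset frames.
    available = max(total_frames - frame_offset, 0)
    # -1 means "read to the end"; clamping larger requests is an assumption.
    frames = available if num_frames == -1 else min(num_frames, available)
    # Mono audio collapses to 1D only when always_2d is False;
    # channels_first has no effect in that case.
    if channels == 1 and not always_2d:
        return (frames,)
    return (channels, frames) if channels_first else (frames, channels)


# One second of mono audio at 16 kHz with the default arguments.
print(expected_load_shape(16000, 1))
# A stereo excerpt, skipping 8000 frames and reading 16000, frames first.
print(
    expected_load_shape(
        48000, 2, channels_first=False, frame_offset=8000, num_frames=16000
    )
)
```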

speechbrain.dataio.audio_io.save(path, src, sample_rate, channels_first=True, subtype=None)

Save audio to file using soundfile.

Parameters:
  • path (str) – Path where the audio file is saved.

  • src (torch.Tensor or numpy.ndarray) –

    Audio waveform. Can be:

    • 1D tensor/array: (frames,), mono

    • 2D tensor/array: (channels, frames) if channels_first=True, or (frames, channels) if channels_first=False

  • sample_rate (int) – Sample rate for the audio file.

  • channels_first (bool) – If True, input is assumed to be (channels, frames). If False, input is assumed to be (frames, channels). Ignored if input is a 1D tensor/array. Default: True.

  • subtype (str, optional) – Audio encoding subtype (e.g., ‘PCM_16’, ‘PCM_24’, ‘PCM_32’, ‘FLOAT’). If None, soundfile will choose an appropriate subtype based on the file format. Default: None.
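
The accepted src layouts can be illustrated with plain nested lists. soundfile itself writes data laid out as (frames, channels), so a (channels, frames) input has to be transposed at some point before writing. The helper to_frames_first below is hypothetical and only sketches that layout conversion; it is not the code that save uses and does not write a file.

```python
def to_frames_first(src, channels_first=True):
    """Return samples laid out as (frames, channels); 1D input is kept 1D."""
    # A flat sequence of samples is mono; channels_first is ignored.
    if not src or not isinstance(src[0], (list, tuple)):
        return list(src)
    if channels_first:
        # Transpose (channels, frames) into (frames, channels).
        return [list(frame) for frame in zip(*src)]
    # Already (frames, channels); copy so the input is left untouched.
    return [list(frame) for frame in src]


# Two channels with three frames each, given as (channels, frames).
stereo = [[1, 2, 3], [4, 5, 6]]
print(to_frames_first(stereo))
# A mono signal given as a flat list of samples.
print(to_frames_first([0.1, 0.2, 0.3]))
```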

speechbrain.dataio.audio_io.info(path)

Get audio file metadata using soundfile.

Parameters:

path (str) – Path to the audio file.

Returns:

Object containing audio metadata (sample_rate, frames, channels, subtype, format, duration).

Return type:

AudioInfo

speechbrain.dataio.audio_io.list_audio_backends()

List available audio backends.

Returns:

List of available backend names. Currently only [‘soundfile’].

Return type:

list of str