speechbrain.dataio.audio_io module
Lightweight soundfile-based audio I/O compatibility layer.
This module provides a minimal compatibility wrapper for audio I/O operations using the soundfile (pysoundfile) library, replacing torchaudio's load, save, and info functions.
Example
>>> from speechbrain.dataio import audio_io
>>> import torch
>>> # Save audio file
>>> waveform = torch.randn(1, 16000)
>>> tmpdir = getfixture("tmpdir")
>>> audio_io.save(tmpdir / "example.wav", waveform, 16000)
>>> # Load audio file
>>> audio, sr = audio_io.load(tmpdir / "example.wav")
>>> # Get audio metadata
>>> info = audio_io.info(tmpdir / "example.wav")
>>> info.duration
1.0
Summary
Classes:
- AudioInfo: Container for audio file metadata, compatible with torchaudio.info output.
Functions:
- info: Get audio file metadata using soundfile.
- List available audio backends.
- load: Load an audio file using soundfile.
- save: Save audio to a file using soundfile.
Reference
- class speechbrain.dataio.audio_io.AudioInfo(sample_rate: int, frames: int, channels: int, subtype: str, format: str)
Bases: object
Container for audio file metadata, compatible with torchaudio.info output.
- property num_frames
Alias for frames, for compatibility with torchaudio.
- property num_channels
Alias for channels, for compatibility with torchaudio.
- property duration
Duration of the audio in seconds.
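The relationship between the stored fields and the three properties can be sketched with a small stand-alone dataclass. `AudioInfoSketch` is a hypothetical name and not part of SpeechBrain; the duration formula (frames divided by sample_rate) is an assumption, consistent with the module example where 16000 frames at 16000 Hz give a duration of 1.0.

```python
from dataclasses import dataclass


@dataclass
class AudioInfoSketch:
    """Hypothetical stand-in mirroring the documented AudioInfo fields."""

    sample_rate: int
    frames: int
    channels: int
    subtype: str
    format: str

    @property
    def num_frames(self) -> int:
        # torchaudio-style alias for `frames`
        return self.frames

    @property
    def num_channels(self) -> int:
        # torchaudio-style alias for `channels`
        return self.channels

    @property
    def duration(self) -> float:
        # Assumed formula: number of frames divided by the sample rate
        return self.frames / self.sample_rate


# 24000 stereo frames at 16 kHz
info = AudioInfoSketch(
    sample_rate=16000, frames=24000, channels=2, subtype="PCM_16", format="WAV"
)
print(info.num_frames, info.num_channels, info.duration)  # 24000 2 1.5
```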
- speechbrain.dataio.audio_io.load(path, *, channels_first=True, dtype=None, always_2d=True, frame_offset=0, num_frames=-1)
Load an audio file using soundfile.
- Parameters:
path (str): Path to the audio file.
channels_first (bool): If True, returns a tensor with shape (channels, frames). If False, returns a tensor with shape (frames, channels). Ignored if always_2d is False and the input is mono. Default: True.
dtype (torch.dtype, optional): Data type of the output tensor. Respects the default torch dtype. If the requested dtype is not supported by soundfile, the audio is first loaded as float32 and then converted to the requested dtype.
always_2d (bool): If True, always returns a 2D tensor, even for mono audio. If False, mono audio is returned as a 1D tensor of shape (frames,). Default: True.
frame_offset (int): Number of frames to skip at the start of the file. Default: 0.
num_frames (int): Number of frames to read. If -1, reads to the end of the file. Default: -1.
- Returns:
tensor (torch.Tensor): Audio waveform as a tensor.
sample_rate (int): Sample rate of the audio file.
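The interplay of channels_first, always_2d, frame_offset, and num_frames can be summarized by predicting the shape of the returned tensor. The helper below is a hypothetical illustration, not SpeechBrain code; it assumes that multi-channel audio stays 2D when always_2d is False, and that a num_frames larger than the frames remaining after frame_offset is clamped at the end of the file (soundfile returns fewer frames in that case).

```python
def expected_shape(total_frames, channels, *, channels_first=True,
                   always_2d=True, frame_offset=0, num_frames=-1):
    """Hypothetical helper: predict the tensor shape returned by load()."""
    # Frames left after skipping `frame_offset` frames at the start
    remaining = max(total_frames - frame_offset, 0)
    # num_frames=-1 reads to the end; otherwise assume clamping at end of file
    n = remaining if num_frames == -1 else min(num_frames, remaining)
    # Mono audio with always_2d=False is 1D, so channels_first has no effect
    if channels == 1 and not always_2d:
        return (n,)
    # Otherwise the layout depends only on channels_first
    return (channels, n) if channels_first else (n, channels)


# A 1 s stereo file and a 1 s mono file, both at 16 kHz
print(expected_shape(16000, 2))  # (2, 16000)
print(expected_shape(16000, 2, channels_first=False, frame_offset=8000))  # (8000, 2)
print(expected_shape(16000, 1, always_2d=False, num_frames=4000))  # (4000,)
```

For instance, the last case corresponds to a call such as audio_io.load(path, always_2d=False, num_frames=4000) on a 16000-frame mono file.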
- speechbrain.dataio.audio_io.save(path, src, sample_rate, channels_first=True, subtype=None)
Save audio to a file using soundfile.
- Parameters:
path (str): Path where the audio file is saved.
src (torch.Tensor or numpy.ndarray): Audio waveform. Can be:
  - 1D tensor/array: (frames,), mono
  - 2D tensor/array: (channels, frames) if channels_first=True, or (frames, channels) if channels_first=False
sample_rate (int): Sample rate for the audio file.
channels_first (bool): If True, the input is assumed to be (channels, frames). If False, the input is assumed to be (frames, channels). Ignored if the input is a 1D tensor/array. Default: True.
subtype (str, optional): Audio encoding subtype (e.g., "PCM_16", "PCM_24", "PCM_32", "FLOAT"). If None, soundfile chooses an appropriate subtype based on the file format. Default: None.
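soundfile.write accepts data shaped (frames,) for mono or (frames, channels) otherwise, so a (channels, frames) input has to be transposed before it is written. The sketch below illustrates only that layout conversion, on plain nested lists; `to_frames_major` is a hypothetical name, and this is not SpeechBrain's implementation, which operates on tensors and arrays.

```python
def to_frames_major(src, channels_first=True):
    """Hypothetical helper: arrange nested-list audio as soundfile expects.

    soundfile.write takes (frames,) for mono or (frames, channels) otherwise.
    """
    # A flat list of samples is mono, so the layout is already correct
    if not src or not isinstance(src[0], (list, tuple)):
        return list(src)
    # (channels, frames) -> (frames, channels): zip pairs up samples per frame
    if channels_first:
        return [list(frame) for frame in zip(*src)]
    # Input is already (frames, channels); just copy the rows
    return [list(frame) for frame in src]


stereo = [[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3]]  # 2 channels x 3 frames
print(to_frames_major(stereo))  # [[0.1, -0.1], [0.2, -0.2], [0.3, -0.3]]
print(to_frames_major([0.0, 0.5]))  # [0.0, 0.5]
```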