speechbrain.lobes.downsampling module

Combinations of processing algorithms to implement downsampling methods.

Authors
  • Salah Zaiem

Summary

Classes:

ConcatDownsampler

Concatenation downsampling with naive frame dropping.

Conv1DDownsampler

1D Convolutional downsampling with a learned convolution

Downsampler

Wrapper for downsampling techniques

PoolingDownsampler

1D Pooling downsampling (non-learned)

SignalDownsampler

Signal downsampling (Decimation)

Reference

class speechbrain.lobes.downsampling.Downsampler(*args, **kwargs)[source]

Bases: Module

Wrapper for downsampling techniques

forward(x)[source]

Downsampling function

Parameters:

x (torch.Tensor) – Speech samples of shape [B, n_samples], where B is the batch size.

Returns:

Downsampled outputs.

class speechbrain.lobes.downsampling.SignalDownsampler(downsampling_factor, initial_sampling_rate)[source]

Bases: Downsampler

Signal downsampling (Decimation)

Parameters:
  • downsampling_factor (int) – Downsampling factor, i.e. the ratio of the input length to the output length.

  • initial_sampling_rate (int) – Sampling rate of the input audio.

Example

>>> sd = SignalDownsampler(2, 16000)
>>> a = torch.rand([8, 28000])
>>> a = sd(a)
>>> print(a.shape)
torch.Size([8, 14000])
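
The relationship between the downsampling factor, the sampling rate and the output length in the example above can be checked with plain arithmetic. The sketch below is not the library implementation: `decimation_summary` is a hypothetical helper, and it only covers input lengths that are divisible by the factor (how other lengths are rounded is not stated here).

```python
def decimation_summary(n_samples, downsampling_factor, initial_sampling_rate):
    """Return (target_rate, n_out) for an input whose length divides evenly.

    Hypothetical helper for illustration only; not part of SpeechBrain.
    """
    if n_samples % downsampling_factor != 0:
        raise ValueError("this sketch only covers lengths divisible by the factor")
    # Decimating by a factor f divides both the rate and the length by f.
    target_rate = initial_sampling_rate // downsampling_factor
    n_out = n_samples // downsampling_factor
    return target_rate, n_out


target_rate, n_out = decimation_summary(28000, 2, 16000)
print(target_rate, n_out)  # 8000 14000
```

With the values from the example, a 16000 Hz input of 28000 samples becomes an 8000 Hz signal of 14000 samples.
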

class speechbrain.lobes.downsampling.Conv1DDownsampler(downsampling_factor, kernel_size)[source]

Bases: Downsampler

1D Convolutional downsampling with a learned convolution

Parameters:
  • downsampling_factor (int) – Downsampling factor, i.e. the ratio of the input length to the output length.

  • kernel_size (int) – Kernel size of the 1D filter (must be an odd integer)

Example

>>> sd = Conv1DDownsampler(3, 161)
>>> a = torch.rand([8, 33000])
>>> a = sd(a)
>>> print(a.shape)
torch.Size([8, 10947])
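
The output length in the example is consistent with the standard length formula for a 1D convolution with no padding and a stride equal to the downsampling factor. That stride and padding choice is an assumption inferred from the example, not something stated above, and `conv1d_output_length` is a hypothetical helper rather than a SpeechBrain function.

```python
def conv1d_output_length(n_samples, kernel_size, stride, padding=0, dilation=1):
    """Standard output-length formula for a 1D convolution.

    Hypothetical helper for illustration only; not part of SpeechBrain.
    """
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (n_samples + 2 * padding - effective_kernel) // stride + 1


# Assumption: stride == downsampling_factor and no padding.
print(conv1d_output_length(33000, kernel_size=161, stride=3))  # 10947
```

Because the kernel has to fit entirely inside the signal, the output is slightly shorter than 33000 / 3 = 11000.
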

class speechbrain.lobes.downsampling.PoolingDownsampler(downsampling_factor, kernel_size, padding=0, pool_type='avg')[source]

Bases: Downsampler

1D Pooling downsampling (non-learned)

Parameters:
  • downsampling_factor (int) – Downsampling factor, i.e. the ratio of the input length to the output length.

  • kernel_size (int) – Kernel size of the 1D filter (must be an odd integer)

  • padding (int) – The number of padding elements to apply.

  • pool_type (string) – Pooling approach, must be one of ["avg", "max"]

Example

>>> sd = PoolingDownsampler(3, 41)
>>> a = torch.rand([8, 33000])
>>> a = sd(a)
>>> print(a.shape)
torch.Size([8, 10987])
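
To make the pooling operation concrete, here is a minimal pure-Python sketch of unpadded 1D average and max pooling over a single signal. It is not the library implementation: `pool1d` is a hypothetical helper, padding is left out, and using the downsampling factor as the stride is an assumption that happens to reproduce the output length in the example.

```python
def pool1d(signal, kernel_size, stride, pool_type="avg"):
    """Unpadded 1D pooling over a list of numbers.

    Hypothetical helper for illustration only; not part of SpeechBrain.
    """
    if pool_type not in ("avg", "max"):
        raise ValueError("pool_type must be 'avg' or 'max'")
    n_out = (len(signal) - kernel_size) // stride + 1
    pooled = []
    for i in range(n_out):
        # Each output value summarises one window of kernel_size samples.
        window = signal[i * stride : i * stride + kernel_size]
        if pool_type == "avg":
            pooled.append(sum(window) / kernel_size)
        else:
            pooled.append(max(window))
    return pooled


print(pool1d([1, 2, 3, 4, 5, 6, 7], kernel_size=3, stride=2))  # [2.0, 4.0, 6.0]
print(pool1d([1, 2, 3, 4, 5, 6, 7], 3, 2, pool_type="max"))    # [3, 5, 7]
# Assumption: stride == downsampling_factor, as in the example above.
print(len(pool1d([0.0] * 33000, kernel_size=41, stride=3)))     # 10987
```

Neither pooling type has trainable weights, which is what "non-learned" refers to.
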

class speechbrain.lobes.downsampling.ConcatDownsampler(downsampling_factor)[source]

Bases: Downsampler

Concatenation downsampling with naive frame dropping. Frames are dropped to make the time dimension divisible by the downsampling_factor.

Parameters:

downsampling_factor (int) – Downsampling factor, i.e. the ratio of the input length to the output length.

Example

>>> down = ConcatDownsampler(2)
>>> a = torch.rand([8, 40, 40])
>>> a = down(a)
>>> print(a.shape)
torch.Size([8, 20, 80])
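
The frame dropping and concatenation described above can be sketched in pure Python for a single utterance. This is an illustration, not the library code: `concat_downsample` is a hypothetical helper, and dropping the trailing frames (rather than some other frames) is an assumption.

```python
def concat_downsample(frames, factor):
    """Concatenate every `factor` consecutive frames into one longer frame.

    `frames` is a list of feature vectors (lists) for one utterance.
    Hypothetical helper for illustration only; not part of SpeechBrain.
    """
    # Assumption: the frames dropped are the trailing ones that do not
    # fill a complete group of `factor` frames.
    usable = len(frames) - len(frames) % factor
    downsampled = []
    for start in range(0, usable, factor):
        merged = []
        for frame in frames[start : start + factor]:
            merged.extend(frame)
        downsampled.append(merged)
    return downsampled


frames = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]  # 5 frames, 2 features each
print(concat_downsample(frames, 2))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The time dimension shrinks by the factor while the feature dimension grows by it, which matches the [8, 40, 40] to [8, 20, 80] shape change in the example.
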

forward(x)[source]

Downsamples x given the resampling factor.

Parameters:

x (torch.Tensor) – Input tensor to downsample, e.g. of shape [batch, time, features] as in the class example.

Returns:

x – The downsampled tensor.

Return type:

torch.Tensor