speechbrain.nnet.pooling module

Library implementing pooling.

Authors
  • Titouan Parcollet 2020

  • Mirco Ravanelli 2020

  • Nauman Dawalatabad 2020

  • Jianyuan Zhong 2020

  • Sarthak Yadav 2022

  • Ha Nguyen 2023

Summary

Classes:

AdaptivePool

This class implements the adaptive average pooling.

AttentionPooling

This class implements self-attention pooling (https://arxiv.org/abs/2008.01077).

GaussianLowpassPooling

This class implements the learnable Gaussian lowpass pooling from LEAF (Zeghidour et al., ICLR 2021).

Pooling1d

This class implements 1d pooling of the input tensor.

Pooling2d

This class implements 2d pooling of the input tensor.

StatisticsPooling

This class implements a statistics pooling layer.

Reference

class speechbrain.nnet.pooling.Pooling1d(pool_type, kernel_size, input_dims=3, pool_axis=1, ceil_mode=False, padding=0, dilation=1, stride=None)

Bases: Module

This class implements 1d pooling of the input tensor along pool_axis.

Parameters:
  • pool_type (str) – The type of pooling function to use ('avg' or 'max').

  • kernel_size (int) – The size of the pooling window. For instance, kernel_size=3 applies 1D pooling over windows of 3 elements.

  • input_dims (int) – The number of dimensions expected in the input.

  • pool_axis (int) – The axis along which pooling is applied.

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

  • padding (int) – The number of padding elements added on each side of pool_axis.

  • dilation (int) – Controls the dilation factor of pooling.

  • stride (int) – The stride of the pooling window. If None, it defaults to kernel_size.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import Pooling1d
>>> pool = Pooling1d('max', 3)
>>> inputs = torch.rand(10, 12, 40)
>>> output = pool(inputs)
>>> output.shape
torch.Size([10, 4, 40])

forward(x)

Performs 1d pooling on the input tensor.

Parameters:

x (torch.Tensor) – The input tensor for a mini-batch.

Returns:

x – The pooled outputs.

Return type:

torch.Tensor
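
The output length along pool_axis follows from the constructor arguments. The plain-Python sketch below uses the output-size formula that PyTorch documents for its 1D pooling layers and assumes that an unset stride falls back to kernel_size, which is consistent with the example (12 steps pooled with a kernel of 3 give 4). It illustrates the arithmetic only; the function name pooled_length is hypothetical and not part of SpeechBrain.

```python
def pooled_length(length, kernel_size, stride=None, padding=0,
                  dilation=1, ceil_mode=False):
    """Output length of 1D pooling along one axis (PyTorch's formula)."""
    if stride is None:
        # Assumption: an unset stride falls back to the kernel size.
        stride = kernel_size
    # Number of input steps a dilated window spans.
    effective_kernel = dilation * (kernel_size - 1) + 1
    span = length + 2 * padding - effective_kernel
    if ceil_mode:
        out = -(-span // stride) + 1  # integer ceil division
        # Drop a last window that would start inside the right padding.
        if (out - 1) * stride >= length + padding:
            out -= 1
    else:
        out = span // stride + 1  # integer floor division
    return out


print(pooled_length(12, 3))                  # 4, as in the example above
print(pooled_length(13, 3))                  # 4
print(pooled_length(13, 3, ceil_mode=True))  # 5
```

The same rule applies per axis in the 2d case: 15 and 12 steps pooled with a (5, 3) window give 3 and 4.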

class speechbrain.nnet.pooling.Pooling2d(pool_type, kernel_size, pool_axis=(1, 2), ceil_mode=False, padding=0, dilation=1, stride=None)

Bases: Module

This class implements 2d pooling of the input tensor over the two axes in pool_axis.

Parameters:
  • pool_type (str) – The type of pooling function to use ('avg' or 'max').

  • kernel_size (int or tuple) – The size of the pooling window. For instance, kernel_size=(3, 3) performs 2D pooling with a 3x3 kernel.

  • pool_axis (tuple) – The two axes along which pooling is applied.

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

  • padding (int) – The number of padding elements added on each side of the pooled axes.

  • dilation (int) – Controls the dilation factor of pooling.

  • stride (int) – The stride of the pooling window. If None, it defaults to kernel_size.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import Pooling2d
>>> pool = Pooling2d('max', (5, 3))
>>> inputs = torch.rand(10, 15, 12)
>>> output = pool(inputs)
>>> output.shape
torch.Size([10, 3, 4])

forward(x)

Performs 2d pooling on the input tensor.

Parameters:

x (torch.Tensor) – The input tensor for a mini-batch.

Returns:

x – The pooled outputs.

Return type:

torch.Tensor
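
In the example, pool_axis=(1, 2) selects the 15 x 12 plane of each batch item and the batch axis is left alone. The plain-Python sketch below max-pools one such plane with a (5, 3) window. It assumes, as the example's output shape suggests, that the stride defaults to the kernel size and that no padding is used. The function max_pool2d is hypothetical and only illustrates the arithmetic; it does not reproduce the class.

```python
def max_pool2d(plane, kernel_size, stride=None):
    """Max pooling over a 2D plane (list of rows), without padding."""
    kh, kw = kernel_size
    # Assumption: an unset stride falls back to the kernel size.
    sh, sw = kernel_size if stride is None else stride
    n_rows, n_cols = len(plane), len(plane[0])
    pooled = []
    for top in range(0, n_rows - kh + 1, sh):
        row = []
        for left in range(0, n_cols - kw + 1, sw):
            window = [plane[r][c]
                      for r in range(top, top + kh)
                      for c in range(left, left + kw)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# One 15 x 12 plane, as selected by pool_axis=(1, 2) in the example.
plane = [[r * 12 + c for c in range(12)] for r in range(15)]
out = max_pool2d(plane, (5, 3))
print(len(out), len(out[0]))  # 3 4
print(out[0][0])              # 50, the max of rows 0-4 and columns 0-2
```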

class speechbrain.nnet.pooling.StatisticsPooling(return_mean=True, return_std=True)

Bases: Module

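This class implements a statistics pooling layer.

It returns the mean and/or standard deviation of the input tensor, computed over the time axis (dimension 1). When both are returned, they are concatenated along the last dimension, so an input of shape [batch, time, features] yields an output of shape [batch, 1, 2 * features].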

Parameters:
  • return_mean (bool) – If True, the mean is returned.

  • return_std (bool) – If True, the standard deviation will be returned.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import StatisticsPooling
>>> inp_tensor = torch.rand([5, 100, 50])
>>> sp_layer = StatisticsPooling()
>>> out_tensor = sp_layer(inp_tensor)
>>> out_tensor.shape
torch.Size([5, 1, 100])

forward(x, lengths=None)

Calculates the mean and std over time for each item in the batch.

Parameters:
  • x (torch.Tensor) – The input tensor for a mini-batch.

  • lengths (torch.Tensor) – The relative lengths of the samples in the input, used to exclude padded time steps. If None, all time steps are used.

Returns:

pooled_stats – The mean and std for the input.

Return type:

torch.Tensor
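
The statistics themselves can be sketched without any framework. This plain-Python version assumes that lengths holds relative lengths (each item's valid length as a fraction of the padded time axis) and uses the sample standard deviation; the library's forward may differ in numerical details. The function name statistics_pooling is hypothetical.

```python
import statistics


def statistics_pooling(batch, rel_lengths=None):
    """Mean and standard deviation over time for a [batch][time][features] list.

    Assumption: rel_lengths[i] is item i's valid length as a fraction of
    the padded time axis; frames beyond it are treated as padding.
    """
    pooled = []
    for i, seq in enumerate(batch):
        n_frames = len(seq)
        if rel_lengths is not None:
            # Keep only the non-padded frames of this item.
            n_frames = round(rel_lengths[i] * len(seq))
        frames = seq[:n_frames]
        per_feature = list(zip(*frames))  # transpose to [features][time]
        means = [statistics.mean(f) for f in per_feature]
        # statistics.stdev is the sample (n - 1) standard deviation.
        stds = [statistics.stdev(f) for f in per_feature]
        # Concatenate mean and std, then add a singleton time axis.
        pooled.append([means + stds])
    return pooled


batch = [[[1.0, 10.0], [3.0, 10.0], [100.0, -5.0]]]
out = statistics_pooling(batch, rel_lengths=[2 / 3])
# Only the first 2 of 3 frames are used:
# means (2.0, 10.0) followed by stds (about 1.414, 0.0).
print(out)
```

As in the class example, the output keeps a singleton time axis and holds 2 * features values per item.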

class speechbrain.nnet.pooling.AdaptivePool(output_size)

Bases: Module

This class implements adaptive average pooling.

Parameters:

output_size (int) – The size of the output.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import AdaptivePool
>>> pool = AdaptivePool(1)
>>> inp = torch.randn([8, 120, 40])
>>> output = pool(inp)
>>> output.shape
torch.Size([8, 1, 40])

forward(x)

Performs adaptive pooling on the input tensor.

Parameters:

x (torch.Tensor) – The input tensor for a mini-batch.

Returns:

x – The pooled outputs.

Return type:

torch.Tensor
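
AdaptivePool(1) in the example averages each of the 40 features over all 120 time steps. For larger output sizes, adaptive average pooling splits the axis of length L into output_size bins, bin i spanning floor(i * L / output_size) to ceil((i + 1) * L / output_size). That is the rule PyTorch's adaptive average pooling uses; that AdaptivePool delegates to it is an assumption here. The sketch works on a single feature track, and the name adaptive_avg_pool1d is hypothetical.

```python
def adaptive_avg_pool1d(values, output_size):
    """Average `values` into `output_size` bins of (nearly) equal width."""
    length = len(values)
    out = []
    for i in range(output_size):
        start = (i * length) // output_size               # floor(i * L / out)
        end = -((-(i + 1) * length) // output_size)       # ceil((i + 1) * L / out)
        window = values[start:end]
        out.append(sum(window) / len(window))
    return out


print(adaptive_avg_pool1d([1, 2, 3, 4, 5, 6], 3))  # [1.5, 3.5, 5.5]
print(adaptive_avg_pool1d([1, 2, 3, 4, 5], 2))     # [2.0, 4.0]
```

When L is not a multiple of output_size, neighbouring bins can overlap: in the second call the value 3 contributes to both averages.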

class speechbrain.nnet.pooling.GaussianLowpassPooling(in_channels, kernel_size, stride=1, initialization_constant=0.4, padding='same', padding_mode='constant', bias=True, skip_transpose=False)

Bases: Module

This class implements the learnable Gaussian lowpass pooling from Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, "LEAF: A Learnable Frontend for Audio Classification", in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596).

Parameters:
  • in_channels (int) – The number of input channels.

  • kernel_size (int) – Kernel size of the Gaussian lowpass filters.

  • stride (int) – Stride factor of the convolutional filters. When the stride factor is greater than 1, the output is decimated in time.

  • initialization_constant (float) – The constant used for initialization, default 0.4.

  • padding (str) – Either "same" or "valid". If "valid", no padding is performed. If "same" and stride is 1, the output shape is the same as the input shape.

  • padding_mode (str) – This flag specifies the type of padding. See torch.nn documentation for more information.

  • bias (bool) – If True, an additive bias b is used.

  • skip_transpose (bool) – If False, uses the batch x time x channel convention of SpeechBrain. If True, uses the batch x channel x time convention.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import GaussianLowpassPooling
>>> inp_tensor = torch.rand([10, 8000, 40])
>>> # window of 25 ms (401 samples) and stride of 10 ms (160 samples) at 16 kHz
>>> low_pass_pooling = GaussianLowpassPooling(
...     40, kernel_size=401, stride=160,
... )
>>> out_tensor = low_pass_pooling(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 50, 40])

forward(x)

Performs Gaussian lowpass pooling.

Parameters:

x (torch.Tensor) – Input 3D tensor of shape [batch, time, channels].

Returns:

outputs – The pooled outputs.

Return type:

torch.Tensor
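
The pooling can be pictured as a Gaussian-windowed weighted sum followed by decimation in time. The plain-Python sketch below assumes a LEAF-style parameterisation in which the width sigma is measured relative to half the window, so initialization_constant=0.4 would correspond to a standard deviation of 0.4 * (kernel_size - 1) / 2 samples. It uses one fixed sigma, zero "same"-style padding for an odd kernel, and no bias; learning and any constraints the class places on sigma are left out. The functions gaussian_window and lowpass_pool are hypothetical.

```python
import math


def gaussian_window(kernel_size, sigma):
    """Gaussian lowpass window, LEAF-style (assumed parameterisation).

    sigma is taken relative to half the window length, so the window's
    standard deviation is sigma * (kernel_size - 1) / 2 samples.
    """
    center = 0.5 * (kernel_size - 1)
    width = sigma * center
    return [math.exp(-0.5 * ((n - center) / width) ** 2)
            for n in range(kernel_size)]


def lowpass_pool(signal, window, stride):
    """Filter one channel with `window`, keeping every `stride`-th output.

    Zero padding of (kernel_size - 1) // 2 on each side (odd kernel assumed)
    centres each window on an input step, giving ceil(len(signal) / stride)
    outputs.
    """
    pad = (len(window) - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    out = []
    for start in range(0, len(signal), stride):
        segment = padded[start:start + len(window)]
        out.append(sum(w * x for w, x in zip(window, segment)))
    return out


# Settings from the example: 25 ms window, 10 ms stride at 16 kHz.
window = gaussian_window(401, sigma=0.4)
pooled = lowpass_pool([1.0] * 8000, window, stride=160)
print(len(pooled))  # 50
```

With the example's settings, 8000 time steps pooled with stride 160 give 50 outputs, matching torch.Size([10, 50, 40]).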

class speechbrain.nnet.pooling.AttentionPooling(input_dim)

Bases: Module

This class implements self-attention pooling (https://arxiv.org/abs/2008.01077).

Parameters:

input_dim (int) – The feature dimension (last axis) of the input tensor.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import AttentionPooling
>>> inp_tensor = torch.rand([4, 40])
>>> pool = AttentionPooling(input_dim=40)
>>> out_tensor = pool(inp_tensor)

forward(x)

Computes the attention-pooled output of the input.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

out – The pooled outputs.

Return type:

torch.Tensor
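
The idea behind self-attention pooling is to score each frame, normalise the scores with a softmax over time, and return the attention-weighted sum of the frames. The plain-Python sketch below shows this for one sequence shaped [time][input_dim], assuming a single linear scoring layer. The function attention_pooling and its weights and bias are hypothetical stand-ins for learned parameters, not the class's attributes.

```python
import math


def attention_pooling(frames, weights, bias=0.0):
    """Self-attention pooling of one sequence shaped [time][input_dim].

    weights and bias are hypothetical stand-ins for a learned linear
    layer that maps each frame to a single score.
    """
    # One scalar score per frame: s_t = w . x_t + b
    scores = [sum(w * x for w, x in zip(weights, frame)) + bias
              for frame in frames]
    # Softmax over time; subtracting the max keeps exp() from overflowing.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of the frames: one input_dim vector.
    return [sum(a * frame[d] for a, frame in zip(alphas, frames))
            for d in range(len(frames[0]))]


# Zero weights give uniform attention, i.e. a plain average over time.
print(attention_pooling([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

Large weights concentrate the attention on the highest-scoring frames, so the output moves from a plain average toward a selection of individual frames.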