speechbrain.nnet.pooling module

Library implementing pooling.

Authors
  • Titouan Parcollet 2020

  • Mirco Ravanelli 2020

  • Nauman Dawalatabad 2020

  • Jianyuan Zhong 2020

Summary

Classes:

AdaptivePool

This class implements the adaptive average pooling.

Pooling1d

This class implements 1D pooling of the input tensor.

Pooling2d

This class implements 2D pooling of the input tensor.

StatisticsPooling

This class implements a statistics pooling layer.

Reference

class speechbrain.nnet.pooling.Pooling1d(pool_type, kernel_size, input_dims=3, pool_axis=1, ceil_mode=False, padding=0, dilation=1, stride=None)

Bases: torch.nn.modules.module.Module

This class implements 1D pooling of the input tensor.

Parameters
  • pool_type (str) – The type of pooling function to use ('avg' or 'max').

  • kernel_size (int) – The size of the pooling window. For instance, kernel_size=3 pools over windows of 3 elements.

  • input_dims (int) – The number of dimensions expected in the input.

  • pool_axis (int) – The axis along which pooling is applied.

  • stride (int) – The stride of the pooling window. When None (the default), the stride equals the kernel size.

  • padding (int) – The number of padding elements to apply.

  • dilation (int) – The dilation factor of the pooling window.

  • ceil_mode (bool) – When True, ceil is used instead of floor to compute the output shape.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import Pooling1d
>>> pool = Pooling1d('max', 3)
>>> inputs = torch.rand(10, 12, 40)
>>> output = pool(inputs)
>>> output.shape
torch.Size([10, 4, 40])
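
How stride, padding, dilation, and ceil_mode determine the size of the pooled axis can be sketched with plain PyTorch. The helper `pooled_length` is a hypothetical name introduced here for illustration, and the snippet assumes that pooling along axis 1 behaves like moving that axis last and applying `torch.nn.MaxPool1d`; it is not the library's own implementation.

```python
import torch


def pooled_length(length, kernel_size, stride=None, padding=0, dilation=1,
                  ceil_mode=False):
    """Size of the pooled axis, following the torch.nn.MaxPool1d shape formula.

    Hypothetical helper for illustration only; not part of speechbrain.
    """
    stride = kernel_size if stride is None else stride
    numerator = length + 2 * padding - dilation * (kernel_size - 1) - 1
    if ceil_mode:
        # Integer ceiling division avoids floating-point rounding issues.
        return -(-numerator // stride) + 1
    return numerator // stride + 1


inputs = torch.rand(10, 12, 40)  # (batch, time, channels)

# Assumption: pool over axis 1 by moving it last, pooling, and moving it back.
pool = torch.nn.MaxPool1d(kernel_size=3)
output = pool(inputs.transpose(1, 2)).transpose(1, 2)

print(output.shape)                          # torch.Size([10, 4, 40])
print(pooled_length(12, 3))                  # 4
print(pooled_length(13, 3, ceil_mode=True))  # 5
```

With 13 time steps and kernel_size=3, floor mode drops the incomplete last window and yields 4 outputs, whereas ceil mode keeps it and yields 5.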
forward(x)

training: bool

class speechbrain.nnet.pooling.Pooling2d(pool_type, kernel_size, pool_axis=(1, 2), ceil_mode=False, padding=0, dilation=1, stride=None)

Bases: torch.nn.modules.module.Module

This class implements 2D pooling of the input tensor.

Parameters
  • pool_type (str) – The type of pooling function to use ('avg' or 'max').

  • pool_axis (tuple) – The two axes along which pooling is applied.

  • kernel_size (int or tuple) – The size of the pooling window. For instance, kernel_size=(3, 3) performs 2D pooling with a 3x3 kernel.

  • stride (int) – The stride of the pooling window. When None (the default), the stride equals the kernel size.

  • padding (int) – The number of padding elements to apply.

  • dilation (int) – The dilation factor of the pooling window.

  • ceil_mode (bool) – When True, ceil is used instead of floor to compute the output shape.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import Pooling2d
>>> pool = Pooling2d('max', (5, 3))
>>> inputs = torch.rand(10, 15, 12)
>>> output = pool(inputs)
>>> output.shape
torch.Size([10, 3, 4])
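
The shape in the example above can be reproduced with plain PyTorch. This is a minimal sketch that assumes pooling over axes (1, 2) of a 3D tensor is equivalent to adding a singleton channel axis and applying `torch.nn.MaxPool2d`; the library's internal handling of pool_axis may differ.

```python
import torch

inputs = torch.rand(10, 15, 12)  # (batch, axis 1, axis 2)

# Assumption: treat axes 1 and 2 as the spatial axes of a single-channel image.
pool = torch.nn.MaxPool2d(kernel_size=(5, 3))
output = pool(inputs.unsqueeze(1)).squeeze(1)

print(output.shape)  # torch.Size([10, 3, 4])

# Each output element is the maximum over one non-overlapping 5x3 window.
first_window_max = inputs[0, :5, :3].max()
print(torch.equal(output[0, 0, 0], first_window_max))  # True
```

Because the stride defaults to the kernel size in `torch.nn.MaxPool2d`, the 15x12 plane is split into 3x4 non-overlapping windows.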
forward(x)

training: bool

class speechbrain.nnet.pooling.StatisticsPooling(return_mean=True, return_std=True)

Bases: torch.nn.modules.module.Module

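This class implements a statistics pooling layer.

It returns the mean and/or standard deviation of the input tensor, computed along the time axis (axis 1). When both are returned, they are concatenated along the feature axis.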

Parameters
  • return_mean (bool) – If True, the mean is returned.

  • return_std (bool) – If True, the standard deviation is returned.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import StatisticsPooling
>>> inp_tensor = torch.rand([5, 100, 50])
>>> sp_layer = StatisticsPooling()
>>> out_tensor = sp_layer(inp_tensor)
>>> out_tensor.shape
torch.Size([5, 1, 100])
forward(x, lengths=None)

Calculates mean and std for a batch (input tensor).

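Parameters
  • x (torch.Tensor) – The input tensor for a mini-batch.

  • lengths (torch.Tensor, optional) – The relative lengths of the sequences in the mini-batch. If None, the statistics are computed over all time steps.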
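
The computation can be sketched in plain PyTorch as follows. The function `stats_pool` is a hypothetical stand-in, not the library's code, and it assumes that `lengths` holds relative lengths in (0, 1]; the actual layer may differ in details such as numerical stabilisation.

```python
import torch


def stats_pool(x, lengths=None):
    """Mean and std over time, concatenated along the feature axis.

    Hypothetical sketch. x has shape (batch, time, features); lengths is
    assumed to hold relative lengths in (0, 1], one per batch item.
    """
    pooled = []
    for i in range(x.shape[0]):
        if lengths is None:
            n_frames = x.shape[1]
        else:
            # Convert the relative length into a number of valid frames.
            n_frames = int(torch.round(lengths[i] * x.shape[1]))
        frames = x[i, :n_frames]
        pooled.append(torch.cat([frames.mean(dim=0), frames.std(dim=0)]))
    # Stack to (batch, 2 * features) and add a singleton time axis.
    return torch.stack(pooled).unsqueeze(1)


inp_tensor = torch.rand(5, 100, 50)
print(stats_pool(inp_tensor).shape)  # torch.Size([5, 1, 100])

# Padded frames beyond each relative length are ignored.
lengths = torch.tensor([1.0, 0.5, 0.8, 1.0, 0.25])
print(stats_pool(inp_tensor, lengths).shape)  # torch.Size([5, 1, 100])
```

The first 50 output features hold the mean and the last 50 hold the standard deviation, which is why 50 input features become 100.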

training: bool

class speechbrain.nnet.pooling.AdaptivePool(output_size)

Bases: torch.nn.modules.module.Module

This class implements the adaptive average pooling.

Parameters

output_size (int) – The size of the output.

Example

>>> import torch
>>> from speechbrain.nnet.pooling import AdaptivePool
>>> pool = AdaptivePool(1)
>>> inp = torch.randn([8, 120, 40])
>>> output = pool(inp)
>>> output.shape
torch.Size([8, 1, 40])
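
For comparison, the same shape can be obtained with plain PyTorch. This sketch assumes the layer averages over the time axis (axis 1) in the manner of `torch.nn.AdaptiveAvgPool1d`; the library's internals are not shown here and may differ.

```python
import torch

inp = torch.randn(8, 120, 40)  # (batch, time, features)

# Assumption: move the time axis last, pool adaptively, and move it back.
pool = torch.nn.AdaptiveAvgPool1d(1)
output = pool(inp.permute(0, 2, 1)).permute(0, 2, 1)

print(output.shape)  # torch.Size([8, 1, 40])

# With an output size of 1, adaptive average pooling is a mean over time.
time_mean = inp.mean(dim=1, keepdim=True)
print(torch.allclose(output, time_mean, atol=1e-5))  # True
```

Unlike fixed-kernel pooling, the output size is fixed in advance and the window is derived from the input length, so inputs of any length map to the same output shape.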
training: bool
forward(x)