speechbrain.nnet.pooling module
Library implementing pooling.
- Authors
Titouan Parcollet 2020
Mirco Ravanelli 2020
Nauman Dawalatabad 2020
Jianyuan Zhong 2020
Sarthak Yadav 2022
Ha Nguyen 2023
Summary
Classes:
AdaptivePool – This class implements the adaptive average pooling.
AttentionPooling – This function implements a self-attention pooling (https://arxiv.org/abs/2008.01077).
GaussianLowpassPooling – This class implements a learnable Gaussian lowpass pooling from LEAF (Zeghidour et al., ICLR 2021).
Pooling1d – This function implements 1d pooling of the input tensor.
Pooling2d – This function implements 2d pooling of the input tensor.
StatisticsPooling – This class implements a statistic pooling layer.
Reference
- class speechbrain.nnet.pooling.Pooling1d(pool_type, kernel_size, input_dims=3, pool_axis=1, ceil_mode=False, padding=0, dilation=1, stride=None)[source]
Bases: Module
This function implements 1d pooling of the input tensor.
- Parameters:
pool_type (str) – It is the type of pooling function to use ('avg', 'max').
kernel_size (int) – It is the kernel size that defines the pooling dimension. For instance, kernel_size=3 applies a 1D pooling of size 3.
input_dims (int) – The count of dimensions expected in the input.
pool_axis (int) – The axis where the pooling is applied.
ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.
padding (int) – It is the number of padding elements to apply.
dilation (int) – Controls the dilation factor of pooling.
stride (int) – It is the stride size.
Example
>>> pool = Pooling1d('max', 3)
>>> inputs = torch.rand(10, 12, 40)
>>> output = pool(inputs)
>>> output.shape
torch.Size([10, 4, 40])
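A minimal sketch (not from the library's own doctests) showing average pooling with the same layout; the kernel size and shapes below are illustrative, and the expected output shape assumes the stride defaults to the kernel size, as in the example above.
import torch
from speechbrain.nnet.pooling import Pooling1d

# Illustrative sketch: average pooling with kernel size 2 along the default
# pool_axis=1 (the time axis of a batch x time x channel tensor).
avg_pool = Pooling1d('avg', 2)
inputs = torch.rand(10, 12, 40)   # batch=10, time=12, channels=40
output = avg_pool(inputs)
print(output.shape)               # expected: torch.Size([10, 6, 40])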
- class speechbrain.nnet.pooling.Pooling2d(pool_type, kernel_size, pool_axis=(1, 2), ceil_mode=False, padding=0, dilation=1, stride=None)[source]
Bases: Module
This function implements 2d pooling of the input tensor.
- Parameters:
pool_type (str) – It is the type of pooling function to use ('avg', 'max').
kernel_size (int) – It is the kernel size that defines the pooling dimension. For instance, kernel_size=(3, 3) performs a 2D pooling with a 3x3 kernel.
pool_axis (tuple) – It is a tuple containing the axes that will be considered during pooling.
ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.
padding (int) – It is the number of padding elements to apply.
dilation (int) – Controls the dilation factor of pooling.
stride (int) – It is the stride size.
Example
>>> pool = Pooling2d('max', (5, 3))
>>> inputs = torch.rand(10, 15, 12)
>>> output = pool(inputs)
>>> output.shape
torch.Size([10, 3, 4])
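For orientation, a minimal sketch (illustrative values, not from the library's doctests) of average pooling over the default pool_axis=(1, 2) of a batch x time x freq tensor; the expected shape assumes the stride defaults to the kernel size, as in the example above.
import torch
from speechbrain.nnet.pooling import Pooling2d

# Illustrative sketch: 2x2 average pooling over the default pool_axis=(1, 2).
avg_pool = Pooling2d('avg', (2, 2))
inputs = torch.rand(8, 20, 40)    # batch=8, time=20, freq=40
output = avg_pool(inputs)
print(output.shape)               # expected: torch.Size([8, 10, 20])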
- class speechbrain.nnet.pooling.StatisticsPooling(return_mean=True, return_std=True)[source]
Bases: Module
This class implements a statistic pooling layer.
It returns the mean and/or std of the input tensor.
- Parameters:
return_mean (bool) – If True, the mean of the input tensor is returned.
return_std (bool) – If True, the standard deviation of the input tensor is returned.
Example
>>> inp_tensor = torch.rand([5, 100, 50])
>>> sp_layer = StatisticsPooling()
>>> out_tensor = sp_layer(inp_tensor)
>>> out_tensor.shape
torch.Size([5, 1, 100])
- forward(x, lengths=None)[source]
Calculates mean and std for a batch (input tensor).
- Parameters:
x (torch.Tensor) – It represents a tensor for a mini-batch.
lengths (torch.Tensor) – The lengths of the samples in the input.
- Returns:
pooled_stats – The mean and std for the input.
- Return type:
torch.Tensor
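A hedged sketch of calling forward with the lengths argument; it assumes lengths follow SpeechBrain's relative-length convention (a value in [0, 1] giving the fraction of the padded time axis that holds real frames), so that padded frames are excluded from the statistics. The batch values below are illustrative.
import torch
from speechbrain.nnet.pooling import StatisticsPooling

# Illustrative sketch: pooling a padded batch. lengths is assumed to be the
# relative valid length of each sentence (SpeechBrain convention).
inp = torch.rand([5, 100, 50])                      # batch x time x features
lengths = torch.tensor([1.0, 0.9, 0.8, 0.7, 0.5])   # relative valid lengths
sp_layer = StatisticsPooling()
pooled = sp_layer(inp, lengths=lengths)
print(pooled.shape)   # expected: torch.Size([5, 1, 100]) (mean and std concatenated)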
- class speechbrain.nnet.pooling.AdaptivePool(output_size)[source]
Bases: Module
This class implements the adaptive average pooling.
- Parameters:
output_size (int) – The size of the output.
Example
>>> pool = AdaptivePool(1)
>>> inp = torch.randn([8, 120, 40])
>>> output = pool(inp)
>>> output.shape
torch.Size([8, 1, 40])
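Unlike the fixed-kernel layers above, adaptive pooling maps a time axis of any length to the requested output_size; a minimal sketch with illustrative shapes (not from the library's doctests):
import torch
from speechbrain.nnet.pooling import AdaptivePool

# Illustrative sketch: inputs with different time lengths are both reduced
# to 4 frames by adaptive average pooling over the time axis.
pool = AdaptivePool(4)
short = torch.randn([8, 97, 40])
longer = torch.randn([8, 250, 40])
print(pool(short).shape)    # expected: torch.Size([8, 4, 40])
print(pool(longer).shape)   # expected: torch.Size([8, 4, 40])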
- class speechbrain.nnet.pooling.GaussianLowpassPooling(in_channels, kernel_size, stride=1, initialization_constant=0.4, padding='same', padding_mode='constant', bias=True, skip_transpose=False)[source]
Bases: Module
This class implements a learnable Gaussian lowpass pooling from
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, "LEAF: A Learnable Frontend for Audio Classification", in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596)
- Parameters:
in_channels (int) – The number of input channels.
kernel_size (int) – Kernel size of the Gaussian lowpass filters.
stride (int) – Stride factor of the convolutional filters. When the stride factor > 1, a decimation in time is performed.
initialization_constant (float) – The constant used for initialization, default 0.4.
padding (str) – (same, valid). If 'valid', no padding is performed. If 'same' and stride is 1, the output shape is the same as the input shape.
padding_mode (str) – This flag specifies the type of padding. See torch.nn documentation for more information.
bias (bool) – If True, the additive bias b is adopted.
skip_transpose (bool) – If False, uses the batch x time x channel convention of SpeechBrain. If True, uses the batch x channel x time convention.
Example
>>> inp_tensor = torch.rand([10, 8000, 40])
>>> low_pass_pooling = GaussianLowpassPooling(
...     40, kernel_size=401, stride=160,
... )
>>> # parameters corresponding to a window of 25 ms and a stride of 10 ms at 16 kHz
>>> out_tensor = low_pass_pooling(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 50, 40])
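The kernel_size=401 and stride=160 in the example above correspond to a 25 ms window and a 10 ms hop at a 16 kHz sampling rate. A minimal sketch of that arithmetic; the sample_rate, window_s, and hop_s variables are illustrative helpers, not part of the library:
import torch
from speechbrain.nnet.pooling import GaussianLowpassPooling

# Illustrative helper: derive kernel_size and stride from a window/hop
# duration (seconds) and a sampling rate, matching the documented example.
sample_rate = 16000
window_s, hop_s = 0.025, 0.010
kernel_size = int(window_s * sample_rate) + 1   # 401 samples (~25 ms)
stride = int(hop_s * sample_rate)               # 160 samples (10 ms)

pooling = GaussianLowpassPooling(40, kernel_size=kernel_size, stride=stride)
out = pooling(torch.rand([10, 8000, 40]))       # batch x time x channels
print(out.shape)                                # expected: torch.Size([10, 50, 40])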
- class speechbrain.nnet.pooling.AttentionPooling(input_dim)[source]
Bases: Module
This function implements a self-attention pooling (https://arxiv.org/abs/2008.01077).
- Parameters:
input_dim (int) – The dimension of the input torch.Tensor.
Example
>>> inp_tensor = torch.rand([4, 40])
>>> pool = AttentionPooling(input_dim=40)
>>> out_tensor = pool(inp_tensor)
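For reference, a standalone sketch of the self-attention pooling operation described in the cited paper (a learnable vector scores each frame, a softmax over time gives weights, and the pooled embedding is the weighted sum). It illustrates the idea only and is not the class's exact implementation; all tensor shapes are illustrative.
import torch

# Illustrative sketch of self-attention pooling (per the cited paper),
# not the SpeechBrain class itself.
batch, time, dim = 4, 120, 40
h = torch.rand(batch, time, dim)            # frame-level features
w = torch.nn.Linear(dim, 1, bias=False)     # learnable scoring vector
scores = w(h).squeeze(-1)                   # (batch, time)
alpha = torch.softmax(scores, dim=1)        # attention weights over time
pooled = (alpha.unsqueeze(-1) * h).sum(1)   # (batch, dim)
print(pooled.shape)                         # torch.Size([4, 40])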