speechbrain.nnet.normalization module
Library implementing normalization.
- Authors
Mirco Ravanelli 2020
Guillermo Cámbara 2021
Sarthak Yadav 2022
Summary
Classes:
BatchNorm1d: Applies 1d batch normalization to the input tensor.
BatchNorm2d: Applies 2d batch normalization to the input tensor.
ExponentialMovingAverage: Applies a learnable exponential moving average, as required by the learnable PCEN layer.
GroupNorm: Applies group normalization to the input tensor.
InstanceNorm1d: Applies 1d instance normalization to the input tensor.
InstanceNorm2d: Applies 2d instance normalization to the input tensor.
LayerNorm: Applies layer normalization to the input tensor.
PCEN: Implements a learnable Per-Channel Energy Normalization (PCEN) layer, supporting both the original PCEN as specified in [1] and sPCEN as specified in [2].
Reference
- class speechbrain.nnet.normalization.BatchNorm1d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, combine_batch_time=False, skip_transpose=False)[source]
Bases:
Module
Applies 1d batch normalization to the input tensor.
- Parameters:
input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.
input_size (int) – The expected size of the input. Alternatively, use input_shape.
eps (float) – This value is added to the standard-deviation estimate to improve numerical stability.
momentum (float) – It is a value used for the running_mean and running_var computation.
affine (bool) – When set to True, the affine parameters are learned.
track_running_stats (bool) – When set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics.
combine_batch_time (bool) – When True, the batch and time axes are combined before computing the statistics.
skip_transpose (bool) – If False, uses the batch x time x channel convention of SpeechBrain. If True, uses the batch x channel x time convention.
Example
>>> input = torch.randn(100, 10)
>>> norm = BatchNorm1d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10])
- forward(x)[source]
Returns the normalized input tensor.
- Parameters:
x (torch.Tensor (batch, time, [channels])) – Input to normalize. 2d or 3d tensors are expected as input; 4d tensors can be used when combine_batch_time=True.
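As a rough illustration of what the layer computes, the sketch below standardizes a (batch, channels) input per channel across the batch dimension in plain Python. This only shows the statistics: the actual module wraps torch.nn.BatchNorm1d, maintains running statistics via momentum, and (with affine=True) applies a learnable scale and shift. The function name here is illustrative, not part of the library.

```python
import math

def batch_norm_1d(x, eps=1e-5):
    """Illustrative batch norm over a (batch, channels) list of lists.

    Statistics are computed per channel across the batch dimension,
    mirroring what BatchNorm1d does for a 2d input."""
    n = len(x)
    channels = len(x[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        col = [row[c] for row in x]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)  # eps keeps this finite
        for i in range(n):
            out[i][c] = (x[i][c] - mean) * inv_std
    return out

normed = batch_norm_1d([[1.0, 10.0], [3.0, 30.0]])
```

After normalization each channel has zero mean and (up to eps) unit variance over the batch.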
- class speechbrain.nnet.normalization.BatchNorm2d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)[source]
Bases:
Module
Applies 2d batch normalization to the input tensor.
- Parameters:
input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.
input_size (int) – The expected size of the input. Alternatively, use input_shape.
eps (float) – This value is added to the standard-deviation estimate to improve numerical stability.
momentum (float) – It is a value used for the running_mean and running_var computation.
affine (bool) – When set to True, the affine parameters are learned.
track_running_stats (bool) – When set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics.
Example
>>> input = torch.randn(100, 10, 5, 20)
>>> norm = BatchNorm2d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10, 5, 20])
- forward(x)[source]
Returns the normalized input tensor.
- Parameters:
x (torch.Tensor (batch, time, channel1, channel2)) – input to normalize. 4d tensors are expected.
- class speechbrain.nnet.normalization.LayerNorm(input_size=None, input_shape=None, eps=1e-05, elementwise_affine=True)[source]
Bases:
Module
Applies layer normalization to the input tensor.
- Parameters:
input_shape (tuple) – The expected shape of the input.
eps (float) – This value is added to the standard-deviation estimate to improve numerical stability.
elementwise_affine (bool) – If True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases).
Example
>>> input = torch.randn(100, 101, 128)
>>> norm = LayerNorm(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 101, 128])
- forward(x)[source]
Returns the normalized input tensor.
- Parameters:
x (torch.Tensor (batch, time, channels)) – input to normalize. 3d or 4d tensors are expected.
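In contrast to batch norm, layer norm computes its statistics over the channel dimension of each individual frame, so the result is independent of the rest of the batch. A minimal pure-Python sketch of the math (the actual module wraps torch.nn.LayerNorm; the elementwise affine parameters are omitted here):

```python
import math

def layer_norm(vec, eps=1e-5):
    """Illustrative layer norm of one feature vector: the mean and
    variance are taken over the channel dimension of this vector alone,
    not across the batch."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

out = layer_norm([1.0, 2.0, 3.0, 4.0])
```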
- class speechbrain.nnet.normalization.InstanceNorm1d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, track_running_stats=True, affine=False)[source]
Bases:
Module
Applies 1d instance normalization to the input tensor.
- Parameters:
input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.
input_size (int) – The expected size of the input. Alternatively, use input_shape.
eps (float) – This value is added to the standard-deviation estimate to improve numerical stability.
momentum (float) – It is a value used for the running_mean and running_var computation.
track_running_stats (bool) – When set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics.
affine (bool) – A boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
Example
>>> input = torch.randn(100, 10, 20)
>>> norm = InstanceNorm1d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10, 20])
- forward(x)[source]
Returns the normalized input tensor.
- Parameters:
x (torch.Tensor (batch, time, channels)) – input to normalize. 3d tensors are expected.
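The distinguishing property of instance norm is that each sample is normalized using only its own statistics, per channel, over the time axis. A pure-Python sketch for a single sample (the actual module wraps torch.nn.InstanceNorm1d; the function name is illustrative):

```python
import math

def instance_norm_1d(sample, eps=1e-5):
    """Illustrative 1d instance norm of one sample shaped
    (time, channels): unlike batch norm, the mean and variance for each
    channel come only from this sample's own time axis."""
    time = len(sample)
    channels = len(sample[0])
    out = [[0.0] * channels for _ in range(time)]
    for c in range(channels):
        col = [frame[c] for frame in sample]
        mean = sum(col) / time
        var = sum((v - mean) ** 2 for v in col) / time
        inv_std = 1.0 / math.sqrt(var + eps)
        for t in range(time):
            out[t][c] = (sample[t][c] - mean) * inv_std
    return out

normed = instance_norm_1d([[1.0, 5.0], [3.0, 9.0]])
```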
- class speechbrain.nnet.normalization.InstanceNorm2d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, track_running_stats=True, affine=False)[source]
Bases:
Module
Applies 2d instance normalization to the input tensor.
- Parameters:
input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.
input_size (int) – The expected size of the input. Alternatively, use input_shape.
eps (float) – This value is added to the standard-deviation estimate to improve numerical stability.
momentum (float) – It is a value used for the running_mean and running_var computation.
track_running_stats (bool) – When set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics.
affine (bool) – A boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
Example
>>> input = torch.randn(100, 10, 20, 2)
>>> norm = InstanceNorm2d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10, 20, 2])
- forward(x)[source]
Returns the normalized input tensor.
- Parameters:
x (torch.Tensor (batch, time, channel1, channel2)) – input to normalize. 4d tensors are expected.
- class speechbrain.nnet.normalization.GroupNorm(input_shape=None, input_size=None, num_groups=None, eps=1e-05, affine=True)[source]
Bases:
Module
Applies group normalization to the input tensor.
- Parameters:
input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.
input_size (int) – The expected size of the input. Alternatively, use input_shape.
num_groups (int) – Number of groups to separate the channels into.
eps (float) – This value is added to the standard-deviation estimate to improve numerical stability.
affine (bool) – A boolean value that when set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases).
Example
>>> input = torch.randn(100, 101, 128)
>>> norm = GroupNorm(input_size=128, num_groups=128)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 101, 128])
- forward(x)[source]
Returns the normalized input tensor.
- Parameters:
x (torch.Tensor (batch, time, channels)) – input to normalize. 3d or 4d tensors are expected.
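Group norm sits between layer norm and instance norm: the channels of each frame are split into num_groups contiguous groups, and each group is standardized with its own mean and variance (with num_groups=1 it reduces to layer norm; with one group per channel it behaves like instance norm). A pure-Python sketch for one feature vector, assuming the group count divides the channel count (the actual module wraps torch.nn.GroupNorm):

```python
import math

def group_norm(vec, num_groups, eps=1e-5):
    """Illustrative group norm of one feature vector: channels are split
    into num_groups contiguous groups, each standardized with its own
    mean and variance."""
    size = len(vec) // num_groups  # assumes num_groups divides len(vec)
    out = []
    for g in range(num_groups):
        grp = vec[g * size:(g + 1) * size]
        mean = sum(grp) / size
        var = sum((v - mean) ** 2 for v in grp) / size
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend((v - mean) * inv_std for v in grp)
    return out

out = group_norm([1.0, 3.0, 10.0, 30.0], num_groups=2)
```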
- class speechbrain.nnet.normalization.ExponentialMovingAverage(input_size: int, coeff_init: float = 0.04, per_channel: bool = False, trainable: bool = True, skip_transpose: bool = False)[source]
Bases:
Module
Applies a learnable exponential moving average, as required by the learnable PCEN layer.
- Parameters:
input_size (int) – The expected size of the input.
coeff_init (float) – Initial smoothing coefficient value
per_channel (bool) – Controls whether a smoothing coefficient is learned independently for every input channel.
trainable (bool) – Whether to learn the smoothing coefficient(s) or keep them fixed.
skip_transpose (bool) – If False, uses the batch x time x channel convention of SpeechBrain. If True, uses the batch x channel x time convention.
Example
>>> inp_tensor = torch.rand([10, 50, 40])
>>> pcen = ExponentialMovingAverage(40)
>>> out_tensor = pcen(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 50, 40])
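Per channel, the layer realizes the first-order recursion s[t] = c * x[t] + (1 - c) * s[t - 1], where c is the (learnable) smoothing coefficient initialized from coeff_init. A pure-Python sketch over one channel, assuming the smoother is initialized with the first frame (the actual module applies this recursion with torch tensors):

```python
def exponential_moving_average(x, coeff=0.04):
    """Illustrative EMA over a 1d sequence following
    s[t] = coeff * x[t] + (1 - coeff) * s[t - 1],
    initialized with the first frame; this smoothed sequence forms the
    PCEN denominator."""
    s = [x[0]]
    for v in x[1:]:
        s.append(coeff * v + (1.0 - coeff) * s[-1])
    return s

smoothed = exponential_moving_average([1.0, 0.0, 0.0, 0.0], coeff=0.5)
```

With coeff=0.5 an impulse decays by half at every step, showing how smaller coefficients yield longer memory.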
- class speechbrain.nnet.normalization.PCEN(input_size, alpha: float = 0.96, smooth_coef: float = 0.04, delta: float = 2.0, root: float = 2.0, floor: float = 1e-12, trainable: bool = True, per_channel_smooth_coef: bool = True, skip_transpose: bool = False)[source]
Bases:
Module
This class implements a learnable Per-Channel Energy Normalization (PCEN) layer, supporting both the original PCEN as specified in [1] and sPCEN as specified in [2].
[1] Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous, “Trainable Frontend For Robust and Far-Field Keyword Spotting”, in Proc. of ICASSP 2017 (https://arxiv.org/abs/1607.05666)
[2] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, “LEAF: A Learnable Frontend for Audio Classification”, in Proc. of ICLR 2021 (https://arxiv.org/abs/2101.08596)
The default argument values correspond with those used by [2].
- Parameters:
input_size (int) – The expected size of the input.
alpha (float) – Specifies the alpha coefficient for PCEN.
smooth_coef (float) – Specifies the smoothing coefficient for PCEN.
delta (float) – Specifies the delta coefficient for PCEN.
root (float) – Specifies the root coefficient for PCEN.
floor (float) – Specifies the floor coefficient for PCEN.
trainable (bool) – Whether to learn the PCEN parameters or keep them fixed.
per_channel_smooth_coef (bool) – Whether to learn an independent smoothing coefficient for every channel. When True, this is effectively sPCEN from [2].
skip_transpose (bool) – If False, uses the batch x time x channel convention of SpeechBrain. If True, uses the batch x channel x time convention.
Example
>>> inp_tensor = torch.rand([10, 50, 40])
>>> pcen = PCEN(40, alpha=0.96)  # sPCEN
>>> out_tensor = pcen(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 50, 40])
- forward(x)[source]
Returns the normalized input tensor.
- Parameters:
x (torch.Tensor (batch, time, channels)) – input to normalize.
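Following [1], PCEN combines the parameters above as PCEN[t] = (x[t] / (floor + M[t])**alpha + delta)**(1/root) - delta**(1/root), where M is the exponential moving average of the input energy with coefficient smooth_coef. A pure-Python sketch over one channel, assuming the smoother is initialized with the first frame (the actual layer applies this with learnable torch parameters):

```python
def pcen(x, alpha=0.96, smooth_coef=0.04, delta=2.0, root=2.0, floor=1e-12):
    """Illustrative PCEN over a 1d energy sequence:
    PCEN[t] = (x[t] / (floor + M[t])**alpha + delta)**(1/root)
              - delta**(1/root),
    where M is the EMA of x with coefficient smooth_coef."""
    one_over_root = 1.0 / root
    m = x[0]  # initialize the smoother with the first frame
    out = []
    for v in x:
        m = smooth_coef * v + (1.0 - smooth_coef) * m  # EMA denominator
        out.append((v / (floor + m) ** alpha + delta) ** one_over_root
                   - delta ** one_over_root)
    return out

out = pcen([1.0, 1.0, 1.0])
```

Dividing by the smoothed energy performs per-channel automatic gain control, while delta and root apply a root compression that replaces the usual log.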