speechbrain.nnet.normalization module

Library implementing normalization.

Authors
  • Mirco Ravanelli 2020

  • Guillermo Cámbara 2021

Summary

Classes:

BatchNorm1d

Applies 1d batch normalization to the input tensor.

BatchNorm2d

Applies 2d batch normalization to the input tensor.

GroupNorm

Applies group normalization to the input tensor.

InstanceNorm1d

Applies 1d instance normalization to the input tensor.

InstanceNorm2d

Applies 2d instance normalization to the input tensor.

LayerNorm

Applies layer normalization to the input tensor.

Reference

class speechbrain.nnet.normalization.BatchNorm1d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, combine_batch_time=False, skip_transpose=False)

Bases: torch.nn.modules.module.Module

Applies 1d batch normalization to the input tensor.

Parameters
  • input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.

  • input_size (int) – The expected size of the input. Alternatively, use input_shape.

  • eps (float) – A small value added to the variance estimate (inside the square root of the denominator) to improve numerical stability.

  • momentum (float) – The factor used to update running_mean and running_var as exponential moving averages.

  • affine (bool) – When set to True, the affine parameters are learned.

  • track_running_stats (bool) – When set to True, this module tracks the running mean and variance; when set to False, it does not track such statistics.

  • combine_batch_time (bool) – When True, the batch and time axes are combined.
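The momentum and track_running_stats parameters act only through the running statistics. The NumPy sketch below is a reference computation, not SpeechBrain's code: it assumes the module delegates to PyTorch's batch normalization and follows the update rule documented for torch.nn.BatchNorm1d (running = (1 - momentum) * running + momentum * batch statistic, with the unbiased variance stored in running_var). It omits the affine parameters and uses a 2d (batch, features) input like the example. In training mode the batch's own statistics normalize the input; in evaluation mode with track_running_stats=True, the stored running estimates are used instead.

```python
import numpy as np


def batch_norm_train_step(x, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Training-mode batch norm for a (batch, features) array.

    Normalizes with the current batch statistics and updates the running
    statistics in place (ASSUMPTION: PyTorch's documented convention).
    Affine parameters are omitted (equivalent to weight=1, bias=0).
    """
    n = x.shape[0]
    mean = x.mean(axis=0)             # per-feature mean over the batch
    var = x.var(axis=0)               # biased variance, used for normalization
    unbiased_var = var * n / (n - 1)  # unbiased variance, stored in running_var
    # Exponential moving average: new = (1 - momentum) * old + momentum * batch
    running_mean[:] = (1 - momentum) * running_mean + momentum * mean
    running_var[:] = (1 - momentum) * running_var + momentum * unbiased_var
    # eps is added to the variance, inside the square root
    return (x - mean) / np.sqrt(var + eps)


def batch_norm_eval(x, running_mean, running_var, eps=1e-5):
    """Evaluation-mode batch norm: uses the tracked running statistics."""
    return (x - running_mean) / np.sqrt(running_var + eps)


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(100, 10))
running_mean = np.zeros(10)  # PyTorch initializes running_mean to 0
running_var = np.ones(10)    # and running_var to 1
y = batch_norm_train_step(x, running_mean, running_var)
y_eval = batch_norm_eval(x, running_mean, running_var)
```

After a single step the running estimates have moved only 10% of the way toward the batch statistics, which is why evaluation-mode outputs differ from training-mode outputs early in training.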

Example

>>> import torch
>>> from speechbrain.nnet.normalization import BatchNorm1d
>>> input = torch.randn(100, 10)
>>> norm = BatchNorm1d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10])

forward(x)

Returns the normalized input tensor.

Parameters

x (torch.Tensor (batch, time, [channels])) – Input to normalize. 2d or 3d tensors are expected; 4d tensors can be used when combine_batch_time=True.
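For a 3d (batch, time, channels) input in training mode, batch normalization computes one mean and one variance per channel, pooled over both the batch and time axes. The NumPy sketch below is a reference computation rather than SpeechBrain's code. It assumes that combine_batch_time=True reshapes a 3d input to (batch * time, channels), omits the affine parameters, and checks that merging the two axes first gives the same normalized output as pooling over them directly.

```python
import numpy as np

EPS = 1e-5


def normalize_over(x, axes):
    """Normalize x with one mean/variance per index of the axes NOT in `axes`."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + EPS)


rng = np.random.default_rng(1)
batch, time, channels = 8, 50, 16
x = rng.normal(size=(batch, time, channels))

# Keep (batch, time, channels): pool each channel over batch and time.
y_separate = normalize_over(x, axes=(0, 1))

# ASSUMED combine_batch_time layout: (batch * time, channels), pool over axis 0.
merged = x.reshape(batch * time, channels)
y_merged = normalize_over(merged, axes=(0,)).reshape(batch, time, channels)

print(np.allclose(y_separate, y_merged))  # True
```

Under this assumption, merging the axes changes how the tensor is laid out, not which values are pooled for each channel of a 3d input.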

class speechbrain.nnet.normalization.BatchNorm2d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

Bases: torch.nn.modules.module.Module

Applies 2d batch normalization to the input tensor.

Parameters
  • input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.

  • input_size (int) – The expected size of the input. Alternatively, use input_shape.

  • eps (float) – A small value added to the variance estimate (inside the square root of the denominator) to improve numerical stability.

  • momentum (float) – The factor used to update running_mean and running_var as exponential moving averages.

  • affine (bool) – When set to True, the affine parameters are learned.

  • track_running_stats (bool) – When set to True, this module tracks the running mean and variance; when set to False, it does not track such statistics.

Example

>>> import torch
>>> from speechbrain.nnet.normalization import BatchNorm2d
>>> input = torch.randn(100, 10, 5, 20)
>>> norm = BatchNorm2d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10, 5, 20])

forward(x)

Returns the normalized input tensor.

Parameters

x (torch.Tensor (batch, time, channel1, channel2)) – input to normalize. 4d tensors are expected.
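In the example, a (100, 10, 5, 20) input is normalized without changing its shape. The NumPy sketch below shows training-mode batch normalization for such a tensor under one stated assumption: the last axis (channel2, size 20 here) is treated as the feature axis, so each of its entries gets its own mean and variance pooled over batch, time, and channel1. This axis choice is an illustration, not a description of SpeechBrain's internals; affine parameters are omitted.

```python
import numpy as np


def batch_norm_2d(x, eps=1e-5):
    """Training-mode batch norm for a (batch, time, channel1, channel2) array.

    ASSUMPTION: channel2 (last axis) is the feature axis; each of its entries
    gets one mean/variance pooled over batch, time and channel1.
    Affine parameters are omitted.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(2)
x = rng.normal(loc=-1.0, scale=3.0, size=(100, 10, 5, 20))
y = batch_norm_2d(x)
print(y.shape)  # (100, 10, 5, 20)
```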

class speechbrain.nnet.normalization.LayerNorm(input_size=None, input_shape=None, eps=1e-05, elementwise_affine=True)

Bases: torch.nn.modules.module.Module

Applies layer normalization to the input tensor.

Parameters
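  • input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.

  • input_size (int) – The expected size of the input. Alternatively, use input_shape.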

  • eps (float) – A small value added to the variance estimate (inside the square root of the denominator) to improve numerical stability.

  • elementwise_affine (bool) – If True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases).
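Unlike batch normalization, layer normalization computes its statistics separately for every frame, so the output for one frame does not depend on the rest of the batch. The NumPy sketch below assumes a 3d (batch, time, channels) input normalized over its last axis, as in the 128-feature example, and initializes the affine parameters to ones and zeros as described for elementwise_affine=True. It is a reference computation, not SpeechBrain's implementation.

```python
import numpy as np


def layer_norm(x, weight, bias, eps=1e-5):
    """Layer norm over the last axis, followed by an element-wise affine map.

    ASSUMPTION: x is (..., channels) and each frame is normalized over its
    channels only.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * weight + bias


channels = 128
weight = np.ones(channels)  # initial weights when elementwise_affine=True
bias = np.zeros(channels)   # initial biases when elementwise_affine=True

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=0.5, size=(4, 101, channels))
y = layer_norm(x, weight, bias)

# Statistics are per frame, so normalizing one frame on its own
# gives the same values it gets inside the full batch.
single = layer_norm(x[0, 0], weight, bias)
print(np.allclose(single, y[0, 0]))  # True
```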

Example

>>> import torch
>>> from speechbrain.nnet.normalization import LayerNorm
>>> input = torch.randn(100, 101, 128)
>>> norm = LayerNorm(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 101, 128])

forward(x)

Returns the normalized input tensor.

Parameters

x (torch.Tensor (batch, time, channels)) – input to normalize. 3d or 4d tensors are expected.

class speechbrain.nnet.normalization.InstanceNorm1d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, track_running_stats=True, affine=False)

Bases: torch.nn.modules.module.Module

Applies 1d instance normalization to the input tensor.

Parameters
  • input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.

  • input_size (int) – The expected size of the input. Alternatively, use input_shape.

  • eps (float) – A small value added to the variance estimate (inside the square root of the denominator) to improve numerical stability.

  • momentum (float) – The factor used to update running_mean and running_var as exponential moving averages.

  • track_running_stats (bool) – When set to True, this module tracks the running mean and variance; when set to False, it does not track such statistics.

  • affine (bool) – When set to True, this module has learnable affine parameters, initialized the same way as for batch normalization. Default: False.

Example

>>> import torch
>>> from speechbrain.nnet.normalization import InstanceNorm1d
>>> input = torch.randn(100, 10, 20)
>>> norm = InstanceNorm1d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10, 20])

forward(x)

Returns the normalized input tensor.

Parameters

x (torch.Tensor (batch, time, channels)) – input to normalize. 3d tensors are expected.
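Instance normalization computes one mean and variance per (example, channel) pair, over the time axis only, so a constant offset in one example is removed independently of the others. The NumPy sketch below is a reference for training-mode behavior on a (batch, time, channels) input with affine=False, as in the default. It assumes statistics are taken along the time axis, and it does not model track_running_stats: when that flag is True, PyTorch's instance-norm layers use the running estimates at evaluation time instead of per-instance statistics.

```python
import numpy as np


def instance_norm_1d(x, eps=1e-5):
    """Training-mode instance norm for a (batch, time, channels) array.

    ASSUMPTION: one mean/variance per (example, channel) pair, taken over
    the time axis (axis 1). No affine parameters (affine=False).
    """
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(4)
# Give each example its own constant offset; instance norm removes it.
offsets = np.arange(100, dtype=float).reshape(100, 1, 1)
x = rng.normal(size=(100, 10, 20)) + offsets
y = instance_norm_1d(x)
print(y.shape)  # (100, 10, 20)
```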

class speechbrain.nnet.normalization.InstanceNorm2d(input_shape=None, input_size=None, eps=1e-05, momentum=0.1, track_running_stats=True, affine=False)

Bases: torch.nn.modules.module.Module

Applies 2d instance normalization to the input tensor.

Parameters
  • input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.

  • input_size (int) – The expected size of the input. Alternatively, use input_shape.

  • eps (float) – A small value added to the variance estimate (inside the square root of the denominator) to improve numerical stability.

  • momentum (float) – The factor used to update running_mean and running_var as exponential moving averages.

  • track_running_stats (bool) – When set to True, this module tracks the running mean and variance; when set to False, it does not track such statistics.

  • affine (bool) – When set to True, this module has learnable affine parameters, initialized the same way as for batch normalization. Default: False.

Example

>>> import torch
>>> from speechbrain.nnet.normalization import InstanceNorm2d
>>> input = torch.randn(100, 10, 20, 2)
>>> norm = InstanceNorm2d(input_shape=input.shape)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 10, 20, 2])

forward(x)

Returns the normalized input tensor.

Parameters

x (torch.Tensor (batch, time, channel1, channel2)) – input to normalize. 4d tensors are expected.
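For the 4d case, the NumPy sketch below assumes that channel2 (the last axis) plays the role of the channel axis, so each (example, channel2) pair gets one mean and variance pooled over time and channel1. This axis choice is an assumption made for illustration; the sketch shows training-mode behavior with affine=False and is not SpeechBrain's code.

```python
import numpy as np


def instance_norm_2d(x, eps=1e-5):
    """Training-mode instance norm for a (batch, time, channel1, channel2) array.

    ASSUMPTION (for illustration): channel2 is the channel axis, so each
    (example, channel2) pair is normalized over time and channel1.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=1.5, size=(100, 10, 20, 2))
y = instance_norm_2d(x)
# Every (example, channel2) slice now has zero mean.
print(np.allclose(y.mean(axis=(1, 2)), 0.0))  # True
```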

class speechbrain.nnet.normalization.GroupNorm(input_shape=None, input_size=None, num_groups=None, eps=1e-05, affine=True)

Bases: torch.nn.modules.module.Module

Applies group normalization to the input tensor.

Parameters
  • input_shape (tuple) – The expected shape of the input. Alternatively, use input_size.

  • input_size (int) – The expected size of the input. Alternatively, use input_shape.

  • num_groups (int) – Number of groups to separate the channels into.

  • eps (float) – A small value added to the variance estimate (inside the square root of the denominator) to improve numerical stability.

  • affine (bool) – When set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases).
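Group normalization splits the channels into num_groups contiguous groups and computes one mean and variance per (example, group), pooled over the group's channels and all time steps. The two extremes make the behavior concrete: num_groups equal to the number of channels (as in the example) normalizes each channel separately, like instance normalization, while num_groups=1 pools every channel, like layer normalization over channels and time. The NumPy sketch below assumes a (batch, time, channels) input with channels last and omits the affine parameters; it is a reference computation, not SpeechBrain's code.

```python
import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """Group norm for a (batch, time, channels) array, affine omitted.

    ASSUMPTION: channels are the last axis. They are split into `num_groups`
    contiguous groups; each (example, group) pair gets one mean/variance
    pooled over the group's channels and all time steps.
    """
    batch, time, channels = x.shape
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    g = x.reshape(batch, time, num_groups, channels // num_groups)
    mean = g.mean(axis=(1, 3), keepdims=True)
    var = g.var(axis=(1, 3), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(batch, time, channels)


rng = np.random.default_rng(6)
x = rng.normal(loc=1.0, size=(4, 101, 128))

# One channel per group: each channel is normalized over time (instance-norm-like).
y_per_channel = group_norm(x, num_groups=128)
print(np.allclose(y_per_channel.mean(axis=1), 0.0))  # True

# A single group: one statistic per example over time and all channels (layer-norm-like).
y_one_group = group_norm(x, num_groups=1)
print(np.allclose(y_one_group.mean(axis=(1, 2)), 0.0))  # True
```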

Example

>>> import torch
>>> from speechbrain.nnet.normalization import GroupNorm
>>> input = torch.randn(100, 101, 128)
>>> norm = GroupNorm(input_size=128, num_groups=128)
>>> output = norm(input)
>>> output.shape
torch.Size([100, 101, 128])

forward(x)

Returns the normalized input tensor.

Parameters

x (torch.Tensor (batch, time, channels)) – input to normalize. 3d or 4d tensors are expected.
