speechbrain.lobes.models.conv_tasnet module

Implementation of a popular speech separation model.

Summary

Classes:

ChannelwiseLayerNorm

Channel-wise Layer Normalization (cLN).

Chomp1d

This class cuts out a portion of the signal from the end.

Decoder

This class implements the decoder for the ConvTasNet.

DepthwiseSeparableConv

Building block for the Temporal Blocks of MaskNet in ConvTasNet.

Encoder

This class learns the adaptive frontend for the ConvTasNet model.

GlobalLayerNorm

Global Layer Normalization (gLN).

MaskNet

The mask estimation network of ConvTasNet.

TemporalBlock

The conv1d compound layers used in MaskNet.

TemporalBlocksSequential

A wrapper for the temporal-block layer to replicate it.

Functions:

choose_norm

This function returns the chosen normalization type.

Reference

class speechbrain.lobes.models.conv_tasnet.Encoder(L, N)

Bases: torch.nn.modules.module.Module

This class learns the adaptive frontend for the ConvTasNet model.

Parameters
  • L (int) – The filter kernel size. Needs to be an odd number.

  • N (int) – Number of dimensions at the output of the adaptive front end.

Example

>>> inp = torch.rand(10, 100)
>>> encoder = Encoder(11, 20)
>>> h = encoder(inp)
>>> h.shape
torch.Size([10, 20, 20])
forward(mixture)
Parameters

mixture (Tensor) – Tensor shape is [M, T], where M is the batch size and T is the number of samples.

Returns

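mixture_w – Tensor shape is [M, K, N], where K is the number of frames. Without padding, a stride of L/2 gives K = (T-L)/(L/2)+1 = 2T/L-1; because the convolution pads its input, K is slightly larger in practice (for example, K = 20 for T = 100 and L = 11).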

Return type

Tensor

training: bool
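As a quick check of the frame count, the standard output-length formula for a strided 1-D convolution, floor((T + 2·pad - L) / stride) + 1, can be evaluated directly. This sketch assumes a stride of L // 2 and, for the padded case, L // 2 zeros on each side of the input (an assumption about the padding, chosen because it reproduces the example); `num_frames` is a hypothetical helper, not part of SpeechBrain.

```python
def num_frames(T, L, padding=0):
    """Hypothetical helper: frames produced by a 1-D convolution with
    kernel size L, stride L // 2, and `padding` zeros on each side."""
    stride = L // 2
    return (T + 2 * padding - L) // stride + 1


T, L = 100, 11
# No padding: floor((100 - 11) / 5) + 1 = 18 frames.
print(num_frames(T, L))
# Assumed "same"-style padding of L // 2 = 5 samples per side: 20 frames,
# matching h.shape == torch.Size([10, 20, 20]) in the Encoder example.
print(num_frames(T, L, padding=L // 2))
```

Note that for odd L the integer stride L // 2 is slightly smaller than L/2, which also pushes K above the 2T/L-1 estimate.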
class speechbrain.lobes.models.conv_tasnet.Decoder(L, N)

Bases: torch.nn.modules.module.Module

This class implements the decoder for the ConvTasNet.

The separated source embeddings are fed to the decoder to reconstruct the estimated sources in the time domain.

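Parameters
  • L (int) – Length of the basis signals used for reconstruction (the frame length, matching the Encoder's L).

  • N (int) – Number of channels of the encoded mixture (matching the Encoder's N).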

Example

>>> L, C, N = 8, 2, 8
>>> mixture_w = torch.randn(10, 100, N)
>>> est_mask = torch.randn(10, 100, C, N)
>>> decoder = Decoder(L, N)
>>> mixture_hat = decoder(mixture_w, est_mask)
>>> mixture_hat.shape
torch.Size([10, 404, 2])
forward(mixture_w, est_mask)
Parameters
  • mixture_w (Tensor) – Tensor shape is [M, K, N].

  • est_mask (Tensor) – Tensor shape is [M, K, C, N].

Returns

est_source – Tensor shape is [M, T, C].

Return type

Tensor

training: bool
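The output length in the example (404 samples from K = 100 frames with L = 8) is consistent with overlap-add reconstruction at a hop of L // 2, i.e. T = (K - 1) · (L/2) + L. The pure-Python sketch below assumes that scheme (50% overlap, mirroring the Encoder's stride); `overlap_add_frames` is a hypothetical list-based helper, not the library's implementation.

```python
def overlap_add_frames(frames, hop):
    """Sum equal-length frames into one signal, starting a new frame
    every `hop` samples (hypothetical list-based helper)."""
    frame_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out


L, K = 8, 100
frames = [[1.0] * L for _ in range(K)]  # K reconstructed frames of length L
signal = overlap_add_frames(frames, hop=L // 2)
print(len(signal))  # (100 - 1) * 4 + 8 = 404
```

With 50% overlap every interior sample receives contributions from exactly two frames, while the first and last L // 2 samples receive only one.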
class speechbrain.lobes.models.conv_tasnet.TemporalBlocksSequential(input_shape, H, P, R, X, norm_type, causal)

Bases: speechbrain.nnet.containers.Sequential

A wrapper for the temporal-block layer to replicate it: R repeats of X temporal blocks each.

Parameters
  • input_shape (tuple) – Expected shape of the input.

  • H (int) – The number of intermediate channels.

  • P (int) – The kernel size in the convolutions.

  • R (int) – The number of times to replicate the multilayer Temporal Blocks.

  • X (int) – The number of layers of Temporal Blocks with different dilations.

  • norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].

  • causal (bool) – To use causal or non-causal convolutions, in [True, False].

Example

>>> x = torch.randn(14, 100, 10)
>>> H, P, R, X = 10, 5, 2, 3
>>> TemporalBlocks = TemporalBlocksSequential(
...     x.shape, H, P, R, X, 'gLN', False
... )
>>> y = TemporalBlocks(x)
>>> y.shape
torch.Size([14, 100, 10])
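Stacking dilated convolutions lets the mask network see a long temporal context cheaply. As a rough sketch, assuming (as in the Conv-TasNet paper) that the x-th block of each repeat uses dilation 2 ** x and that only the convolution with kernel size P widens the context, the receptive field is 1 + R · (P - 1) · (2^X - 1) frames. The helper `receptive_field` is hypothetical.

```python
def receptive_field(P, X, R):
    """Receptive field, in frames, of R repeats of X dilated convolutions with
    kernel size P, where block x in each repeat uses dilation 2 ** x (assumed)."""
    rf = 1
    for _ in range(R):
        for x in range(X):
            # A kernel of size P with dilation d widens the view by (P - 1) * d.
            rf += (P - 1) * 2 ** x
    return rf


# Settings from the example above: P = 5, R = 2, X = 3.
print(receptive_field(P=5, X=3, R=2))  # 1 + 2 * 4 * (1 + 2 + 4) = 57
```

Because the dilation doubles inside each repeat, the context grows exponentially in X but only linearly in R.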
class speechbrain.lobes.models.conv_tasnet.MaskNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')

Bases: torch.nn.modules.module.Module

Estimates one separation mask per speaker from the encoded mixture.

Parameters
  • N (int) – Number of filters in autoencoder.

  • B (int) – Number of channels in bottleneck 1 × 1-conv block.

  • H (int) – Number of channels in convolutional blocks.

  • P (int) – Kernel size in convolutional blocks.

  • X (int) – Number of convolutional blocks in each repeat.

  • R (int) – Number of repeats.

  • C (int) – Number of speakers.

  • norm_type (str) – The type of normalization, one of ‘BN’, ‘gLN’, ‘cLN’.

  • causal (bool) – To use causal or non-causal convolutions, in [True, False].

  • mask_nonlinear (str) – The non-linear function used to generate the masks, in [‘softmax’, ‘relu’].

Example

>>> N, B, H, P, X, R, C = 11, 12, 2, 5, 3, 1, 2
>>> masknet = MaskNet(N, B, H, P, X, R, C)
>>> mixture_w = torch.randn(10, 11, 100)
>>> est_mask = masknet(mixture_w)
>>> est_mask.shape
torch.Size([2, 10, 11, 100])

forward(mixture_w)

Keeps the same API as TasNet.

Parameters

mixture_w (Tensor) – Tensor shape is [M, N, K], where M is the batch size.

Returns

est_mask – Tensor shape is [C, M, N, K].

Return type

Tensor

training: bool
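The shape of est_mask in the example, [C, M, N, K] = [2, 10, 11, 100], can be understood as follows, under the assumption (not stated above) that the final 1 × 1 convolution emits C·N scores per frame, which are split into C groups, reordered, and passed through the ReLU selected by mask_nonlinear='relu'. The list-based helpers below are hypothetical and only illustrate the index bookkeeping of a reshape to [M, K, C, N] followed by a permutation to [C, M, N, K].

```python
def relu(v):
    return v if v > 0.0 else 0.0


def scores_to_masks(score, C):
    """Split per-frame scores of shape [M][K][C*N] into masks of shape
    [C][M][N][K], mirroring reshape(M, K, C, N) then permute(2, 0, 3, 1)."""
    M, K = len(score), len(score[0])
    N = len(score[0][0]) // C
    return [
        [
            [[relu(score[m][k][c * N + n]) for k in range(K)] for n in range(N)]
            for m in range(M)
        ]
        for c in range(C)
    ]


M, K, N, C = 10, 100, 11, 2
# Deterministic dummy scores standing in for the output of the last 1x1 conv.
score = [
    [[float((m + k + j) % 3 - 1) for j in range(C * N)] for k in range(K)]
    for m in range(M)
]
masks = scores_to_masks(score, C)
print(len(masks), len(masks[0]), len(masks[0][0]), len(masks[0][0][0]))  # 2 10 11 100
```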
class speechbrain.lobes.models.conv_tasnet.TemporalBlock(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)

Bases: torch.nn.modules.module.Module

The conv1d compound layers used in MaskNet.

Parameters
  • input_shape (tuple) – The expected shape of the input.

  • out_channels (int) – The number of intermediate channels.

  • kernel_size (int) – The kernel size in the convolutions.

  • stride (int) – Convolution stride in convolutional layers.

  • padding (str) – The type of padding in the convolutional layers: ‘same’, ‘valid’, or ‘causal’. If ‘valid’, no padding is performed.

  • dilation (int) – Amount of dilation in convolutional layers.

  • norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].

  • causal (bool) – To use causal or non-causal convolutions, in [True, False].

Example

>>> x = torch.randn(14, 100, 10)
>>> block = TemporalBlock(x.shape, 10, 11, 1, 'same', 1)
>>> y = block(x)
>>> y.shape
torch.Size([14, 100, 10])

forward(x)
Parameters

x (Tensor) – Tensor shape is [M, K, B].

Returns

x – Tensor shape is [M, K, B].

Return type

Tensor

training: bool
class speechbrain.lobes.models.conv_tasnet.DepthwiseSeparableConv(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)

Bases: speechbrain.nnet.containers.Sequential

Building block for the Temporal Blocks of MaskNet in ConvTasNet.

Parameters
  • input_shape (tuple) – Expected shape of the input.

  • out_channels (int) – Number of output channels.

  • kernel_size (int) – The kernel size in the convolutions.

  • stride (int) – Convolution stride in convolutional layers.

  • padding (str) – The type of padding in the convolutional layers: ‘same’, ‘valid’, or ‘causal’. If ‘valid’, no padding is performed.

  • dilation (int) – Amount of dilation in convolutional layers.

  • norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].

  • causal (bool) – To use causal or non-causal convolutions, in [True, False].

Example

>>> x = torch.randn(14, 100, 10)
>>> DSconv = DepthwiseSeparableConv(x.shape, 10, 11, 1, 'same', 1)
>>> y = DSconv(x)
>>> y.shape
torch.Size([14, 100, 10])
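The usual motivation for a depthwise separable factorization is parameter efficiency. Counting only convolution weights (assumptions: biases ignored, normalization and activation parameters ignored), a depthwise convolution with one kernel per input channel followed by a 1 × 1 pointwise convolution needs in_ch · P + in_ch · out_ch weights instead of in_ch · out_ch · P for a standard convolution. The helpers below are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel_size):
    """Weights of a standard 1-D convolution (bias ignored)."""
    return in_ch * out_ch * kernel_size


def depthwise_separable_params(in_ch, out_ch, kernel_size):
    """Weights of a depthwise conv (one kernel per input channel) followed by a
    pointwise 1x1 conv mapping in_ch -> out_ch (biases ignored)."""
    depthwise = in_ch * kernel_size
    pointwise = in_ch * out_ch
    return depthwise + pointwise


# Channel and kernel sizes from the example above: 10 -> 10 channels, kernel 11.
print(conv_params(10, 10, 11))                 # 1100
print(depthwise_separable_params(10, 10, 11))  # 110 + 100 = 210
```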
class speechbrain.lobes.models.conv_tasnet.Chomp1d(chomp_size)

Bases: torch.nn.modules.module.Module

This class cuts out a portion of the signal from the end.

It is implemented as a class so that it can be used inside a sequential wrapper.

Parameters

chomp_size (int) – The size of the portion to discard (in samples).

Example

>>> x = torch.randn(10, 110, 5)
>>> chomp = Chomp1d(10)
>>> x_chomped = chomp(x)
>>> x_chomped.shape
torch.Size([10, 100, 5])
forward(x)
Parameters

x (Tensor) – Tensor shape is [M, Kpad, H].

Returns

x – Tensor shape is [M, K, H], where K = Kpad - chomp_size.

Return type

Tensor

training: bool
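A common reason for chomping is to make a padded convolution causal: pad the input by (P - 1) · d samples on both sides, which yields (P - 1) · d extra outputs, and discard those trailing outputs so that output t depends only on inputs up to t. The list-based sketch below illustrates this technique under that assumption; the helper names are hypothetical, and it is not claimed to reproduce SpeechBrain's exact padding scheme.

```python
def dilated_conv1d_valid(x, w, dilation):
    """Unpadded dilated 1-D cross-correlation (the 'convolution' of DL frameworks)."""
    span = dilation * (len(w) - 1)
    return [
        sum(w[j] * x[t + j * dilation] for j in range(len(w)))
        for t in range(len(x) - span)
    ]


def chomp(seq, chomp_size):
    """Drop the last `chomp_size` items, as Chomp1d does along the time axis."""
    return list(seq[:-chomp_size]) if chomp_size > 0 else list(seq)


def causal_conv1d(x, w, dilation):
    pad = dilation * (len(w) - 1)
    padded = [0.0] * pad + list(x) + [0.0] * pad  # symmetric zero padding
    y = dilated_conv1d_valid(padded, w, dilation)  # len(x) + pad outputs
    return chomp(y, pad)  # keep the len(x) causal outputs


x = [0.0] * 10
x[5] = 1.0  # unit impulse at t = 5
y = causal_conv1d(x, w=[1.0, 2.0, 3.0], dilation=2)
print(y)  # [0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 2.0, 0.0, 1.0]
```

The response to the impulse starts at t = 5 and never earlier, which is the causality property that the chomp enforces.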
speechbrain.lobes.models.conv_tasnet.choose_norm(norm_type, channel_size)

This function returns the chosen normalization type.

Parameters
  • norm_type (str) – One of [‘gLN’, ‘cLN’, ‘batchnorm’].

  • channel_size (int) – Number of channels.

Example

>>> choose_norm('gLN', 10)
GlobalLayerNorm()
class speechbrain.lobes.models.conv_tasnet.ChannelwiseLayerNorm(channel_size)

Bases: torch.nn.modules.module.Module

Channel-wise Layer Normalization (cLN).

Parameters

channel_size (int) – Number of channels in the normalization dimension (the third dimension).

Example

>>> x = torch.randn(2, 3, 3)
>>> norm_func = ChannelwiseLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])
reset_parameters()
forward(y)
Parameters

y (Tensor) – Tensor shape [M, K, N]. M is batch size, N is channel size, and K is length.

Returns

cLN_y – Tensor shape [M, K, N]

Return type

Tensor

training: bool
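Channel-wise layer normalization normalizes each frame independently across its N channels and then applies a per-channel gain gamma and bias beta: cLN(y)[m, k, :] = gamma * (y[m, k, :] - mean[m, k]) / sqrt(var[m, k] + eps) + beta, where mean and var are the mean and (biased) variance over the N channels. The pure-Python sketch below assumes eps = 1e-8 and gamma = 1, beta = 0 (a typical initialization); it illustrates the formula and is not the module's implementation.

```python
import math

EPS = 1e-8  # assumed stabilizing constant


def channelwise_layer_norm(y, gamma, beta, eps=EPS):
    """cLN sketch on nested lists y[m][k][n] of shape [M, K, N]: each frame
    y[m][k] is normalized over its N channels, then scaled and shifted."""
    out = []
    for utterance in y:
        frames = []
        for frame in utterance:
            n = len(frame)
            mean = sum(frame) / n
            var = sum((v - mean) ** 2 for v in frame) / n  # biased variance
            std = math.sqrt(var + eps)
            frames.append(
                [g * (v - mean) / std + b for v, g, b in zip(frame, gamma, beta)]
            )
        out.append(frames)
    return out


# One utterance (M = 1) with K = 2 frames of N = 3 channels.
y = [[[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]]
out = channelwise_layer_norm(y, gamma=[1.0] * 3, beta=[0.0] * 3)
print(out[0][0])  # roughly [-1.2247, 0.0, 1.2247]
print(out[0][1])  # a constant frame maps to [0.0, 0.0, 0.0]
```

Because the statistics are per frame, no information from other time steps is used, which makes this normalization compatible with causal processing.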
class speechbrain.lobes.models.conv_tasnet.GlobalLayerNorm(channel_size)

Bases: torch.nn.modules.module.Module

Global Layer Normalization (gLN).

Parameters

channel_size (int) – Number of channels in the third dimension.

Example

>>> x = torch.randn(2, 3, 3)
>>> norm_func = GlobalLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])
reset_parameters()
forward(y)
Parameters

y (Tensor) – Tensor shape [M, K, N]. M is batch size, N is channel size, and K is length.

Returns

gLN_y – Tensor shape [M, K, N]

Return type

Tensor

training: bool
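Global layer normalization takes its mean and variance over all K × N entries of each utterance rather than per frame, then applies a per-channel gain gamma and bias beta, so every normalized value depends on the whole sequence. The pure-Python sketch below assumes eps = 1e-8 and gamma = 1, beta = 0 (a typical initialization); it illustrates the formula and is not the module's implementation.

```python
import math

EPS = 1e-8  # assumed stabilizing constant


def global_layer_norm(y, gamma, beta, eps=EPS):
    """gLN sketch on nested lists y[m][k][n] of shape [M, K, N]: statistics are
    computed over all K * N values of each utterance, then applied per channel."""
    out = []
    for utterance in y:
        values = [v for frame in utterance for v in frame]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        std = math.sqrt(var + eps)
        out.append(
            [
                [g * (v - mean) / std + b for v, g, b in zip(frame, gamma, beta)]
                for frame in utterance
            ]
        )
    return out


# One utterance (M = 1) with K = 2 frames of N = 3 channels.
y = [[[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]]
out = global_layer_norm(y, gamma=[1.0] * 3, beta=[0.0] * 3)
print(out[0][0])  # roughly [-1.7321, -0.8660, 0.0]
print(out[0][1])  # roughly [0.8660, 0.8660, 0.8660]
```

Unlike per-frame normalization, the constant second frame is not mapped to zeros here, because it is measured against the statistics of the whole utterance.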