speechbrain.lobes.models.conv_tasnet module
Implementation of Conv-TasNet, a popular speech separation model.
Summary
Classes:
- ChannelwiseLayerNorm: Channel-wise Layer Normalization (cLN).
- Chomp1d: This class cuts out a portion of the signal from the end.
- Decoder: This class implements the decoder for the ConvTasnet.
- DepthwiseSeparableConv: Building block for the Temporal Blocks of Masknet in ConvTasNet.
- Encoder: This class learns the adaptive frontend for the ConvTasnet model.
- GlobalLayerNorm: Global Layer Normalization (gLN).
- MaskNet: Estimates the separation masks from the encoded mixture.
- TemporalBlock: The conv1d compound layers used in Masknet.
- TemporalBlocksSequential: A wrapper for the temporal-block layer that replicates it.
Functions:
- choose_norm: This function returns the chosen normalization type.
Reference
- class speechbrain.lobes.models.conv_tasnet.Encoder(L, N)[source]
Bases:
Module
This class learns the adaptive frontend for the ConvTasnet model.
- Parameters:
L (int) – The filter length of the encoder convolutions (in samples).
N (int) – The number of output channels (encoder filters).
Example
>>> inp = torch.rand(10, 100)
>>> encoder = Encoder(11, 20)
>>> h = encoder(inp)
>>> h.shape
torch.Size([10, 20, 20])
- forward(mixture)[source]
- Parameters:
mixture (torch.Tensor) – Tensor shape is [M, T]. M is batch size. T is #samples
- Returns:
mixture_w – Tensor shape is [M, K, N], where K = (T-L)/(L/2)+1 = 2T/L-1
- Return type:
torch.Tensor
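The frame count K above follows ordinary 1-D convolution arithmetic, assuming a filter of length L with a hop of L/2 (50% overlap), as in the standard ConvTasNet frontend. A minimal sketch of that arithmetic:

```python
# Frame-count sketch for the encoder output: K = (T - L) / (L / 2) + 1.
# Assumes L is even and the hop size is L // 2 (50% overlap).

def num_frames(T: int, L: int) -> int:
    """Number of encoder frames for a T-sample input and filter length L."""
    hop = L // 2
    return (T - L) // hop + 1

# When T is a multiple of L / 2, this simplifies to 2T / L - 1:
T, L = 160, 16
assert num_frames(T, L) == 2 * T // L - 1  # both give 19
```

Note that the doc's Encoder example may not match this formula exactly, since the actual layer can apply padding.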
- class speechbrain.lobes.models.conv_tasnet.Decoder(L, N)[source]
Bases:
Module
This class implements the decoder for the ConvTasnet.
The separated source embeddings are fed to the decoder to reconstruct the estimated sources in the time domain.
Example
>>> L, C, N = 8, 2, 8
>>> mixture_w = torch.randn(10, 100, N)
>>> est_mask = torch.randn(10, 100, C, N)
>>> Decoder = Decoder(L, N)
>>> mixture_hat = Decoder(mixture_w, est_mask)
>>> mixture_hat.shape
torch.Size([10, 404, 2])
- forward(mixture_w, est_mask)[source]
- Parameters:
mixture_w (torch.Tensor) – Tensor shape is [M, K, N].
est_mask (torch.Tensor) – Tensor shape is [M, K, C, N].
- Returns:
est_source – Tensor shape is [M, T, C].
- Return type:
torch.Tensor
- class speechbrain.lobes.models.conv_tasnet.TemporalBlocksSequential(input_shape, H, P, R, X, norm_type, causal)[source]
Bases:
Sequential
A wrapper for the temporal-block layer that replicates it.
- Parameters:
input_shape (tuple) – Expected shape of the input.
H (int) – The number of intermediate channels.
P (int) – The kernel size in the convolutions.
R (int) – The number of times to replicate the multilayer Temporal Blocks.
X (int) – The number of layers of Temporal Blocks with different dilations.
norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
Example
>>> x = torch.randn(14, 100, 10)
>>> H, P, R, X = 10, 5, 2, 3
>>> TemporalBlocks = TemporalBlocksSequential(
...     x.shape, H, P, R, X, 'gLN', False
... )
>>> y = TemporalBlocks(x)
>>> y.shape
torch.Size([14, 100, 10])
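The R and X parameters control how the temporal blocks are stacked. Assuming the standard ConvTasNet schedule, where each of the R repeats contains X blocks whose dilation doubles per layer, the dilation pattern and receptive field can be sketched as follows (the exact layer layout inside SpeechBrain may differ):

```python
# Sketch of the dilation schedule implied by R and X, assuming the standard
# ConvTasNet pattern: R repeats of X blocks with dilations 1, 2, 4, ...
# The receptive-field formula assumes one dilated conv of kernel size P
# per block.

def dilation_schedule(R: int, X: int) -> list:
    return [2 ** x for x in range(X)] * R

def receptive_field(P: int, R: int, X: int) -> int:
    # Each block with dilation d extends the receptive field by (P - 1) * d.
    return 1 + sum((P - 1) * d for d in dilation_schedule(R, X))

print(dilation_schedule(2, 3))   # [1, 2, 4, 1, 2, 4]
print(receptive_field(3, 2, 3))  # 1 + 2 * (1 + 2 + 4 + 1 + 2 + 4) = 29
```

Doubling the dilation per layer lets the receptive field grow exponentially with X while keeping the parameter count linear in R * X.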
- class speechbrain.lobes.models.conv_tasnet.MaskNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')[source]
Bases:
Module
- Parameters:
N (int) – Number of filters in autoencoder.
B (int) – Number of channels in bottleneck 1 × 1-conv block.
H (int) – Number of channels in convolutional blocks.
P (int) – Kernel size in convolutional blocks.
X (int) – Number of convolutional blocks in each repeat.
R (int) – Number of repeats.
C (int) – Number of speakers.
norm_type (str) – One of BN, gLN, cLN.
causal (bool) – Causal or non-causal.
mask_nonlinear (str) – Which non-linearity to use when generating the mask, in [‘softmax’, ‘relu’].
Example
>>> N, B, H, P, X, R, C = 11, 12, 2, 5, 3, 1, 2
>>> MaskNet = MaskNet(N, B, H, P, X, R, C)
>>> mixture_w = torch.randn(10, 11, 100)
>>> est_mask = MaskNet(mixture_w)
>>> est_mask.shape
torch.Size([2, 10, 11, 100])
- forward(mixture_w)[source]
The forward API is kept consistent with TasNet.
- Parameters:
mixture_w (torch.Tensor) – Tensor shape is [M, K, N], M is batch size.
- Returns:
est_mask – Tensor shape is [M, K, C, N].
- Return type:
torch.Tensor
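The estimated masks are typically applied to the encoder output by an elementwise product per speaker. A hedged numpy sketch, using the [M, K, C, N] mask layout from the forward docstring (random data stands in for real network outputs):

```python
import numpy as np

# Sketch of mask application: one elementwise product per speaker.
# mixture_w: [M, K, N], est_mask: [M, K, C, N]. The softmax variant
# normalizes across the C speaker masks so they sum to one at every
# time-frequency bin; relu masks are unconstrained and non-negative.

def apply_masks(mixture_w, est_mask):
    # Broadcast the mixture over the speaker axis: result is [M, K, C, N].
    return mixture_w[:, :, None, :] * est_mask

M, K, C, N = 2, 5, 2, 4
mixture_w = np.random.randn(M, K, N)
logits = np.random.randn(M, K, C, N)

softmax_mask = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
relu_mask = np.maximum(logits, 0.0)

sources = apply_masks(mixture_w, softmax_mask)
assert sources.shape == (M, K, C, N)
# Softmax masks form a partition of unity across speakers:
assert np.allclose(softmax_mask.sum(axis=2), 1.0)
```

Note the doctest above reports an output of shape [C, M, N, K]; the sketch follows the docstring's [M, K, C, N] convention instead, so only the axis ordering differs.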
- class speechbrain.lobes.models.conv_tasnet.TemporalBlock(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]
Bases:
Module
The conv1d compound layers used in Masknet.
- Parameters:
input_shape (tuple) – The expected shape of the input.
out_channels (int) – The number of intermediate channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, (same, valid, causal). If “valid”, no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
Example
>>> x = torch.randn(14, 100, 10)
>>> TemporalBlock = TemporalBlock(x.shape, 10, 11, 1, 'same', 1)
>>> y = TemporalBlock(x)
>>> y.shape
torch.Size([14, 100, 10])
- forward(x)[source]
- Parameters:
x (torch.Tensor) – Tensor shape is [M, K, B].
- Returns:
x – Tensor shape is [M, K, B].
- Return type:
torch.Tensor
- class speechbrain.lobes.models.conv_tasnet.DepthwiseSeparableConv(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]
Bases:
Sequential
Building block for the Temporal Blocks of Masknet in ConvTasNet.
- Parameters:
input_shape (tuple) – Expected shape of the input.
out_channels (int) – Number of output channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, (same, valid, causal). If “valid”, no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
Example
>>> x = torch.randn(14, 100, 10)
>>> DSconv = DepthwiseSeparableConv(x.shape, 10, 11, 1, 'same', 1)
>>> y = DSconv(x)
>>> y.shape
torch.Size([14, 100, 10])
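The depthwise-separable factorization is used because it is much cheaper than a full convolution: a depthwise conv (one kernel per input channel) followed by a 1x1 pointwise conv that mixes channels. A parameter-count sketch (biases omitted; the exact layer layout inside the SpeechBrain class may differ):

```python
# Parameter-count sketch motivating the depthwise-separable factorization.

def full_conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    # A standard conv1d has one kernel per (input, output) channel pair.
    return in_ch * out_ch * kernel

def separable_conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    depthwise = in_ch * kernel   # one length-`kernel` filter per channel
    pointwise = in_ch * out_ch   # 1x1 conv mixing channels
    return depthwise + pointwise

print(full_conv_params(256, 256, 3))       # 196608
print(separable_conv_params(256, 256, 3))  # 66304
```

The savings grow with the kernel size and channel count, which is why this block is repeated many times in the mask network.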
- class speechbrain.lobes.models.conv_tasnet.Chomp1d(chomp_size)[source]
Bases:
Module
This class cuts out a portion of the signal from the end.
It is written as a class to be able to incorporate it inside a sequential wrapper.
- Parameters:
chomp_size (int) – The size of the portion to discard (in samples).
Example
>>> x = torch.randn(10, 110, 5)
>>> chomp = Chomp1d(10)
>>> x_chomped = chomp(x)
>>> x_chomped.shape
torch.Size([10, 100, 5])
- forward(x)[source]
- Parameters:
x (torch.Tensor) – Tensor shape is [M, Kpad, H].
- Returns:
x – Tensor shape is [M, K, H].
- Return type:
torch.Tensor
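The operation itself is just a slice along the time axis; it is wrapped in a class so it can sit inside a sequential container. A minimal numpy sketch, assuming axis 1 is time as in the [M, Kpad, H] layout above:

```python
import numpy as np

# Minimal sketch of the chomp operation: drop the last `chomp_size` steps
# along the time axis, e.g. to undo the extra right-padding introduced by
# a causal convolution.

def chomp1d(x, chomp_size):
    return x[:, :-chomp_size, :]

x = np.random.randn(10, 110, 5)
y = chomp1d(x, 10)
assert y.shape == (10, 100, 5)
# The surviving samples are untouched:
assert np.array_equal(y, x[:, :100, :])
```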
- speechbrain.lobes.models.conv_tasnet.choose_norm(norm_type, channel_size)[source]
This function returns the chosen normalization type.
- Parameters:
norm_type (str) – The type of normalization, one of ‘gLN’, ‘cLN’, ‘BN’.
channel_size (int) – The number of channels to normalize.
- Return type:
Constructed layer of the chosen type
Example
>>> choose_norm('gLN', 10)
GlobalLayerNorm()
- class speechbrain.lobes.models.conv_tasnet.ChannelwiseLayerNorm(channel_size)[source]
Bases:
Module
Channel-wise Layer Normalization (cLN).
- Parameters:
channel_size (int) – Number of channels in the normalization dimension (the third dimension).
Example
>>> x = torch.randn(2, 3, 3)
>>> norm_func = ChannelwiseLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x.shape
torch.Size([2, 3, 3])
- class speechbrain.lobes.models.conv_tasnet.GlobalLayerNorm(channel_size)[source]
Bases:
Module
Global Layer Normalization (gLN).
- Parameters:
channel_size (int) – Number of channels in the third dimension.
Example
>>> x = torch.randn(2, 3, 3)
>>> norm_func = GlobalLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x.shape
torch.Size([2, 3, 3])
- forward(y)[source]
- Parameters:
y (torch.Tensor) – Tensor shape [M, K, N]. M is batch size, N is channel size, and K is length.
- Returns:
gLN_y – Tensor shape [M, K, N].
- Return type:
torch.Tensor
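The two normalization variants differ only in which axes the statistics are pooled over. A sketch in numpy, with the learnable gamma/beta affine parameters omitted for brevity:

```python
import numpy as np

# cLN pools mean/variance over the channel axis at each time step;
# gLN pools over both the time and channel axes of each example.
# Input layout is [M, K, N] (batch, time, channels), as in the docstrings.

def cln(y, eps=1e-8):
    mean = y.mean(axis=2, keepdims=True)
    var = y.var(axis=2, keepdims=True)
    return (y - mean) / np.sqrt(var + eps)

def gln(y, eps=1e-8):
    mean = y.mean(axis=(1, 2), keepdims=True)
    var = y.var(axis=(1, 2), keepdims=True)
    return (y - mean) / np.sqrt(var + eps)

y = np.random.randn(2, 7, 5)
# gLN: each example is normalized to zero mean overall.
assert np.allclose(gln(y).mean(axis=(1, 2)), 0.0, atol=1e-6)
# cLN: every (example, time-step) slice is normalized independently.
assert np.allclose(cln(y).mean(axis=2), 0.0, atol=1e-6)
```

Because cLN never looks ahead in time, it is the variant compatible with causal operation, whereas gLN uses utterance-level statistics.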