speechbrain.lobes.models.conv_tasnet module
Implementation of ConvTasNet, a speech separation model.
Summary
Classes:
ChannelwiseLayerNorm | Channel-wise Layer Normalization (cLN).
Chomp1d | This class cuts out a portion of the signal from the end.
Decoder | This class implements the decoder for the ConvTasNet.
DepthwiseSeparableConv | Building block for the Temporal Blocks of MaskNet in ConvTasNet.
Encoder | This class learns the adaptive frontend for the ConvTasNet model.
GlobalLayerNorm | Global Layer Normalization (gLN).
TemporalBlock | The conv1d compound layers used in MaskNet.
TemporalBlocksSequential | A wrapper that replicates the temporal-block layer.
Functions:
choose_norm | This function returns the chosen normalization type.
Reference
- class speechbrain.lobes.models.conv_tasnet.Encoder(L, N)
Bases: Module
This class learns the adaptive frontend for the ConvTasNet model.
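- Parameters:
L (int) – Length of the encoder filters, in samples.
N (int) – Number of encoder filters, i.e. the size of the last output dimension.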
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Encoder
>>> inp = torch.rand(10, 100)
>>> encoder = Encoder(11, 20)
>>> h = encoder(inp)
>>> h.shape
torch.Size([10, 20, 20])
- forward(mixture)
- Parameters:
mixture (torch.Tensor) – Tensor of shape [M, T], where M is the batch size and T is the number of samples.
- Returns:
mixture_w – Tensor of shape [M, K, N], where K = (T-L)/(L/2)+1 = 2T/L-1.
- Return type:
torch.Tensor
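As a quick check of the frame-count formula above, the following pure-Python sketch evaluates K for a given T and L. The helper name `num_frames` is hypothetical, and the sketch assumes no padding and a hop of L/2 with even L. The class example above reports K = 20 for T = 100 and L = 11, which suggests the layer pads its input, so treat this as an illustration of the stated formula only.

```python
def num_frames(T: int, L: int) -> int:
    """Number of frames K for length-L windows with hop L // 2 and no padding.

    Hypothetical helper illustrating K = (T - L) / (L / 2) + 1 from the text.
    """
    hop = L // 2
    if T < L:
        raise ValueError("signal shorter than one window")
    return (T - L) // hop + 1


# With even L and T a multiple of L / 2, this equals 2T/L - 1.
print(num_frames(32, 8))   # 7
print(num_frames(404, 8))  # 100
```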
- class speechbrain.lobes.models.conv_tasnet.Decoder(L, N)
Bases: Module
This class implements the decoder for the ConvTasNet.
The separated source embeddings are fed to the decoder to reconstruct the estimated sources in the time domain.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Decoder
>>> L, C, N = 8, 2, 8
>>> mixture_w = torch.randn(10, 100, N)
>>> est_mask = torch.randn(10, 100, C, N)
>>> decoder = Decoder(L, N)
>>> mixture_hat = decoder(mixture_w, est_mask)
>>> mixture_hat.shape
torch.Size([10, 404, 2])
- forward(mixture_w, est_mask)
- Parameters:
mixture_w (torch.Tensor) – Tensor of shape [M, K, N].
est_mask (torch.Tensor) – Tensor of shape [M, K, C, N].
- Returns:
est_source – Tensor of shape [M, T, C].
- Return type:
torch.Tensor
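The output length in the class example (T = 404 for K = 100 and L = 8) is consistent with overlap-and-add reconstruction using a hop of L/2, where T = (K - 1) * L/2 + L. The sketch below assumes that scheme; `overlap_and_add` here is a pure-Python illustration written for this page, not the library's own routine.

```python
def overlap_and_add(frames, hop):
    """Sum equally sized frames into one signal, shifting each frame by `hop`."""
    frame_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for k, frame in enumerate(frames):
        start = k * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


L, K = 8, 100
frames = [[1.0] * L for _ in range(K)]
signal = overlap_and_add(frames, hop=L // 2)
print(len(signal))                # 404
print(signal[0], signal[L // 2])  # 1.0 2.0
```

Samples covered by two frames sum both contributions, which is why the value doubles from index L/2 onward in this toy input.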
- class speechbrain.lobes.models.conv_tasnet.TemporalBlocksSequential(input_shape, H, P, R, X, norm_type, causal)
Bases: Sequential
A wrapper that replicates the temporal-block layer.
- Parameters:
input_shape (tuple) – Expected shape of the input.
H (int) – The number of intermediate channels.
P (int) – The kernel size in the convolutions.
R (int) – The number of times to replicate the multilayer Temporal Blocks.
X (int) – The number of layers of Temporal Blocks with different dilations.
norm_type (str) – The type of normalization, in ['gLN', 'cLN'].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
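The roles of R, X, and P above can be illustrated by listing the dilation of every block and the resulting receptive field. This is a sketch under an assumption the text does not state: that block x in each repeat uses dilation 2**x with stride 1, as in the usual ConvTasNet design. The function names are hypothetical.

```python
def dilation_schedule(R: int, X: int):
    """Dilations for R repeats of X blocks, assuming dilation 2**x in block x."""
    return [2 ** x for _ in range(R) for x in range(X)]


def receptive_field(P: int, R: int, X: int) -> int:
    """Frames seen by one output frame for stride-1 convolutions of kernel size P."""
    return 1 + sum((P - 1) * d for d in dilation_schedule(R, X))


print(dilation_schedule(R=2, X=3))     # [1, 2, 4, 1, 2, 4]
print(receptive_field(P=5, R=2, X=3))  # 57
```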
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlocksSequential
>>> x = torch.randn(14, 100, 10)
>>> H, P, R, X = 10, 5, 2, 3
>>> temporal_blocks = TemporalBlocksSequential(
...     x.shape, H, P, R, X, "gLN", False
... )
>>> y = temporal_blocks(x)
>>> y.shape
torch.Size([14, 100, 10])
- class speechbrain.lobes.models.conv_tasnet.MaskNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')
Bases: Module
- Parameters:
N (int) – Number of filters in autoencoder.
B (int) – Number of channels in bottleneck 1 × 1-conv block.
H (int) – Number of channels in convolutional blocks.
P (int) – Kernel size in convolutional blocks.
X (int) – Number of convolutional blocks in each repeat.
R (int) – Number of repeats.
C (int) – Number of speakers.
norm_type (str) – One of BN, gLN, cLN.
causal (bool) – Causal or non-causal.
mask_nonlinear (str) – The non-linear function used to generate the mask, in ['softmax', 'relu'].
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import MaskNet
>>> N, B, H, P, X, R, C = 11, 12, 2, 5, 3, 1, 2
>>> mask_net = MaskNet(N, B, H, P, X, R, C)
>>> mixture_w = torch.randn(10, 11, 100)
>>> est_mask = mask_net(mixture_w)
>>> est_mask.shape
torch.Size([2, 10, 11, 100])
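- forward(mixture_w)
Keeps the same API as TasNet.
- Parameters:
mixture_w (torch.Tensor) – Tensor shape is given as [M, K, N], where M is the batch size.
- Returns:
est_mask – Tensor shape is given as [M, K, C, N]; note that the class example above passes an [M, N, K] input and obtains a [C, M, N, K] output.
- Return type:
torch.Tensor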
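The two mask_nonlinear options can be compared on the scores of C sources at a single position. The sketch below is pure Python and assumes, without confirmation from the text, that 'softmax' normalizes across the source axis while 'relu' is applied element-wise; the function names are hypothetical.

```python
import math


def relu_masks(scores):
    """Clamp each source's score to be non-negative."""
    return [max(0.0, s) for s in scores]


def softmax_masks(scores):
    """Normalize scores over the source axis so the masks sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Scores of C = 2 sources for a single (frame, filter) position.
scores = [1.5, -0.5]
print(relu_masks(scores))                    # [1.5, 0.0]
print(round(sum(softmax_masks(scores)), 6))  # 1.0
```

With softmax the masks compete for each position, whereas ReLU masks are unbounded above and independent across sources.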
- class speechbrain.lobes.models.conv_tasnet.TemporalBlock(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)
Bases: Module
The conv1d compound layers used in MaskNet.
- Parameters:
input_shape (tuple) – The expected shape of the input.
out_channels (int) – The number of intermediate channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, one of 'same', 'valid', or 'causal'. If 'valid', no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, in ['gLN', 'cLN'].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlock
>>> x = torch.randn(14, 100, 10)
>>> temporal_block = TemporalBlock(x.shape, 10, 11, 1, "same", 1)
>>> y = temporal_block(x)
>>> y.shape
torch.Size([14, 100, 10])
- forward(x)
- Parameters:
x (torch.Tensor) – Tensor of shape [M, K, B].
- Returns:
x – Tensor of shape [M, K, B].
- Return type:
torch.Tensor
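The effect of the padding option on the number of frames K can be worked out with standard convolution arithmetic. The sketch below is generic and not taken from the library: `conv1d_output_length` is a hypothetical helper, and it assumes 'same' and 'causal' both add dilation * (kernel_size - 1) frames of padding in total.

```python
def conv1d_output_length(T, kernel_size, stride=1, dilation=1, padding="same"):
    """Output length of a 1-D convolution over T frames (standard conv arithmetic)."""
    span = dilation * (kernel_size - 1)  # frames covered beyond the first tap
    if padding == "valid":
        padded = T          # no padding
    elif padding in ("same", "causal"):
        padded = T + span   # "same": split over both sides; "causal": all on the left
    else:
        raise ValueError(f"unknown padding: {padding}")
    return (padded - span - 1) // stride + 1


print(conv1d_output_length(100, 11, padding="same"))    # 100
print(conv1d_output_length(100, 11, padding="valid"))   # 90
print(conv1d_output_length(100, 11, padding="causal"))  # 100
```

The 'same' result matches the class example, where a 100-frame input with kernel size 11 yields a 100-frame output.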
- class speechbrain.lobes.models.conv_tasnet.DepthwiseSeparableConv(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)
Bases: Sequential
Building block for the Temporal Blocks of MaskNet in ConvTasNet.
- Parameters:
input_shape (tuple) – Expected shape of the input.
out_channels (int) – Number of output channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, one of 'same', 'valid', or 'causal'. If 'valid', no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, in ['gLN', 'cLN'].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
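One reason to use a depthwise separable convolution is its smaller weight count. The sketch below compares it with a full convolution, assuming the usual decomposition into a depthwise convolution (one filter per input channel) followed by a 1 × 1 pointwise convolution. Biases, normalization, and activation parameters are ignored, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, kernel_size):
    """Weights in a full 1-D convolution (bias ignored)."""
    return c_in * c_out * kernel_size


def depthwise_separable_params(c_in, c_out, kernel_size):
    """Weights in a depthwise conv (one filter per channel) plus a 1x1 pointwise conv."""
    depthwise = c_in * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise


print(standard_conv_params(10, 10, 11))        # 1100
print(depthwise_separable_params(10, 10, 11))  # 210
```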
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import DepthwiseSeparableConv
>>> x = torch.randn(14, 100, 10)
>>> DSconv = DepthwiseSeparableConv(x.shape, 10, 11, 1, "same", 1)
>>> y = DSconv(x)
>>> y.shape
torch.Size([14, 100, 10])
- class speechbrain.lobes.models.conv_tasnet.Chomp1d(chomp_size)
Bases: Module
This class cuts out a portion of the signal from the end.
It is written as a class so that it can be used inside a sequential wrapper.
- Parameters:
chomp_size (int) – The size of the portion to discard (in samples).
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Chomp1d
>>> x = torch.randn(10, 110, 5)
>>> chomp = Chomp1d(10)
>>> x_chomped = chomp(x)
>>> x_chomped.shape
torch.Size([10, 100, 5])
- forward(x)
- Parameters:
x (torch.Tensor) – Tensor of shape [M, Kpad, H].
- Returns:
x – Tensor of shape [M, K, H].
- Return type:
torch.Tensor
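The shapes above imply that the trailing frames are dropped along the second (time) axis, so K = Kpad - chomp_size. A minimal pure-Python sketch of that slicing on nested lists follows; the function `chomp` is a hypothetical stand-in for the module, written only to show the indexing.

```python
def chomp(x, chomp_size):
    """Drop the last `chomp_size` frames along the time axis of a [M, Kpad, H] nested list."""
    if chomp_size <= 0:
        # seq[:-0] would return an empty list, so handle "nothing to drop" explicitly.
        return [list(seq) for seq in x]
    return [seq[:-chomp_size] for seq in x]


M, Kpad, H = 10, 110, 5
x = [[[0.0] * H for _ in range(Kpad)] for _ in range(M)]
y = chomp(x, 10)
print(len(y), len(y[0]), len(y[0][0]))  # 10 100 5
```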
- speechbrain.lobes.models.conv_tasnet.choose_norm(norm_type, channel_size)
This function returns the chosen normalization type.
- Parameters:
norm_type (str) – The type of normalization, e.g. 'gLN' or 'cLN'.
channel_size (int) – Number of channels.
- Returns:
Constructed layer of the chosen type
Example
>>> from speechbrain.lobes.models.conv_tasnet import choose_norm
>>> choose_norm("gLN", 10)
GlobalLayerNorm()
- class speechbrain.lobes.models.conv_tasnet.ChannelwiseLayerNorm(channel_size)
Bases: Module
Channel-wise Layer Normalization (cLN).
- Parameters:
channel_size (int) – Number of channels in the normalization dimension (the third dimension).
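To make the normalization dimension concrete, the sketch below normalizes each frame of a [K, N] input over its N channels, independently of the other frames. It is a pure-Python illustration with assumptions: the eps value is arbitrary, population variance is used, and any learnable gain and bias the layer may have are omitted.

```python
import math


def channelwise_layer_norm(x, eps=1e-8):
    """Normalize each frame of a [K, N] nested list over its N channels."""
    out = []
    for frame in x:
        mean = sum(frame) / len(frame)
        var = sum((v - mean) ** 2 for v in frame) / len(frame)
        out.append([(v - mean) / math.sqrt(var + eps) for v in frame])
    return out


x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
for frame in channelwise_layer_norm(x):
    print([round(v, 3) for v in frame])  # [-1.225, 0.0, 1.225] for both frames
```

Both frames give the same result because each frame is rescaled using only its own statistics.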
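Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import ChannelwiseLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = ChannelwiseLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])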
- class speechbrain.lobes.models.conv_tasnet.GlobalLayerNorm(channel_size)
Bases: Module
Global Layer Normalization (gLN).
- Parameters:
channel_size (int) – Number of channels in the third dimension.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import GlobalLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = GlobalLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])
- forward(y)
- Parameters:
y (torch.Tensor) – Tensor of shape [M, K, N], where M is the batch size, N is the channel size, and K is the length.
- Returns:
gLN_y – Tensor of shape [M, K, N].
- Return type:
torch.Tensor
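Global layer normalization is commonly defined as using a single mean and variance computed over both the length and channel dimensions of each batch element. The pure-Python sketch below illustrates that definition on one [K, N] element. It is an assumption-laden illustration, not the library code: eps is arbitrary, population variance is used, and learnable gain and bias are omitted.

```python
import math


def global_layer_norm(y, eps=1e-8):
    """Normalize a [K, N] nested list with one mean and variance over all K * N values."""
    values = [v for frame in y for v in frame]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(var + eps)
    return [[(v - mean) / scale for v in frame] for frame in y]


y = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
normalized = global_layer_norm(y)
flat = [v for frame in normalized for v in frame]
print(abs(sum(flat) / len(flat)) < 1e-9)       # True: zero mean over the whole element
print(normalized[0][0] < 0 < normalized[1][2])  # True: frames keep their relative levels
```

Because the statistics are shared across frames, a quiet frame stays below a loud one after normalization, unlike a per-frame scheme.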