speechbrain.lobes.models.conv_tasnet module
Implementation of Conv-TasNet, a popular speech separation model.
Summary
Classes:
ChannelwiseLayerNorm | Channel-wise Layer Normalization (cLN).
Chomp1d | This class cuts out a portion of the signal from the end.
Decoder | This class implements the decoder for the ConvTasnet.
DepthwiseSeparableConv | Building block for the Temporal Blocks of Masknet in ConvTasNet.
Encoder | This class learns the adaptive frontend for the ConvTasnet model.
GlobalLayerNorm | Global Layer Normalization (gLN).
TemporalBlock | The conv1d compound layers used in Masknet.
TemporalBlocksSequential | A wrapper that replicates the temporal-block layer.
Functions:
choose_norm | This function returns the chosen normalization type.
Reference
- class speechbrain.lobes.models.conv_tasnet.Encoder(L, N)[source]
Bases: Module
This class learns the adaptive frontend for the ConvTasnet model.
- Parameters:
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Encoder
>>> inp = torch.rand(10, 100)
>>> encoder = Encoder(11, 20)
>>> h = encoder(inp)
>>> h.shape
torch.Size([10, 20, 20])
- class speechbrain.lobes.models.conv_tasnet.Decoder(L, N)[source]
Bases: Module
This class implements the decoder for the ConvTasnet.
The separated source embeddings are fed to the decoder to reconstruct the estimated sources in the time domain.
- Parameters:
L (int) – Number of bases to use when reconstructing.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Decoder
>>> L, C, N = 8, 2, 8
>>> mixture_w = torch.randn(10, 100, N)
>>> est_mask = torch.randn(10, 100, C, N)
>>> decoder = Decoder(L, N)
>>> mixture_hat = decoder(mixture_w, est_mask)
>>> mixture_hat.shape
torch.Size([10, 404, 2])
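The output length in the example above (100 frames with L = 8 giving 404 samples) is consistent with an overlap-and-add reconstruction that uses a hop of L // 2. The plain-Python sketch below illustrates only that arithmetic. The helper name `overlap_and_add` and the hop of `L // 2` are assumptions made for illustration, not a description of the class internals.

```python
def overlap_and_add(frames, hop):
    """Sum equal-length frames whose start positions are `hop` samples apart."""
    frame_len = len(frames[0])
    # Total length: the last frame starts at (K - 1) * hop and spans frame_len samples.
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for k, frame in enumerate(frames):
        start = k * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


L = 8                                    # frame length, as in the example above
frames = [[1.0] * L for _ in range(100)] # 100 frames of ones (illustrative data)
signal = overlap_and_add(frames, hop=L // 2)  # assumed hop of L // 2
signal_length = len(signal)              # (100 - 1) * 4 + 8 = 404
```

With a hop of half the frame length, every interior sample receives contributions from exactly two frames, which is why the interior values in this toy input are 2.0 while the first and last four samples stay at 1.0.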
- class speechbrain.lobes.models.conv_tasnet.TemporalBlocksSequential(input_shape, H, P, R, X, norm_type, causal)[source]
Bases: Sequential
A wrapper that replicates the temporal-block layer.
- Parameters:
input_shape (tuple) – Expected shape of the input.
H (int) – The number of intermediate channels.
P (int) – The kernel size in the convolutions.
R (int) – The number of times to replicate the multilayer Temporal Blocks.
X (int) – The number of layers of Temporal Blocks with different dilations.
norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlocksSequential
>>> x = torch.randn(14, 100, 10)
>>> H, P, R, X = 10, 5, 2, 3
>>> temporal_blocks = TemporalBlocksSequential(
...     x.shape, H, P, R, X, 'gLN', False
... )
>>> y = temporal_blocks(x)
>>> y.shape
torch.Size([14, 100, 10])
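The parameters X and R can be read as a dilation schedule. The sketch below assumes, as described in the Conv-TasNet paper, that the dilation doubles with each of the X layers and that this stack is repeated R times. The helper names are hypothetical, and the code only illustrates the receptive field that results for stride-1 convolutions with kernel size P.

```python
def dilation_schedule(R, X):
    """Dilations for R repeats of X layers, assuming dilation 2**x in layer x."""
    return [2 ** x for _ in range(R) for x in range(X)]


def receptive_field(P, R, X):
    """Receptive field (in frames) of the stacked stride-1 dilated convolutions."""
    # A layer with kernel size P and dilation d widens the field by (P - 1) * d.
    return 1 + sum((P - 1) * d for d in dilation_schedule(R, X))


# Values taken from the example above: P = 5, R = 2, X = 3.
schedule = dilation_schedule(R=2, X=3)  # [1, 2, 4, 1, 2, 4]
field = receptive_field(P=5, R=2, X=3)  # 1 + 4 * (1 + 2 + 4) * 2 = 57
```

Under these assumptions, increasing X grows the receptive field exponentially, while increasing R grows it linearly.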
- class speechbrain.lobes.models.conv_tasnet.MaskNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')[source]
Bases: Module
- Parameters:
N (int) – Number of filters in autoencoder.
B (int) – Number of channels in bottleneck 1 × 1-conv block.
H (int) – Number of channels in convolutional blocks.
P (int) – Kernel size in convolutional blocks.
X (int) – Number of convolutional blocks in each repeat.
R (int) – Number of repeats.
C (int) – Number of speakers.
norm_type (str) – One of BN, gLN, cLN.
causal (bool) – Causal or non-causal.
mask_nonlinear (str) – The non-linear function used to generate the mask, in [‘softmax’, ‘relu’].
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import MaskNet
>>> N, B, H, P, X, R, C = 11, 12, 2, 5, 3, 1, 2
>>> mask_net = MaskNet(N, B, H, P, X, R, C)
>>> mixture_w = torch.randn(10, 11, 100)
>>> est_mask = mask_net(mixture_w)
>>> est_mask.shape
torch.Size([2, 10, 11, 100])
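The mask_nonlinear option determines how raw scores are turned into masks. As a rough illustration, the plain-Python sketch below applies the two choices to the scores of C = 2 speakers at a single position of the encoded representation. It assumes that the softmax is taken across speakers, and the function names are hypothetical.

```python
import math


def relu_mask(scores):
    """Clamp negative scores to zero; each speaker is handled independently."""
    return [max(0.0, s) for s in scores]


def softmax_mask(scores):
    """Normalize scores across speakers so the masks sum to one (assumed axis)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, -1.0]             # illustrative scores for two speakers
relu_out = relu_mask(scores)     # [0.5, 0.0]
soft_out = softmax_mask(scores)  # two positive values that sum to 1
```

In this sketch the ReLU masks are unbounded above and need not sum to one, whereas the softmax masks split each position between the speakers.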
- class speechbrain.lobes.models.conv_tasnet.TemporalBlock(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]
Bases: Module
The conv1d compound layers used in Masknet.
- Parameters:
input_shape (tuple) – The expected shape of the input.
out_channels (int) – The number of intermediate channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, (same, valid, causal). If “valid”, no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlock
>>> x = torch.randn(14, 100, 10)
>>> temporal_block = TemporalBlock(x.shape, 10, 11, 1, 'same', 1)
>>> y = temporal_block(x)
>>> y.shape
torch.Size([14, 100, 10])
- class speechbrain.lobes.models.conv_tasnet.DepthwiseSeparableConv(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]
Bases: Sequential
Building block for the Temporal Blocks of Masknet in ConvTasNet.
- Parameters:
input_shape (tuple) – Expected shape of the input.
out_channels (int) – Number of output channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, (same, valid, causal). If “valid”, no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, in [‘gLN’, ‘cLN’].
causal (bool) – To use causal or non-causal convolutions, in [True, False].
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import DepthwiseSeparableConv
>>> x = torch.randn(14, 100, 10)
>>> DSconv = DepthwiseSeparableConv(x.shape, 10, 11, 1, 'same', 1)
>>> y = DSconv(x)
>>> y.shape
torch.Size([14, 100, 10])
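A depthwise separable convolution replaces one full convolution with a per-channel (depthwise) convolution followed by a 1 × 1 (pointwise) convolution. The sketch below compares convolution weight counts for the sizes used in the example above. It ignores biases, activations, and normalization parameters, and the helper names are hypothetical.

```python
def standard_conv_weights(in_channels, out_channels, kernel_size):
    """Weights of a full 1-D convolution (no bias)."""
    return in_channels * out_channels * kernel_size


def depthwise_separable_weights(in_channels, out_channels, kernel_size):
    """Weights of a depthwise conv plus a 1x1 pointwise conv (no bias)."""
    depthwise = in_channels * kernel_size   # one kernel per input channel
    pointwise = in_channels * out_channels  # 1x1 mixing across channels
    return depthwise + pointwise


# Sizes from the example above: 10 input channels, 10 output channels, kernel 11.
full = standard_conv_weights(10, 10, 11)             # 1100
separable = depthwise_separable_weights(10, 10, 11)  # 110 + 100 = 210
```

For these sizes the separable form uses roughly a fifth of the convolution weights, and the saving grows with the kernel size.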
- class speechbrain.lobes.models.conv_tasnet.Chomp1d(chomp_size)[source]
Bases: Module
This class cuts out a portion of the signal from the end.
It is written as a class so that it can be used inside a sequential wrapper.
- Parameters:
chomp_size (int) – The size of the portion to discard (in samples).
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Chomp1d
>>> x = torch.randn(10, 110, 5)
>>> chomp = Chomp1d(10)
>>> x_chomped = chomp(x)
>>> x_chomped.shape
torch.Size([10, 100, 5])
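The effect on the time axis can be sketched with nested lists in place of tensors. This is a plain-Python illustration that assumes a [batch, time, channels] layout, as suggested by the shapes in the example above; `chomp_lists` is a hypothetical helper, not the class itself.

```python
def chomp_lists(batch, chomp_size):
    """Drop the last `chomp_size` time steps of every sequence in the batch."""
    # An explicit end index also handles chomp_size == 0,
    # where seq[:-0] would wrongly return an empty list.
    return [seq[: len(seq) - chomp_size] for seq in batch]


# A batch of 2 sequences with 6 time steps and 1 channel each (illustrative data).
batch = [[[float(t)] for t in range(6)] for _ in range(2)]
chomped = chomp_lists(batch, 2)  # each sequence keeps time steps 0..3
```

Only the trailing time steps are removed; the batch and channel dimensions are unchanged.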
- speechbrain.lobes.models.conv_tasnet.choose_norm(norm_type, channel_size)[source]
This function returns the chosen normalization type.
- Parameters:
Example
>>> from speechbrain.lobes.models.conv_tasnet import choose_norm
>>> choose_norm('gLN', 10)
GlobalLayerNorm()
- class speechbrain.lobes.models.conv_tasnet.ChannelwiseLayerNorm(channel_size)[source]
Bases: Module
Channel-wise Layer Normalization (cLN).
- Parameters:
channel_size (int) – Number of channels in the normalization dimension (the third dimension).
Example
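>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import ChannelwiseLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = ChannelwiseLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])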
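Channel-wise layer normalization computes its statistics separately for every time step, over the channel dimension only. The plain-Python sketch below shows this for a single [time][channel] sequence. It uses scalar `gamma` and `beta` in place of learnable per-channel parameters, and the value of `EPS` is an assumption; the function name is hypothetical.

```python
import math

EPS = 1e-8  # assumed small constant for numerical stability


def channelwise_layer_norm(frames, gamma=1.0, beta=0.0):
    """Normalize each frame over its channels; `frames` is [time][channel]."""
    out = []
    for frame in frames:
        mean = sum(frame) / len(frame)
        var = sum((v - mean) ** 2 for v in frame) / len(frame)  # population variance
        scale = gamma / math.sqrt(var + EPS)
        out.append([(v - mean) * scale + beta for v in frame])
    return out


frames = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # illustrative data
normalized = channelwise_layer_norm(frames)
```

Because each frame is normalized on its own, the two frames above map to nearly identical outputs even though their scales differ by a factor of ten.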
- class speechbrain.lobes.models.conv_tasnet.GlobalLayerNorm(channel_size)[source]
Bases: Module
Global Layer Normalization (gLN).
- Parameters:
channel_size (int) – Number of channels in the third dimension.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import GlobalLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = GlobalLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])
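Global layer normalization uses a single mean and variance computed over both the time and channel dimensions of each example. The plain-Python sketch below shows this for one [time][channel] sequence. It uses scalar `gamma` and `beta` in place of learnable per-channel parameters, and the value of `EPS` is an assumption; the function name is hypothetical.

```python
import math

EPS = 1e-8  # assumed small constant for numerical stability


def global_layer_norm(frames, gamma=1.0, beta=0.0):
    """Normalize with one mean/variance over all frames and channels."""
    values = [v for frame in frames for v in frame]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # population variance
    scale = gamma / math.sqrt(var + EPS)
    return [[(v - mean) * scale + beta for v in frame] for frame in frames]


frames = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # illustrative data
normalized = global_layer_norm(frames)
total = sum(v for frame in normalized for v in frame)  # close to zero
```

Since the statistics are shared across time, the quieter first frame stays below zero after normalization, so relative level differences between frames are preserved.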