speechbrain.lobes.models.conv_tasnet module
Implementation of the ConvTasNet speech separation model.
Summary
Classes:
ChannelwiseLayerNorm | Channel-wise Layer Normalization (cLN). |
Chomp1d | This class cuts out a portion of the signal from the end. |
Decoder | This class implements the decoder for ConvTasNet. |
DepthwiseSeparableConv | Building block for the Temporal Blocks of MaskNet in ConvTasNet. |
Encoder | This class learns the adaptive frontend for the ConvTasNet model. |
GlobalLayerNorm | Global Layer Normalization (gLN). |
MaskNet | |
TemporalBlock | The conv1d compound layers used in MaskNet. |
TemporalBlocksSequential | A wrapper that replicates the temporal-block layer. |
Functions:
choose_norm | This function returns the chosen normalization type. |
Reference
- class speechbrain.lobes.models.conv_tasnet.Encoder(L, N)
Bases:
torch.nn.modules.module.Module
This class learns the adaptive frontend for the ConvTasNet model.
- Parameters
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Encoder
>>> inp = torch.rand(10, 100)
>>> encoder = Encoder(11, 20)
>>> h = encoder(inp)
>>> h.shape
torch.Size([10, 20, 20])
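The output shape in the example is consistent with a strided 1-D convolution over the raw waveform. The sketch below is not the SpeechBrain implementation. It assumes a kernel of size L, a stride of L // 2, zero padding of L // 2 on each side, and a ReLU, all chosen here only because they reproduce the documented shape. The name `encode_sketch` is hypothetical.

```python
import torch
import torch.nn.functional as F


def encode_sketch(waveform, weight):
    """Frame a batch of waveforms with a strided 1-D convolution.

    waveform: [batch, samples]; weight: [N, 1, L] (N filters of length L).
    Stride L // 2 and padding L // 2 are assumptions, not documented values.
    """
    L = weight.shape[-1]
    x = waveform.unsqueeze(1)  # [batch, 1, samples]
    h = F.conv1d(x, weight, stride=L // 2, padding=L // 2)
    return F.relu(h)  # non-negative encoding, [batch, N, frames]


torch.manual_seed(0)
inp = torch.rand(10, 100)
weight = torch.randn(20, 1, 11)  # N=20 filters, L=11
h = encode_sketch(inp, weight)
print(h.shape)  # torch.Size([10, 20, 20])
```

Under these assumptions the number of frames is floor((100 + 2 * 5 - 11) / 5) + 1 = 20.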
- class speechbrain.lobes.models.conv_tasnet.Decoder(L, N)
Bases:
torch.nn.modules.module.Module
This class implements the decoder for ConvTasNet.
The separated source embeddings are fed to the decoder to reconstruct the estimated sources in the time domain.
- Parameters
L (int) – Number of bases to use when reconstructing.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Decoder
>>> L, C, N = 8, 2, 8
>>> mixture_w = torch.randn(10, 100, N)
>>> est_mask = torch.randn(10, 100, C, N)
>>> decoder = Decoder(L, N)
>>> mixture_hat = decoder(mixture_w, est_mask)
>>> mixture_hat.shape
torch.Size([10, 404, 2])
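The output length in the example (404 samples from 100 frames with L = 8) matches what overlap-add produces when frames of length L are placed L // 2 samples apart: (100 - 1) * 4 + 8 = 404. The sketch below illustrates only that reconstruction step under this assumption. `overlap_add_sketch` is a hypothetical helper, not a function of this module.

```python
import torch


def overlap_add_sketch(frames, hop):
    """Sum overlapping frames into one signal.

    frames: [batch, num_frames, frame_len]; consecutive frames start `hop`
    samples apart. Returns [batch, (num_frames - 1) * hop + frame_len].
    """
    batch, num_frames, frame_len = frames.shape
    out = torch.zeros(batch, (num_frames - 1) * hop + frame_len)
    for k in range(num_frames):
        start = k * hop
        out[:, start:start + frame_len] += frames[:, k, :]
    return out


L, K = 8, 100
frames = torch.ones(10, K, L)
signal = overlap_add_sketch(frames, hop=L // 2)
print(signal.shape)  # torch.Size([10, 404])
```

With a hop of half the frame length, every interior sample is covered by exactly two frames, while the first and last L // 2 samples are covered by one.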
- class speechbrain.lobes.models.conv_tasnet.TemporalBlocksSequential(input_shape, H, P, R, X, norm_type, causal)
Bases:
speechbrain.nnet.containers.Sequential
A wrapper that replicates the temporal-block layer.
- Parameters
input_shape (tuple) – Expected shape of the input.
H (int) – The number of intermediate channels.
P (int) – The kernel size in the convolutions.
R (int) – The number of times to replicate the multilayer Temporal Blocks.
X (int) – The number of layers of Temporal Blocks with different dilations.
norm_type (str) – The type of normalization, one of ‘gLN’ or ‘cLN’.
causal (bool) – Whether to use causal (True) or non-causal (False) convolutions.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlocksSequential
>>> x = torch.randn(14, 100, 10)
>>> H, P, R, X = 10, 5, 2, 3
>>> TemporalBlocks = TemporalBlocksSequential(
...     x.shape, H, P, R, X, 'gLN', False
... )
>>> y = TemporalBlocks(x)
>>> y.shape
torch.Size([14, 100, 10])
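R and X together determine how many temporal blocks are stacked and how far back each output frame can see. The sketch below assumes that the dilation doubles with each of the X layers (1, 2, 4, ...) and restarts on each of the R repeats. That schedule is common in ConvTasNet-style models, but the parameter list above only says the dilations differ. Both helper names are hypothetical.

```python
def dilation_schedule(R, X):
    """Dilation per block, assuming it doubles per layer and resets per repeat."""
    return [2 ** x for _ in range(R) for x in range(X)]


def receptive_field(P, R, X):
    """Frames seen by one output frame for stride-1 convolutions of kernel P."""
    return 1 + sum((P - 1) * d for d in dilation_schedule(R, X))


print(dilation_schedule(R=2, X=3))     # [1, 2, 4, 1, 2, 4]
print(receptive_field(P=5, R=2, X=3))  # 57
```

Each block with kernel size P and dilation d widens the receptive field by (P - 1) * d frames, so increasing X grows it exponentially while increasing R grows it linearly.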
- class speechbrain.lobes.models.conv_tasnet.MaskNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')
Bases:
torch.nn.modules.module.Module
- Parameters
N (int) – Number of filters in autoencoder.
B (int) – Number of channels in bottleneck 1 × 1-conv block.
H (int) – Number of channels in convolutional blocks.
P (int) – Kernel size in convolutional blocks.
X (int) – Number of convolutional blocks in each repeat.
R (int) – Number of repeats.
C (int) – Number of speakers.
norm_type (str) – One of BN, gLN, cLN.
causal (bool) – Causal or non-causal.
mask_nonlinear (str) – The non-linear function used to generate the mask, one of ‘softmax’ or ‘relu’.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import MaskNet
>>> N, B, H, P, X, R, C = 11, 12, 2, 5, 3, 1, 2
>>> mask_net = MaskNet(N, B, H, P, X, R, C)
>>> mixture_w = torch.randn(10, 11, 100)
>>> est_mask = mask_net(mixture_w)
>>> est_mask.shape
torch.Size([2, 10, 11, 100])
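The example output has one mask per speaker in its leading dimension. The sketch below shows how the two `mask_nonlinear` options could turn raw scores into masks and how those masks would be applied to the mixture encoding. It assumes that masks multiply the encoding elementwise and that ‘softmax’ normalizes across the speaker dimension. The function `make_masks` is hypothetical and not part of this module.

```python
import torch
import torch.nn.functional as F


def make_masks(scores, mask_nonlinear="relu"):
    """Turn raw scores of shape [C, M, N, K] into masks of the same shape."""
    if mask_nonlinear == "softmax":
        # Assumption: normalize across speakers so masks sum to 1 per bin.
        return F.softmax(scores, dim=0)
    if mask_nonlinear == "relu":
        return F.relu(scores)
    raise ValueError(f"Unsupported mask non-linearity: {mask_nonlinear}")


torch.manual_seed(0)
C, M, N, K = 2, 10, 11, 100
scores = torch.randn(C, M, N, K)
mixture_w = torch.rand(M, N, K)

masks = make_masks(scores, "softmax")
sources_w = mixture_w.unsqueeze(0) * masks  # broadcast over speakers
print(sources_w.shape)  # torch.Size([2, 10, 11, 100])
```

With softmax masks the per-speaker encodings sum back to the mixture encoding. With ReLU masks they are only guaranteed to be non-negative scalings of it.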
- class speechbrain.lobes.models.conv_tasnet.TemporalBlock(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)
Bases:
torch.nn.modules.module.Module
The conv1d compound layers used in MaskNet.
- Parameters
input_shape (tuple) – The expected shape of the input.
out_channels (int) – The number of intermediate channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, one of ‘same’, ‘valid’, or ‘causal’. If ‘valid’, no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, one of ‘gLN’ or ‘cLN’.
causal (bool) – Whether to use causal (True) or non-causal (False) convolutions.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlock
>>> x = torch.randn(14, 100, 10)
>>> temporal_block = TemporalBlock(x.shape, 10, 11, 1, 'same', 1)
>>> y = temporal_block(x)
>>> y.shape
torch.Size([14, 100, 10])
- class speechbrain.lobes.models.conv_tasnet.DepthwiseSeparableConv(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)
Bases:
speechbrain.nnet.containers.Sequential
Building block for the Temporal Blocks of MaskNet in ConvTasNet.
- Parameters
input_shape (tuple) – Expected shape of the input.
out_channels (int) – Number of output channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in convolutional layers.
padding (str) – The type of padding in the convolutional layers, one of ‘same’, ‘valid’, or ‘causal’. If ‘valid’, no padding is performed.
dilation (int) – Amount of dilation in convolutional layers.
norm_type (str) – The type of normalization, one of ‘gLN’ or ‘cLN’.
causal (bool) – Whether to use causal (True) or non-causal (False) convolutions.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import DepthwiseSeparableConv
>>> x = torch.randn(14, 100, 10)
>>> DSconv = DepthwiseSeparableConv(x.shape, 10, 11, 1, 'same', 1)
>>> y = DSconv(x)
>>> y.shape
torch.Size([14, 100, 10])
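For readers unfamiliar with the term, a depthwise separable convolution factors a full convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. The sketch below uses plain `torch.nn.Conv1d` layers to show the idea and the parameter savings. It is an illustration under stated assumptions, not the class documented above: it omits normalization and causal handling, and the class name `DepthwiseSeparableSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableSketch(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""

    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2  # keeps length for odd kernels
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=padding, dilation=dilation, groups=in_channels, bias=False,
        )
        self.pointwise = nn.Conv1d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        # x: [batch, time, channels]; nn.Conv1d expects [batch, channels, time].
        y = self.pointwise(self.depthwise(x.transpose(1, 2)))
        return y.transpose(1, 2)


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


x = torch.randn(14, 100, 10)
ds_conv = DepthwiseSeparableSketch(10, 10, kernel_size=11)
full_conv = nn.Conv1d(10, 10, 11, padding=5, bias=False)
print(ds_conv(x).shape)         # torch.Size([14, 100, 10])
print(count_params(ds_conv))    # 210
print(count_params(full_conv))  # 1100
```

For 10 input and output channels with a kernel of 11, the factored form needs 10 * 11 + 10 * 10 = 210 weights instead of 10 * 10 * 11 = 1100.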
- class speechbrain.lobes.models.conv_tasnet.Chomp1d(chomp_size)
Bases:
torch.nn.modules.module.Module
This class cuts out a portion of the signal from the end.
It is written as a class to be able to incorporate it inside a sequential wrapper.
- Parameters
chomp_size (int) – The size of the portion to discard (in samples).
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Chomp1d
>>> x = torch.randn(10, 110, 5)
>>> chomp = Chomp1d(10)
>>> x_chomped = chomp(x)
>>> x_chomped.shape
torch.Size([10, 100, 5])
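One common use for this kind of trimming is to make a symmetrically padded convolution causal: pad by (kernel_size - 1) * dilation on both sides, then discard the same number of trailing steps. The sketch below demonstrates that pattern with plain PyTorch. Whether this module uses `Chomp1d` in exactly this way is an assumption, and the helpers `chomp` and `causal_conv` are hypothetical.

```python
import torch
import torch.nn as nn


def chomp(x, chomp_size):
    """Drop the last `chomp_size` steps along the time axis of [batch, time, channels]."""
    return x[:, :-chomp_size, :]


kernel_size, dilation = 3, 2
pad = (kernel_size - 1) * dilation  # symmetric padding adds `pad` steps per side
conv = nn.Conv1d(5, 5, kernel_size, padding=pad, dilation=dilation)


def causal_conv(x):
    # nn.Conv1d works on [batch, channels, time], so transpose around it.
    y = conv(x.transpose(1, 2)).transpose(1, 2)  # time grows by `pad`
    return chomp(y, pad)  # back to the input length, no look-ahead


torch.manual_seed(0)
x = torch.randn(10, 100, 5)
x_future_changed = x.clone()
x_future_changed[:, 50:, :] = 0.0  # alter only steps 50 and later

with torch.no_grad():
    y1, y2 = causal_conv(x), causal_conv(x_future_changed)
print(y1.shape)                                # torch.Size([10, 100, 5])
print(torch.allclose(y1[:, :50], y2[:, :50]))  # True
```

Because the trailing steps are removed, output step t depends only on inputs at steps t and earlier, which is why changing the second half of the input leaves the first half of the output untouched.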
- speechbrain.lobes.models.conv_tasnet.choose_norm(norm_type, channel_size)
This function returns the chosen normalization type.
- Parameters
Example
>>> from speechbrain.lobes.models.conv_tasnet import choose_norm
>>> choose_norm('gLN', 10)
GlobalLayerNorm()
- class speechbrain.lobes.models.conv_tasnet.ChannelwiseLayerNorm(channel_size)
Bases:
torch.nn.modules.module.Module
Channel-wise Layer Normalization (cLN).
- Parameters
channel_size (int) – Number of channels in the normalization dimension (the third dimension).
Example
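>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import ChannelwiseLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = ChannelwiseLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])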
- class speechbrain.lobes.models.conv_tasnet.GlobalLayerNorm(channel_size)
Bases:
torch.nn.modules.module.Module
Global Layer Normalization (gLN).
- Parameters
channel_size (int) – Number of channels in the third dimension.
Example
>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import GlobalLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = GlobalLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])
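The difference between cLN and gLN lies in which axes the statistics are computed over. The sketch below assumes inputs laid out as [batch, time, channels], with channels in the third dimension as the parameter descriptions state. It omits any learnable scale or shift and uses an arbitrary stabilizing constant, so it illustrates the two definitions rather than reproducing the classes above. Both function names are hypothetical.

```python
import torch

EPS = 1e-8  # small constant for numerical stability (value is an assumption)


def channelwise_norm(x):
    """cLN sketch: normalize each time step over the channel axis.

    x: [batch, time, channels].
    """
    mean = x.mean(dim=2, keepdim=True)
    var = ((x - mean) ** 2).mean(dim=2, keepdim=True)
    return (x - mean) / torch.sqrt(var + EPS)


def global_norm(x):
    """gLN sketch: normalize each example over both time and channel axes."""
    mean = x.mean(dim=(1, 2), keepdim=True)
    var = ((x - mean) ** 2).mean(dim=(1, 2), keepdim=True)
    return (x - mean) / torch.sqrt(var + EPS)


torch.manual_seed(0)
x = torch.randn(2, 50, 8)
cln, gln = channelwise_norm(x), global_norm(x)
print(cln.mean(dim=2).abs().max() < 1e-5)       # per-step mean is ~0
print(gln.mean(dim=(1, 2)).abs().max() < 1e-5)  # per-example mean is ~0
```

Because cLN uses only the current time step, it does not depend on future frames. gLN uses statistics from the whole sequence.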