speechbrain.lobes.models.resepformer module
Library for the Resource-Efficient Sepformer.
- Authors
Cem Subakan 2022
Summary
Classes:
MemLSTM | The Mem-LSTM of SkiM.
ResourceEfficientSeparationPipeline | Resource Efficient Separation Pipeline, used for RE-SepFormer and SkiM.
ResourceEfficientSeparator | Resource Efficient Source Separator. This is the class that implements RE-SepFormer.
SBRNNBlock | RNNBlock with output layer.
SBTransformerBlock_wnormandskip | A wrapper for the SpeechBrain implementation of the transformer encoder.
SegLSTM | The Segment-LSTM of SkiM.
Reference
- class speechbrain.lobes.models.resepformer.MemLSTM(hidden_size, dropout=0.0, bidirectional=False, mem_type='hc', norm_type='cln')
Bases:
Module
The Mem-LSTM of SkiM.
Note: This is taken from the SkiM implementation in ESPNet toolkit and modified for compatibility with SpeechBrain.
- Parameters:
hidden_size (int) – Dimension of the hidden state.
dropout (float) – Dropout ratio. Default is 0.
bidirectional (bool) – Whether the LSTM layers are bidirectional. Default is False.
mem_type (str) – 'hc', 'h', 'c', or 'id'. This controls whether the hidden (or cell) state of SegLSTM will be processed by MemLSTM. In 'id' mode, both the hidden and cell states are returned unchanged.
norm_type (str) – 'gln' or 'cln'. This selects the type of normalization; 'cln' is for the causal implementation.
Example
>>> import torch
>>> from speechbrain.lobes.models.resepformer import MemLSTM
>>> x = (torch.randn(1, 5, 64), torch.randn(1, 5, 64))
>>> block = MemLSTM(64)
>>> x = block(x, 5)
>>> x[0].shape
torch.Size([1, 5, 64])
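In the example above, the second argument to the block (5) is read here as the number of segments S, so a state of shape (d, B*S, H) can be regrouped per batch item before a memory model runs across the S segments. The following is a minimal sketch of that shape arithmetic only, under the assumption that states are regrouped to (B, S, d*H). `regroup_state_shape` is a hypothetical helper, not part of SpeechBrain.

```python
def regroup_state_shape(state_shape, num_segments):
    """Return (B, S, d*H) for a state of shape (d, B*S, H).

    Hypothetical helper: it only does the shape arithmetic and
    does not touch any tensor data.
    """
    d, bs, h = state_shape
    if bs % num_segments != 0:
        raise ValueError("B*S must be divisible by the number of segments")
    batch = bs // num_segments
    return (batch, num_segments, d * h)


# Shapes from the example above: d=1, B*S=5, H=64, S=5
print(regroup_state_shape((1, 5, 64), 5))    # (1, 5, 64)
# Bidirectional case (d=2) with B=3 and S=10
print(regroup_state_shape((2, 30, 64), 10))  # (3, 10, 128)
```

With S equal to the full second dimension, as in the example, the implied batch size is 1.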
- class speechbrain.lobes.models.resepformer.SegLSTM(input_size, hidden_size, dropout=0.0, bidirectional=False, norm_type='cLN')
Bases:
Module
The Segment-LSTM of SkiM.
Note: This is taken from the SkiM implementation in ESPNet toolkit and modified for compatibility with SpeechBrain.
- Parameters:
input_size (int) – Dimension of the input feature. The input should have shape (batch, seq_len, input_size).
hidden_size (int) – Dimension of the hidden state.
dropout (float) – Dropout ratio. Default is 0.
bidirectional (bool) – Whether the LSTM layers are bidirectional. Default is False.
norm_type (str) – One of 'gln' or 'cln'. This selects the type of normalization; 'cln' is for the causal implementation.
Example
>>> import torch
>>> from speechbrain.lobes.models.resepformer import SegLSTM
>>> x = torch.randn(3, 20, 64)
>>> hc = None
>>> seglstm = SegLSTM(64, 64)
>>> y = seglstm(x, hc)
>>> y[0].shape
torch.Size([3, 20, 64])
- forward(input, hc)
The forward function of the Segment-LSTM.
- Parameters:
input (torch.Tensor) – Tensor of shape [B*S, T, H], where B is the batch size, S is the number of chunks, T is the chunk size, and H is the latent dimensionality.
hc (tuple) – Tuple of hidden and cell states from SegLSTM. h and c each have shape (d, B*S, H), where d is the number of directions, B is the batch size, S is the number of chunks, and H is the latent dimensionality.
- Returns:
output (torch.Tensor) – Output of the Segment-LSTM.
(h, c) (tuple) – Hidden and cell states, in the same format as the hc input.
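The B*S leading dimension means that the chunks of all batch items are stacked into one long batch. As a rough illustration, assuming the fold is a plain row-major flatten (the layout a contiguous [B, S, T, H] tensor would give when reshaped to [B*S, T, H]), chunk s of batch item b lands at row b*S + s. The names `fold_chunks` and `flat_index` are hypothetical and used only for this sketch.

```python
def fold_chunks(chunks):
    """Flatten a [B][S] nested list of chunks into B*S rows (row-major)."""
    return [chunk for per_item in chunks for chunk in per_item]


def flat_index(b, s, num_chunks):
    """Row of chunk s of batch item b after folding."""
    return b * num_chunks + s


B, S = 2, 3
# Label each chunk so its origin stays visible after folding.
chunks = [[f"b{b}s{s}" for s in range(S)] for b in range(B)]
rows = fold_chunks(chunks)
print(rows)                       # ['b0s0', 'b0s1', 'b0s2', 'b1s0', 'b1s1', 'b1s2']
print(rows[flat_index(1, 2, S)])  # b1s2
```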
- class speechbrain.lobes.models.resepformer.SBRNNBlock(input_size, hidden_channels, num_layers, outsize, rnn_type='LSTM', dropout=0, bidirectional=True)
Bases:
Module
RNNBlock with output layer.
- Parameters:
input_size (int) – Dimensionality of the input features.
hidden_channels (int) – Dimensionality of the latent layer of the RNN.
num_layers (int) – Number of RNN layers.
outsize (int) – Number of dimensions at the output of the linear layer.
rnn_type (str) – Type of the RNN cell.
dropout (float) – Dropout rate.
bidirectional (bool) – If True, the RNN is bidirectional.
Example
>>> import torch
>>> from speechbrain.lobes.models.resepformer import SBRNNBlock
>>> x = torch.randn(10, 100, 64)
>>> rnn = SBRNNBlock(64, 100, 1, 128, bidirectional=True)
>>> x = rnn(x)
>>> x.shape
torch.Size([10, 100, 128])
- class speechbrain.lobes.models.resepformer.SBTransformerBlock_wnormandskip(num_layers, d_model, nhead, d_ffn=2048, input_shape=None, kdim=None, vdim=None, dropout=0.1, activation='relu', use_positional_encoding=False, norm_before=False, attention_type='regularMHA', causal=False, use_norm=True, use_skip=True, norm_type='gln')
Bases:
Module
A wrapper for the SpeechBrain implementation of the transformer encoder.
- Parameters:
num_layers (int) – Number of layers.
d_model (int) – Dimensionality of the representation.
nhead (int) – Number of attention heads.
d_ffn (int) – Dimensionality of the position-wise feed-forward layer.
input_shape (tuple) – Shape of input.
kdim (int) – Dimension of the key (optional).
vdim (int) – Dimension of the value (optional).
dropout (float) – Dropout rate.
activation (str) – Activation function.
use_positional_encoding (bool) – If True, a positional encoding is used.
norm_before (bool) – Whether to apply normalization before the transformations.
attention_type (str) – Type of attention. Default is 'regularMHA'.
causal (bool) – Whether to mask future information. Default is False.
use_norm (bool) – Whether to include a norm in the block.
use_skip (bool) – Whether to add skip connections in the block.
norm_type (str) – One of 'cln' or 'gln'.
Example
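>>> import torch
>>> from speechbrain.lobes.models.resepformer import SBTransformerBlock_wnormandskip
>>> x = torch.randn(10, 100, 64)
>>> block = SBTransformerBlock_wnormandskip(1, 64, 8)
>>> x = block(x)
>>> x.shape
torch.Size([10, 100, 64])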
- class speechbrain.lobes.models.resepformer.ResourceEfficientSeparationPipeline(input_size, hidden_size, output_size, dropout=0.0, num_blocks=2, segment_size=20, bidirectional=True, mem_type='av', norm_type='gln', seg_model=None, mem_model=None)
Bases:
Module
Resource Efficient Separation Pipeline, used for RE-SepFormer and SkiM.
Note: This implementation is a generalization of the ESPNET implementation of SkiM.
- Parameters:
input_size (int) – Dimension of the input feature. Input shape should be (batch, length, input_size).
hidden_size (int) – Dimension of the hidden state.
output_size (int) – Dimension of the output.
dropout (float) – Dropout ratio. Default is 0.
num_blocks (int) – Number of basic SkiM blocks.
segment_size (int) – Segment size for splitting long features.
bidirectional (bool) – Whether the RNN layers are bidirectional.
mem_type (str) – 'hc', 'h', 'c', 'id', 'av' or None. This controls whether the hidden (or cell) state of SegLSTM will be processed by MemLSTM. In 'id' mode, both the hidden and cell states are returned unchanged. When mem_type is None, the MemLSTM is removed.
norm_type (str) – One of 'gln' or 'cln'; 'cln' is for the causal implementation.
seg_model (class) – The model that processes the elements within each segment.
mem_model (class) – The memory model that ensures continuity between the segments.
Example
>>> import torch
>>> from speechbrain.lobes.models.resepformer import (
...     ResourceEfficientSeparationPipeline,
...     SBTransformerBlock_wnormandskip,
... )
>>> x = torch.randn(10, 100, 64)
>>> seg_mdl = SBTransformerBlock_wnormandskip(1, 64, 8)
>>> mem_mdl = SBTransformerBlock_wnormandskip(1, 64, 8)
>>> resepf_pipeline = ResourceEfficientSeparationPipeline(64, 64, 128, seg_model=seg_mdl, mem_model=mem_mdl)
>>> out = resepf_pipeline.forward(x)
>>> out.shape
torch.Size([10, 100, 128])
- forward(input)
The forward function of the ResourceEfficientSeparationPipeline.
This takes in a tensor of size [B, (S*K), D].
- Parameters:
input (torch.Tensor) – Tensor of shape [B, (S*K), D], where B is the batch size, S is the number of chunks, K is the chunk size, and D is the number of features.
- Returns:
output – The separated tensor.
- Return type:
torch.Tensor
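The [B, (S*K), D] layout implies that the time axis is cut into S chunks of K frames each. The sketch below shows that split for a single sequence in plain Python, under the assumption that a sequence whose length is not a multiple of K is zero-padded on the right up to the next multiple. The amount and placement of padding in the library may differ, and `split_into_chunks` is a hypothetical name.

```python
import math


def split_into_chunks(frames, chunk_size, pad_value=0.0):
    """Split a list of frames into equal chunks, right-padding the last one.

    Assumption for this sketch: pad only up to the next multiple
    of chunk_size.
    """
    num_chunks = math.ceil(len(frames) / chunk_size)
    padding = [pad_value] * (num_chunks * chunk_size - len(frames))
    padded = frames + padding
    return [
        padded[i * chunk_size:(i + 1) * chunk_size]
        for i in range(num_chunks)
    ]


# 7 frames with K=3 give S=3 chunks; the last chunk carries 2 padded frames.
chunks = split_into_chunks(list(range(1, 8)), chunk_size=3)
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7, 0.0, 0.0]]
```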
- class speechbrain.lobes.models.resepformer.ResourceEfficientSeparator(input_dim: int, causal: bool = True, num_spk: int = 2, nonlinear: str = 'relu', layer: int = 3, unit: int = 512, segment_size: int = 20, dropout: float = 0.0, mem_type: str = 'hc', seg_model=None, mem_model=None)
Bases:
Module
Resource Efficient Source Separator. This is the class that implements RE-SepFormer.
- Parameters:
input_dim (int) – Input feature dimension.
causal (bool) – Whether the system is causal.
num_spk (int) – Number of target speakers.
nonlinear (str) – The nonlinear function for mask estimation; select from 'relu', 'tanh', 'sigmoid'.
layer (int) – Number of blocks. Default is 2 for RE-SepFormer.
unit (int) – Dimensionality of the hidden state.
segment_size (int) – Chunk size for splitting long features.
dropout (float) – Dropout ratio. Default is 0.
mem_type (str) – 'hc', 'h', 'c', 'id', 'av' or None. This controls whether a memory representation will be used to ensure continuity between segments. In 'av' mode, the summary state is calculated by simply averaging over the time dimension of each segment. In 'id' mode, both the hidden and cell states are returned unchanged. When mem_type is None, the memory model is removed.
seg_model (class) – The model that processes the elements within each segment.
mem_model (class) – The memory model that ensures continuity between the segments.
Example
>>> import torch
>>> from speechbrain.lobes.models.resepformer import (
...     ResourceEfficientSeparator,
...     SBTransformerBlock_wnormandskip,
... )
>>> x = torch.randn(10, 64, 100)
>>> seg_mdl = SBTransformerBlock_wnormandskip(1, 64, 8)
>>> mem_mdl = SBTransformerBlock_wnormandskip(1, 64, 8)
>>> resepformer = ResourceEfficientSeparator(64, num_spk=3, mem_type='av', seg_model=seg_mdl, mem_model=mem_mdl)
>>> out = resepformer.forward(x)
>>> out.shape
torch.Size([3, 10, 64, 100])
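The 'av' memory mode used in this example reduces each segment to one summary vector by averaging its frames over time. A plain-Python sketch of that averaging follows, assuming the segments of one sequence are given as nested lists of shape [S][K][D]. `average_summaries` is a hypothetical name, and the library's own tensor code may differ in detail.

```python
def average_summaries(chunks):
    """Average each chunk over time: [S][K][D] -> [S][D]."""
    summaries = []
    for chunk in chunks:
        num_frames = len(chunk)
        dim = len(chunk[0])
        # Mean of every feature dimension across the frames of this chunk.
        summaries.append(
            [sum(frame[j] for frame in chunk) / num_frames for j in range(dim)]
        )
    return summaries


# Two segments (S=2), two frames each (K=2), two features (D=2).
chunks = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 6.0]],
]
print(average_summaries(chunks))  # [[2.0, 3.0], [1.0, 3.0]]
```

The result has one vector per segment, which is the sequence a memory model would then process across segments.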