speechbrain.nnet.quantisers module
Gumbel Softmax implementation with multiple groups possible.
- Authors
Rudolf A. Braun 2022
Summary
Classes:
GumbelVectorQuantizer | Vector quantization using Gumbel softmax.
RandomProjectionQuantizer | Vector quantization using a projection and a randomly initialised codebook. This is useful for models such as BEST-RQ.
Reference
- class speechbrain.nnet.quantisers.GumbelVectorQuantizer(input_dim, num_vars, temp_tuple, groups, vq_dim)
Bases: Module
Vector quantization using Gumbel softmax. Copied from the fairseq implementation.
- Parameters:
input_dim (int): Input dimension (channels).
num_vars (int): Number of quantized vectors per group.
temp_tuple (tuple of float): Temperature for training. This should be a tuple of 3 elements: (start, stop, decay factor).
groups (int): Number of groups for vector quantization.
vq_dim (int): Dimensionality of the resulting quantized vector.
Example
>>> import torch
>>> from speechbrain.nnet.quantisers import GumbelVectorQuantizer
>>> quantiser = GumbelVectorQuantizer(
...     128,
...     100,
...     (
...         2.0,
...         0.25,
...         0.999995,
...     ),
...     2,
...     50,
... )
>>> inputs = torch.rand(10, 12, 128)
>>> output = quantiser(inputs)
>>> output["x"].shape
torch.Size([10, 12, 50])
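The temp_tuple argument describes a temperature schedule as (start, stop, decay factor). The sketch below shows one plausible reading of that schedule. It assumes the temperature decays exponentially with the number of training steps and is floored at the stop value, which is how the fairseq quantizer behaves as far as we know. The function name gumbel_temperature and its step argument are hypothetical and are not part of the SpeechBrain API.

```python
def gumbel_temperature(step, temp_tuple):
    """Return the assumed Gumbel softmax temperature at a training step."""
    start, stop, decay = temp_tuple
    # Assumed schedule: exponential decay from `start`, never below `stop`.
    return max(start * decay ** step, stop)


# Same tuple as in the example above: start=2.0, stop=0.25, decay=0.999995.
temp_tuple = (2.0, 0.25, 0.999995)
for step in (0, 100_000, 1_000_000):
    print(step, gumbel_temperature(step, temp_tuple))
```

Under this assumption, a decay factor very close to 1 keeps the temperature high (sampling close to uniform) for many steps before it settles at the stop value, where selections become close to one-hot.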
- class speechbrain.nnet.quantisers.RandomProjectionQuantizer(input_dim, cb_dim, cb_vocab)
Bases: Module
Vector quantization using a projection and a randomly initialised codebook. This is useful for models such as BEST-RQ.
The output is the indices of the closest code in the codebook for each time step of the input.
ref: https://arxiv.org/pdf/2202.01855
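- Parameters:
input_dim (int): Input dimension (channels).
cb_dim (int): Size of each code in the codebook.
cb_vocab (int): Number of codes in the codebook.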
Example
>>> import torch
>>> from speechbrain.nnet.quantisers import RandomProjectionQuantizer
>>> quantiser = RandomProjectionQuantizer(16, 16, 32)
>>> inputs = torch.rand(10, 12, 16)
>>> output = quantiser(inputs)
>>> output.shape
torch.Size([10, 12])
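To make the idea of "indices of the closest code" concrete, here is a dependency-free sketch of random-projection quantization for a single utterance. It is an illustration under stated assumptions, not the SpeechBrain implementation: it uses plain Python lists instead of batched tensors, and it assumes that both the projected frame and the codebook entries are L2-normalised before the nearest code is selected, as described in the BEST-RQ paper. The helper names l2_normalise, make_quantiser and quantise_frame are hypothetical.

```python
import math
import random


def l2_normalise(vec):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def make_quantiser(input_dim, cb_dim, cb_vocab, seed=0):
    """Build a frozen random projection and a random, normalised codebook."""
    rng = random.Random(seed)
    # Projection matrix of shape (input_dim, cb_dim); it is never trained.
    projection = [
        [rng.gauss(0.0, 1.0) for _ in range(cb_dim)] for _ in range(input_dim)
    ]
    # Codebook of cb_vocab codes, each of size cb_dim.
    codebook = [
        l2_normalise([rng.gauss(0.0, 1.0) for _ in range(cb_dim)])
        for _ in range(cb_vocab)
    ]
    return projection, codebook


def quantise_frame(frame, projection, codebook):
    """Return the index of the codebook entry closest to the projected frame."""
    cb_dim = len(codebook[0])
    projected = l2_normalise(
        [
            sum(frame[i] * projection[i][j] for i in range(len(frame)))
            for j in range(cb_dim)
        ]
    )

    def squared_distance(code):
        return sum((p - c) ** 2 for p, c in zip(projected, code))

    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k]))


# Mirror the sizes of the example above: input_dim=16, cb_dim=16, cb_vocab=32.
projection, codebook = make_quantiser(16, 16, 32)
rng = random.Random(1)
utterance = [[rng.random() for _ in range(16)] for _ in range(12)]  # 12 time steps
indices = [quantise_frame(frame, projection, codebook) for frame in utterance]
print(indices)
```

Each time step maps to one integer in the range [0, cb_vocab), which is why the module's output in the example above has shape (batch, time) rather than (batch, time, channels).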