speechbrain.integrations.huggingface.wordemb.util module

Utilities for word embeddings

Authors
 * Artem Ploujnikov 2021

Summary

Functions:

expand_to_chars

Expands word embeddings to a sequence of character embeddings, assigning each character the word embedding of the word to which it belongs.

Reference

speechbrain.integrations.huggingface.wordemb.util.expand_to_chars(emb, seq, seq_len, word_separator)

Expands word embeddings to a sequence of character embeddings, assigning each character the embedding of the word to which it belongs. Positions occupied by the word separator are set to zero.

Parameters:

emb (torch.Tensor) – a tensor of word embeddings of shape (batch, words, emb_dim)
seq (torch.Tensor) – a tensor of character tokens (integer indices) of shape (batch, chars)
seq_len (torch.Tensor) – a tensor of character sequence lengths
word_separator (int) – the token index used as the word separator in seq

Returns:

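char_word_emb – a tensor of shape (batch, chars, emb_dim) in which each character position holds the embedding of the word it belongs to, and each separator position holds a zero vector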

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.integrations.huggingface.wordemb.util import expand_to_chars
>>> emb = torch.tensor(
...     [
...         [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [0.0, 0.0, 0.0]],
...         [[1.0, 3.0, 2.0], [3.0, 2.0, 1.0], [2.0, 3.0, 1.0]],
...     ]
... )
>>> seq = torch.tensor([[1, 2, 0, 2, 1, 0], [1, 0, 1, 2, 0, 2]])
>>> seq_len = torch.tensor([4, 5])
>>> word_separator = 0
>>> expand_to_chars(emb, seq, seq_len, word_separator)
tensor([[[1., 2., 3.],
         [1., 2., 3.],
         [0., 0., 0.],
         [3., 1., 2.],
         [3., 1., 2.],
         [0., 0., 0.]],

        [[1., 3., 2.],
         [0., 0., 0.],
         [3., 2., 1.],
         [3., 2., 1.],
         [0., 0., 0.],
         [2., 3., 1.]]])
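To make the indexing in the example easier to follow, here is a minimal pure-Python sketch of the same mapping. It is an illustration under stated assumptions, not the SpeechBrain implementation: the helper name `expand_to_chars_reference` is hypothetical, it works on nested lists instead of tensors, and it ignores `seq_len` (in the example above, positions past `seq_len` are still filled). The idea is that a running count of separators seen so far gives, for each character, the index of the word it belongs to.

```python
from itertools import accumulate
from typing import List, Sequence


def expand_to_chars_reference(
    emb: Sequence[Sequence[Sequence[float]]],
    seq: Sequence[Sequence[int]],
    word_separator: int,
) -> List[List[List[float]]]:
    """Hypothetical list-based sketch of word-to-character expansion.

    emb: (batch, words, emb_dim) word embeddings as nested lists
    seq: (batch, chars) character token indices
    Returns (batch, chars, emb_dim); separator positions become zero vectors.
    Assumption: seq_len is not used here.
    """
    result = []
    for item_emb, item_seq in zip(emb, seq):
        dim = len(item_emb[0])
        # 1 where the token is a separator, 0 elsewhere
        is_sep = [int(tok == word_separator) for tok in item_seq]
        # Running count of separators = index of the word each character is in
        word_idx = list(accumulate(is_sep))
        row = []
        for sep, w in zip(is_sep, word_idx):
            if sep:
                row.append([0.0] * dim)  # separators get a zero vector
            else:
                row.append(list(item_emb[w]))  # copy the word's embedding
        result.append(row)
    return result


# Same data as the doctest above
emb = [
    [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [0.0, 0.0, 0.0]],
    [[1.0, 3.0, 2.0], [3.0, 2.0, 1.0], [2.0, 3.0, 1.0]],
]
seq = [[1, 2, 0, 2, 1, 0], [1, 0, 1, 2, 0, 2]]
out = expand_to_chars_reference(emb, seq, word_separator=0)
print(out)
```

For the first item, the separator flags are `[0, 0, 1, 0, 0, 1]` and their running sum is `[0, 0, 1, 1, 1, 2]`, so the first two characters take word 0, the next two take word 1, and both separator positions are zeroed, matching the tensor output above.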