speechbrain.wordemb.util module

Utilities for word embeddings

Authors: Artem Ploujnikov (2021)

Summary

Functions:

expand_to_chars

Expands word embeddings to a sequence of character-level embeddings: each character receives the embedding of the word it belongs to, and word-separator positions receive zero vectors

Reference

speechbrain.wordemb.util.expand_to_chars(emb, seq, seq_len, word_separator)

Expands word embeddings to a sequence of character-level embeddings: each character receives the embedding of the word it belongs to, and word-separator positions receive zero vectors

Parameters

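emb (torch.Tensor) – a tensor of word embeddings; in the example below its shape is (batch, words, embedding dimension)
seq (torch.Tensor) – a tensor of character tokens; in the example below it holds integer indices with shape (batch, characters)
seq_len (torch.Tensor) – a tensor of character sequence lengths, one entry per batch item
word_separator (torch.Tensor) – the token in seq that marks a word boundary (the example below passes the integer 0)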

Returns

char_word_emb – the word embeddings expanded to character resolution, with one embedding vector per position in seq

Return type

torch.Tensor

Example

>>> import torch
>>> from speechbrain.wordemb.util import expand_to_chars
>>> emb = torch.tensor(
...     [[[1., 2., 3.],
...       [3., 1., 2.],
...       [0., 0., 0.]],
...      [[1., 3., 2.],
...       [3., 2., 1.],
...       [2., 3., 1.]]]
... )
>>> seq = torch.tensor(
...     [[1, 2, 0, 2, 1, 0],
...      [1, 0, 1, 2, 0, 2]]
... )
>>> seq_len = torch.tensor([4, 5])
>>> word_separator = 0
>>> expand_to_chars(emb, seq, seq_len, word_separator)
tensor([[[1., 2., 3.],
         [1., 2., 3.],
         [0., 0., 0.],
         [3., 1., 2.],
         [3., 1., 2.],
         [0., 0., 0.]],

        [[1., 3., 2.],
         [0., 0., 0.],
         [3., 2., 1.],
         [3., 2., 1.],
         [0., 0., 0.],
         [2., 3., 1.]]])
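
To make the mapping in the example above easier to follow, here is a minimal pure-Python sketch of the same idea for a single, unbatched sequence. It is not the SpeechBrain implementation: the helper name expand_to_chars_sketch is hypothetical, it works on plain lists instead of tensors, and it ignores seq_len entirely. It also assumes the sequence never contains more separators than there are words after the first.

```python
def expand_to_chars_sketch(word_emb, chars, word_separator):
    """Illustrative only (hypothetical helper, not part of SpeechBrain).

    word_emb: list of word embedding vectors for one sequence
    chars: list of character tokens for the same sequence
    word_separator: the token that marks a word boundary
    """
    zero = [0.0] * len(word_emb[0])
    word_idx = 0  # index of the word the current character belongs to
    out = []
    for ch in chars:
        if ch == word_separator:
            # Separators get a zero vector and advance to the next word
            out.append(list(zero))
            word_idx += 1
        else:
            # Every other character copies its word's embedding
            out.append(list(word_emb[word_idx]))
    return out


# First batch item from the example above
word_emb = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [0.0, 0.0, 0.0]]
chars = [1, 2, 0, 2, 1, 0]
result = expand_to_chars_sketch(word_emb, chars, word_separator=0)
for row in result:
    print(row)
```

Counting separators seen so far gives the word index of each character, which is why the first two characters share the first word's embedding and the two characters after the first separator share the second word's embedding.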