speechbrain.decoders.scorer module
Token scorer abstraction and specifications.
- Authors:
Adel Moumen 2022, 2023
Sung-Lin Yeh 2021
Summary
Classes:
- BaseRescorerInterface – A scorer abstraction intended for inheritance by other scoring approaches used in beam search.
- BaseScorerInterface – A scorer abstraction to be inherited by other scoring approaches for beam search.
- CTCScorer – A wrapper of CTCPrefixScore based on the BaseScorerInterface.
- CoverageScorer – A coverage penalty scorer to prevent looping of hyps, where coverage is the cumulative attention probability vector.
- HuggingFaceLMRescorer – A wrapper of HuggingFace's TransformerLM based on the BaseRescorerInterface.
- KenLMScorer – KenLM N-gram scorer.
- LengthScorer – A length rewarding scorer.
- RNNLMRescorer – A wrapper of RNNLM based on the BaseRescorerInterface.
- RNNLMScorer – A wrapper of RNNLM based on BaseScorerInterface.
- RescorerBuilder – Builds rescorer instance for beamsearch.
- ScorerBuilder – Builds scorer instance for beamsearch.
- TransformerLMRescorer – A wrapper of TransformerLM based on the BaseRescorerInterface.
- TransformerLMScorer – A wrapper of TransformerLM based on BaseScorerInterface.
Reference
- class speechbrain.decoders.scorer.BaseScorerInterface[source]
Bases:
object
A scorer abstraction to be inherited by other scoring approaches for beam search.
A scorer is a module that scores tokens in the vocabulary based on the current timestep input and the previous scorer states. It can score over the full vocabulary set (i.e., full scorers) or over a pruned set of tokens (i.e., partial scorers) to reduce computation overhead. In the latter case, the partial scorers are called after the full scorers and only score the top-k candidates (i.e., the pruned set of tokens) extracted from the full scorers. The top-k candidates are extracted based on the beam size and the scorer_beam_scale, such that the number of candidates is int(beam_size * scorer_beam_scale). This is very useful when the full scorers are computationally expensive (e.g., the KenLM scorer).
Inherit this class to implement your own scorer compatible with speechbrain.decoders.seq2seq.S2SBeamSearcher().
- See:
speechbrain.decoders.scorer.CTCPrefixScorer
speechbrain.decoders.scorer.RNNLMScorer
speechbrain.decoders.scorer.TransformerLMScorer
speechbrain.decoders.scorer.KenLMScorer
speechbrain.decoders.scorer.CoverageScorer
speechbrain.decoders.scorer.LengthScorer
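To make the contract above concrete, here is a minimal, hedged sketch of a custom scorer (the class name, the constant bonus, and its value are made up for illustration; a real scorer usually keeps per-beam state in memory and reorders it in permute_mem):

import torch
from speechbrain.decoders.scorer import BaseScorerInterface

class ConstantBonusScorer(BaseScorerInterface):
    """Toy full scorer that adds a constant log-prob bonus to every token."""

    def __init__(self, vocab_size, bonus=0.1):
        self.vocab_size = vocab_size
        self.bonus = bonus

    def score(self, inp_tokens, memory, candidates, attn):
        # Full scorer: return a (batch_size x beam_size, vocab_size) tensor of scores.
        n_beams = inp_tokens.size(0)
        scores = torch.full(
            (n_beams, self.vocab_size), self.bonus, device=inp_tokens.device
        )
        return scores, memory

    def permute_mem(self, memory, index):
        # Stateless scorer: nothing to reorder across beams.
        return memory

    def reset_mem(self, x, enc_lens):
        # Stateless scorer: no memory to initialize.
        return None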
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the information of the current timestep.
A score is a tensor of shape (batch_size x beam_size, vocab_size). It is the log probability of the next token given the current timestep input and the previous scorer states.
It can be used to score on the pruned top-k candidates to prevent computation overhead, or on the full vocabulary set when candidates is None.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, scorers will score on the full vocabulary set.
attn (torch.Tensor) – The attention weight to be used in CoverageScorer or CTCScorer.
- Returns:
torch.Tensor – (batch_size x beam_size, vocab_size). Scores for the next tokens.
memory (No limit) – The memory variables input for this timestep.
- permute_mem(memory, index)[source]
This method permutes the scorer memory to synchronize the memory index with the current output and perform batched beam search.
- Parameters:
memory (No limit) – The memory variables input for this timestep.
index (torch.Tensor) – (batch_size, beam_size). The index of the previous path.
- reset_mem(x, enc_lens)[source]
This method should implement the resetting of memory variables for the scorer.
- Parameters:
x (torch.Tensor) – The precomputed encoder states to be used when decoding (e.g., the encoded speech representation to be attended).
enc_lens (torch.Tensor) – The speechbrain-style relative length.
- class speechbrain.decoders.scorer.CTCScorer(ctc_fc, blank_index, eos_index, ctc_window_size=0)[source]
Bases:
BaseScorerInterface
A wrapper of CTCPrefixScore based on the BaseScorerInterface.
This scorer provides the CTC label-synchronous scores of the next input tokens. The implementation is based on https://www.merl.com/publications/docs/TR2017-190.pdf.
- See:
speechbrain.decoders.scorer.CTCPrefixScore
- Parameters:
ctc_fc (torch.nn.Module) – An output linear layer for CTC.
blank_index (int) – The index of the blank token.
eos_index (int) – The index of the end-of-sequence (eos) token.
ctc_window_size (int) – Compute the CTC scores over the time frames using windowing based on attention peaks. If 0, no windowing is applied. (default: 0)
Example
>>> import torch >>> from speechbrain.nnet.linear import Linear >>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR >>> from speechbrain.decoders import S2STransformerBeamSearcher, CTCScorer, ScorerBuilder >>> batch_size=8 >>> n_channels=6 >>> input_size=40 >>> d_model=128 >>> tgt_vocab=140 >>> src = torch.rand([batch_size, n_channels, input_size]) >>> tgt = torch.randint(0, tgt_vocab, [batch_size, n_channels]) >>> net = TransformerASR( ... tgt_vocab, input_size, d_model, 8, 1, 1, 1024, activation=torch.nn.GELU ... ) >>> ctc_lin = Linear(input_shape=(1, 40, d_model), n_neurons=tgt_vocab) >>> lin = Linear(input_shape=(1, 40, d_model), n_neurons=tgt_vocab) >>> eos_index = 2 >>> ctc_scorer = CTCScorer( ... ctc_fc=ctc_lin, ... blank_index=0, ... eos_index=eos_index, ... ) >>> scorer = ScorerBuilder( ... full_scorers=[ctc_scorer], ... weights={'ctc': 1.0} ... ) >>> searcher = S2STransformerBeamSearcher( ... modules=[net, lin], ... bos_index=1, ... eos_index=eos_index, ... min_decode_ratio=0.0, ... max_decode_ratio=1.0, ... using_eos_threshold=False, ... beam_size=7, ... temperature=1.15, ... scorer=scorer ... ) >>> enc, dec = net.forward(src, tgt) >>> hyps, _, _, _ = searcher(enc, torch.ones(batch_size))
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the CTC scores computed over the time frames.
- See:
speechbrain.decoders.scorer.CTCPrefixScore
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, scorers will score on the full vocabulary set.
attn (torch.Tensor) – The attention weight to be used in CoverageScorer or CTCScorer.
- Returns:
scores (torch.Tensor)
memory
- permute_mem(memory, index)[source]
This method permutes the scorer memory to synchronize the memory index with the current output and perform batched CTC beam search.
- Parameters:
memory (No limit) – The memory variables input for this timestep.
index (torch.Tensor) – (batch_size, beam_size). The index of the previous path.
- Returns:
r, psi
- Return type:
See ctc_score.permute_mem
- reset_mem(x, enc_lens)[source]
This method implements the resetting of memory variables for the CTC scorer.
- Parameters:
x (torch.Tensor) – The precomputed encoder states to be used when decoding (e.g., the encoded speech representation to be attended).
enc_lens (torch.Tensor) – The speechbrain-style relative length.
- class speechbrain.decoders.scorer.RNNLMScorer(language_model, temperature=1.0)[source]
Bases:
BaseScorerInterface
A wrapper of RNNLM based on BaseScorerInterface.
The RNNLMScorer is used to provide the RNNLM scores of the next input tokens based on the current timestep input and the previous scorer states.
- Parameters:
language_model (torch.nn.Module) – An RNN-based language model.
temperature (float) – Temperature factor applied to softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1. (default: 1.0)
Example
>>> from speechbrain.nnet.linear import Linear >>> from speechbrain.lobes.models.RNNLM import RNNLM >>> from speechbrain.nnet.RNN import AttentionalRNNDecoder >>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, ScorerBuilder >>> input_size=17 >>> vocab_size=11 >>> emb = torch.nn.Embedding( ... embedding_dim=input_size, ... num_embeddings=vocab_size, ... ) >>> d_model=7 >>> dec = AttentionalRNNDecoder( ... rnn_type="gru", ... attn_type="content", ... hidden_size=3, ... attn_dim=3, ... num_layers=1, ... enc_dim=d_model, ... input_size=input_size, ... ) >>> n_channels=3 >>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size) >>> lm_weight = 0.4 >>> lm_model = RNNLM( ... embedding_dim=d_model, ... output_neurons=vocab_size, ... dropout=0.0, ... rnn_neurons=128, ... dnn_neurons=64, ... return_hidden=True, ... ) >>> rnnlm_scorer = RNNLMScorer( ... language_model=lm_model, ... temperature=1.25, ... ) >>> scorer = ScorerBuilder( ... full_scorers=[rnnlm_scorer], ... weights={'rnnlm': lm_weight} ... ) >>> beam_size=5 >>> searcher = S2SRNNBeamSearcher( ... embedding=emb, ... decoder=dec, ... linear=seq_lin, ... bos_index=1, ... eos_index=2, ... min_decode_ratio=0.0, ... max_decode_ratio=1.0, ... topk=2, ... using_eos_threshold=False, ... beam_size=beam_size, ... temperature=1.25, ... scorer=scorer ... ) >>> batch_size=2 >>> enc = torch.rand([batch_size, n_channels, d_model]) >>> wav_len = torch.ones([batch_size]) >>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the RNNLM scores computed over the previous tokens.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, scorers will score on the full vocabulary set.
attn (torch.Tensor) – The attention weight to be used in CoverageScorer or CTCScorer.
- Returns:
log_probs (torch.Tensor) – Output probabilities.
hs (torch.Tensor) – LM hidden states.
- permute_mem(memory, index)[source]
This method permutes the scorer memory to synchronize the memory index with the current output and perform batched beam search.
- Parameters:
memory (No limit) – The memory variables input for this timestep.
index (torch.Tensor) – (batch_size, beam_size). The index of the previous path.
- Return type:
memory
- reset_mem(x, enc_lens)[source]
This method implements the resetting of memory variables for the RNNLM scorer.
- Parameters:
x (torch.Tensor) – The precomputed encoder states to be used when decoding (e.g., the encoded speech representation to be attended).
enc_lens (torch.Tensor) – The speechbrain-style relative length.
- class speechbrain.decoders.scorer.TransformerLMScorer(language_model, temperature=1.0)[source]
Bases:
BaseScorerInterface
A wrapper of TransformerLM based on BaseScorerInterface.
The TransformerLMScorer is used to provide the TransformerLM scores of the next input tokens based on the current timestep input and the previous scorer states.
- Parameters:
language_model (torch.nn.Module) – A Transformer-based language model.
temperature (float) – Temperature factor applied to softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1. (default: 1.0)
Example
>>> from speechbrain.nnet.linear import Linear >>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR >>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM >>> from speechbrain.decoders import S2STransformerBeamSearcher, TransformerLMScorer, CTCScorer, ScorerBuilder >>> input_size=17 >>> vocab_size=11 >>> d_model=128 >>> net = TransformerASR( ... tgt_vocab=vocab_size, ... input_size=input_size, ... d_model=d_model, ... nhead=8, ... num_encoder_layers=1, ... num_decoder_layers=1, ... d_ffn=256, ... activation=torch.nn.GELU ... ) >>> lm_model = TransformerLM( ... vocab=vocab_size, ... d_model=d_model, ... nhead=8, ... num_encoder_layers=1, ... num_decoder_layers=0, ... d_ffn=256, ... activation=torch.nn.GELU, ... ) >>> n_channels=6 >>> ctc_lin = Linear(input_size=d_model, n_neurons=vocab_size) >>> seq_lin = Linear(input_size=d_model, n_neurons=vocab_size) >>> eos_index = 2 >>> ctc_scorer = CTCScorer( ... ctc_fc=ctc_lin, ... blank_index=0, ... eos_index=eos_index, ... ) >>> transformerlm_scorer = TransformerLMScorer( ... language_model=lm_model, ... temperature=1.15, ... ) >>> ctc_weight_decode=0.4 >>> lm_weight=0.6 >>> scorer = ScorerBuilder( ... full_scorers=[transformerlm_scorer, ctc_scorer], ... weights={'transformerlm': lm_weight, 'ctc': ctc_weight_decode} ... ) >>> beam_size=5 >>> searcher = S2STransformerBeamSearcher( ... modules=[net, seq_lin], ... bos_index=1, ... eos_index=eos_index, ... min_decode_ratio=0.0, ... max_decode_ratio=1.0, ... using_eos_threshold=False, ... beam_size=beam_size, ... temperature=1.15, ... scorer=scorer ... ) >>> batch_size=2 >>> wav_len = torch.ones([batch_size]) >>> src = torch.rand([batch_size, n_channels, input_size]) >>> tgt = torch.randint(0, vocab_size, [batch_size, n_channels]) >>> enc, dec = net.forward(src, tgt) >>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the TransformerLM scores computed over the previous tokens.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, scorers will score on the full vocabulary set.
attn (torch.Tensor) – The attention weight to be used in CoverageScorer or CTCScorer.
- Returns:
log_probs (torch.Tensor)
memory
- permute_mem(memory, index)[source]
This method permutes the scorer memory to synchronize the memory index with the current output and perform batched beam search.
- Parameters:
memory (No limit) – The memory variables input for this timestep.
index (torch.Tensor) – (batch_size, beam_size). The index of the previous path.
- Return type:
memory
- reset_mem(x, enc_lens)[source]
This method implements the resetting of memory variables for the TransformerLM scorer.
- Parameters:
x (torch.Tensor) – The precomputed encoder states to be used when decoding (e.g., the encoded speech representation to be attended).
enc_lens (torch.Tensor) – The speechbrain-style relative length.
- class speechbrain.decoders.scorer.KenLMScorer(lm_path, vocab_size, token_list)[source]
Bases:
BaseScorerInterface
KenLM N-gram scorer.
This scorer is based on KenLM, which is a fast and efficient N-gram language model toolkit. It is used to provide the n-gram scores of the next input tokens.
This scorer is dependent on the KenLM package. It can be installed with the following command:
> pip install https://github.com/kpu/kenlm/archive/master.zip
Note: The KenLM scorer is computationally expensive. It is recommended to use it as a partial scorer to score on the top-k candidates instead of the full vocabulary set.
- Parameters:
lm_path (str) – The path to the KenLM language model (.arpa or .bin).
vocab_size (int) – The total number of tokens.
token_list (list) – The list of tokens in the vocabulary.
Example
# >>> from speechbrain.nnet.linear import Linear
# >>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
# >>> from speechbrain.decoders import S2SRNNBeamSearcher, KenLMScorer, ScorerBuilder
# >>> input_size=17
# >>> vocab_size=11
# >>> lm_path="path/to/kenlm_model.arpa" # or .bin
# >>> token_list=["<pad>", "<bos>", "<eos>", "a", "b", "c", "d", "e", "f", "g", "h", "i"]
# >>> emb = torch.nn.Embedding(
# ...     embedding_dim=input_size,
# ...     num_embeddings=vocab_size,
# ... )
# >>> d_model=7
# >>> dec = AttentionalRNNDecoder(
# ...     rnn_type="gru",
# ...     attn_type="content",
# ...     hidden_size=3,
# ...     attn_dim=3,
# ...     num_layers=1,
# ...     enc_dim=d_model,
# ...     input_size=input_size,
# ... )
# >>> n_channels=3
# >>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
# >>> kenlm_weight = 0.4
# >>> kenlm_model = KenLMScorer(
# ...     lm_path=lm_path,
# ...     vocab_size=vocab_size,
# ...     token_list=token_list,
# ... )
# >>> scorer = ScorerBuilder(
# ...     full_scorers=[kenlm_model],
# ...     weights={"kenlm": kenlm_weight}
# ... )
# >>> beam_size=5
# >>> searcher = S2SRNNBeamSearcher(
# ...     embedding=emb,
# ...     decoder=dec,
# ...     linear=seq_lin,
# ...     bos_index=1,
# ...     eos_index=2,
# ...     min_decode_ratio=0.0,
# ...     max_decode_ratio=1.0,
# ...     topk=2,
# ...     using_eos_threshold=False,
# ...     beam_size=beam_size,
# ...     temperature=1.25,
# ...     scorer=scorer
# ... )
# >>> batch_size=2
# >>> enc = torch.rand([batch_size, n_channels, d_model])
# >>> wav_len = torch.ones([batch_size])
# >>> hyps, _, _, _ = searcher(enc, wav_len)
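Since the doctest above is commented out (kenlm is an optional dependency), here is a hedged, minimal sketch of wiring the KenLM scorer in as a partial scorer next to a cheap full scorer; the model path and token list are placeholders:

from speechbrain.decoders.scorer import CoverageScorer, KenLMScorer, ScorerBuilder

vocab_size = 11
token_list = ["<pad>", "<bos>", "<eos>", "a", "b", "c", "d", "e", "f", "g", "h"]

kenlm_scorer = KenLMScorer(
    lm_path="path/to/kenlm_model.arpa",  # placeholder path (.arpa or .bin)
    vocab_size=vocab_size,
    token_list=token_list,
)
coverage_scorer = CoverageScorer(vocab_size=vocab_size)

# As a partial scorer, KenLM only scores the int(beam_size * scorer_beam_scale)
# candidates pre-selected by the full scorers, which keeps decoding affordable.
scorer = ScorerBuilder(
    full_scorers=[coverage_scorer],
    partial_scorers=[kenlm_scorer],
    weights={"coverage": 1.0, "kenlm": 0.4},
    scorer_beam_scale=1.5,
)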
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the n-gram scores.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, scorers will score on the full vocabulary set.
attn (torch.Tensor) – The attention weight to be used in CoverageScorer or CTCScorer.
- Returns:
scores (torch.Tensor)
(new_memory, new_scoring_table) (tuple)
- permute_mem(memory, index)[source]
This method permutes the scorer memory to synchronize the memory index with the current output and perform batched beam search.
- Parameters:
memory (No limit) – The memory variables input for this timestep.
index (torch.Tensor) – (batch_size, beam_size). The index of the previous path.
- Returns:
state (torch.Tensor)
scoring_table (torch.Tensor)
- reset_mem(x, enc_lens)[source]
This method implements the resetting of memory variables for the KenLM scorer.
- Parameters:
x (torch.Tensor) – The precomputed encoder states to be used when decoding (e.g., the encoded speech representation to be attended).
enc_lens (torch.Tensor) – The speechbrain-style relative length.
- class speechbrain.decoders.scorer.CoverageScorer(vocab_size, threshold=0.5)[source]
Bases:
BaseScorerInterface
A coverage penalty scorer to prevent looping of hyps, where `coverage` is the cumulative attention probability vector.
Reference: https://arxiv.org/pdf/1612.02695.pdf
- Parameters:
vocab_size (int) – The total number of tokens.
threshold (float) – The penalty increases when the coverage of a frame exceeds this threshold. (default: 0.5)
Example
>>> from speechbrain.nnet.linear import Linear >>> from speechbrain.lobes.models.RNNLM import RNNLM >>> from speechbrain.nnet.RNN import AttentionalRNNDecoder >>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, CoverageScorer, ScorerBuilder >>> input_size=17 >>> vocab_size=11 >>> emb = torch.nn.Embedding( ... num_embeddings=vocab_size, ... embedding_dim=input_size ... ) >>> d_model=7 >>> dec = AttentionalRNNDecoder( ... rnn_type="gru", ... attn_type="content", ... hidden_size=3, ... attn_dim=3, ... num_layers=1, ... enc_dim=d_model, ... input_size=input_size, ... ) >>> n_channels=3 >>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size) >>> lm_weight = 0.4 >>> coverage_penalty = 1.0 >>> lm_model = RNNLM( ... embedding_dim=d_model, ... output_neurons=vocab_size, ... dropout=0.0, ... rnn_neurons=128, ... dnn_neurons=64, ... return_hidden=True, ... ) >>> rnnlm_scorer = RNNLMScorer( ... language_model=lm_model, ... temperature=1.25, ... ) >>> coverage_scorer = CoverageScorer(vocab_size=vocab_size) >>> scorer = ScorerBuilder( ... full_scorers=[rnnlm_scorer, coverage_scorer], ... weights={'rnnlm': lm_weight, 'coverage': coverage_penalty} ... ) >>> beam_size=5 >>> searcher = S2SRNNBeamSearcher( ... embedding=emb, ... decoder=dec, ... linear=seq_lin, ... bos_index=1, ... eos_index=2, ... min_decode_ratio=0.0, ... max_decode_ratio=1.0, ... topk=2, ... using_eos_threshold=False, ... beam_size=beam_size, ... temperature=1.25, ... scorer=scorer ... ) >>> batch_size=2 >>> enc = torch.rand([batch_size, n_channels, d_model]) >>> wav_len = torch.ones([batch_size]) >>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, coverage, candidates, attn)[source]
This method scores the new beams based on the Coverage scorer.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
coverage (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, scorers will score on the full vocabulary set.
attn (torch.Tensor) – The attention weight to be used in CoverageScorer or CTCScorer.
- Returns:
score (torch.Tensor)
coverage
- permute_mem(coverage, index)[source]
This method permutes the scorer memory to synchronize the memory index with the current output and perform batched beam search.
- Parameters:
coverage (No limit) – The memory variables input for this timestep.
index (torch.Tensor) – (batch_size, beam_size). The index of the previous path.
- Return type:
coverage
- reset_mem(x, enc_lens)[source]
This method implements the resetting of memory variables for the Coverage scorer.
- Parameters:
x (torch.Tensor) – The precomputed encoder states to be used when decoding (e.g., the encoded speech representation to be attended).
enc_lens (torch.Tensor) – The speechbrain-style relative length.
- class speechbrain.decoders.scorer.LengthScorer(vocab_size)[source]
Bases:
BaseScorerInterface
A length rewarding scorer.
The LengthScorer is used to provide the length rewarding scores. It is used to prevent the beam search from favoring short hypotheses.
Note: length_normalization is not compatible with this scorer. Make sure to set it to False when using LengthScorer.
- Parameters:
vocab_size (int) – The total number of tokens.
Example
>>> from speechbrain.nnet.linear import Linear >>> from speechbrain.lobes.models.RNNLM import RNNLM >>> from speechbrain.nnet.RNN import AttentionalRNNDecoder >>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, CoverageScorer, ScorerBuilder >>> input_size=17 >>> vocab_size=11 >>> emb = torch.nn.Embedding( ... num_embeddings=vocab_size, ... embedding_dim=input_size ... ) >>> d_model=7 >>> dec = AttentionalRNNDecoder( ... rnn_type="gru", ... attn_type="content", ... hidden_size=3, ... attn_dim=3, ... num_layers=1, ... enc_dim=d_model, ... input_size=input_size, ... ) >>> n_channels=3 >>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size) >>> lm_weight = 0.4 >>> length_weight = 1.0 >>> lm_model = RNNLM( ... embedding_dim=d_model, ... output_neurons=vocab_size, ... dropout=0.0, ... rnn_neurons=128, ... dnn_neurons=64, ... return_hidden=True, ... ) >>> rnnlm_scorer = RNNLMScorer( ... language_model=lm_model, ... temperature=1.25, ... ) >>> length_scorer = LengthScorer(vocab_size=vocab_size) >>> scorer = ScorerBuilder( ... full_scorers=[rnnlm_scorer, length_scorer], ... weights={'rnnlm': lm_weight, 'length': length_weight} ... ) >>> beam_size=5 >>> searcher = S2SRNNBeamSearcher( ... embedding=emb, ... decoder=dec, ... linear=seq_lin, ... bos_index=1, ... eos_index=2, ... min_decode_ratio=0.0, ... max_decode_ratio=1.0, ... topk=2, ... using_eos_threshold=False, ... beam_size=beam_size, ... temperature=1.25, ... length_normalization=False, ... scorer=scorer ... ) >>> batch_size=2 >>> enc = torch.rand([batch_size, n_channels, d_model]) >>> wav_len = torch.ones([batch_size]) >>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the Length scorer.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, scorers will score on the full vocabulary set.
attn (torch.Tensor) – The attention weight to be used in CoverageScorer or CTCScorer.
- Returns:
torch.Tensor – Scores.
None
- class speechbrain.decoders.scorer.ScorerBuilder(weights={}, full_scorers=[], partial_scorers=[], scorer_beam_scale=2)[source]
Bases:
object
Builds scorer instance for beamsearch.
The ScorerBuilder class is responsible for building a scorer instance for beam search. It takes weights for full and partial scorers, as well as instances of full and partial scorer classes. It combines the scorers based on the weights specified and provides methods for scoring tokens, permuting scorer memory, and resetting scorer memory.
This is the class to be used for building scorer instances for beam search.
See speechbrain.decoders.seq2seq.S2SBeamSearcher()
- Parameters:
weights (dict) – Weights of the specified full/partial scorers.
full_scorers (list) – Scorers that score on the full vocabulary set.
partial_scorers (list) – Scorers that score on pruned tokens to prevent computation overhead. Partial scoring is performed after the full scorers.
scorer_beam_scale (float) – The scale that decides the number of pruned tokens for the partial scorers: int(beam_size * scorer_beam_scale).
Example
>>> from speechbrain.nnet.linear import Linear >>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR >>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM >>> from speechbrain.decoders import S2STransformerBeamSearcher, TransformerLMScorer, CoverageScorer, CTCScorer, ScorerBuilder >>> input_size=17 >>> vocab_size=11 >>> d_model=128 >>> net = TransformerASR( ... tgt_vocab=vocab_size, ... input_size=input_size, ... d_model=d_model, ... nhead=8, ... num_encoder_layers=1, ... num_decoder_layers=1, ... d_ffn=256, ... activation=torch.nn.GELU ... ) >>> lm_model = TransformerLM( ... vocab=vocab_size, ... d_model=d_model, ... nhead=8, ... num_encoder_layers=1, ... num_decoder_layers=0, ... d_ffn=256, ... activation=torch.nn.GELU, ... ) >>> n_channels=6 >>> ctc_lin = Linear(input_size=d_model, n_neurons=vocab_size) >>> seq_lin = Linear(input_size=d_model, n_neurons=vocab_size) >>> eos_index = 2 >>> ctc_scorer = CTCScorer( ... ctc_fc=ctc_lin, ... blank_index=0, ... eos_index=eos_index, ... ) >>> transformerlm_scorer = TransformerLMScorer( ... language_model=lm_model, ... temperature=1.15, ... ) >>> coverage_scorer = CoverageScorer(vocab_size=vocab_size) >>> ctc_weight_decode=0.4 >>> lm_weight=0.6 >>> coverage_penalty = 1.0 >>> scorer = ScorerBuilder( ... full_scorers=[transformerlm_scorer, coverage_scorer], ... partial_scorers=[ctc_scorer], ... weights={'transformerlm': lm_weight, 'ctc': ctc_weight_decode, 'coverage': coverage_penalty} ... ) >>> beam_size=5 >>> searcher = S2STransformerBeamSearcher( ... modules=[net, seq_lin], ... bos_index=1, ... eos_index=eos_index, ... min_decode_ratio=0.0, ... max_decode_ratio=1.0, ... using_eos_threshold=False, ... beam_size=beam_size, ... topk=3, ... temperature=1.15, ... scorer=scorer ... ) >>> batch_size=2 >>> wav_len = torch.ones([batch_size]) >>> src = torch.rand([batch_size, n_channels, input_size]) >>> tgt = torch.randint(0, vocab_size, [batch_size, n_channels]) >>> enc, dec = net.forward(src, tgt) >>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, attn, log_probs, beam_size)[source]
This method scores tokens in the vocabulary based on the defined full scorers and partial scorers. Scores will be added to the log probs for beamsearch.
- Parameters:
inp_tokens (torch.Tensor) – See BaseScorerInterface().
memory (dict[str, scorer memory]) – The states of scorers for this timestep.
attn (torch.Tensor) – See BaseScorerInterface().
log_probs (torch.Tensor) – (batch_size x beam_size, vocab_size). The log probs at this timestep.
beam_size (int) – The beam size.
- Returns:
log_probs (torch.Tensor) – (batch_size x beam_size, vocab_size). Log probs updated by scorers.
new_memory (dict[str, scorer memory]) – The updated states of scorers.
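Conceptually, the combination performed by score() can be pictured as in the simplified sketch below (not the exact implementation; the scorer containers and memory handling are reduced to plain dicts for illustration):

import torch

def combine_scores(log_probs, full_scorers, partial_scorers, weights,
                   inp_tokens, memory, attn, beam_size, scorer_beam_scale=2.0):
    """Simplified view of how weighted scorer outputs update the log probs."""
    new_memory = {}
    # Full scorers see the whole vocabulary.
    for name, scorer in full_scorers.items():
        scores, new_memory[name] = scorer.score(inp_tokens, memory.get(name), None, attn)
        log_probs = log_probs + weights[name] * scores
    # Partial scorers only see the top candidates kept after full scoring.
    if partial_scorers:
        k = int(beam_size * scorer_beam_scale)
        _, candidates = log_probs.topk(k, dim=-1)
        for name, scorer in partial_scorers.items():
            scores, new_memory[name] = scorer.score(inp_tokens, memory.get(name), candidates, attn)
            log_probs = log_probs + weights[name] * scores
    return log_probs, new_memory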
- class speechbrain.decoders.scorer.BaseRescorerInterface[source]
Bases:
BaseScorerInterface
A scorer abstraction intended for inheritance by other scoring approaches used in beam search.
In this approach, a neural network is employed to assign scores to potential text transcripts. The beam search decoding process produces a collection of the top K hypotheses. These candidates are subsequently sent to a language model (LM) for ranking. The ranking is carried out by the LM, which assigns a score to each candidate.
The score is computed as follows:
score = beam_search_score + lm_weight * rescorer_score
- See:
speechbrain.decoders.scorer.RNNLMRescorer
speechbrain.decoders.scorer.TransformerLMRescorer
speechbrain.decoders.scorer.HuggingFaceLMRescorer
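As a toy, hedged illustration of the rescoring formula above (all numbers are made up), the final ranking is obtained by blending the beam-search scores with the weighted LM scores:

lm_weight = 0.5
beam_search_scores = [-2.0, -2.1, -2.3]   # scores of the top-K hypotheses from beam search
rescorer_scores = [-17.9, -26.1, -25.1]   # log-probs assigned to the same hypotheses by the LM

final_scores = [
    bs + lm_weight * rs
    for bs, rs in zip(beam_search_scores, rescorer_scores)
]
# Hypotheses are then re-ranked by final_scores (closer to 0 is better).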
- normalize_text(text)[source]
This method should implement the normalization of the text before scoring.
- preprocess_func(hyps)[source]
This method should implement the preprocessing of the hypotheses before scoring.
- class speechbrain.decoders.scorer.RNNLMRescorer(language_model, tokenizer, device='cuda', temperature=1.0, bos_index=0, eos_index=0, pad_index=0)[source]
Bases:
BaseRescorerInterface
A wrapper of RNNLM based on the BaseRescorerInterface.
- Parameters:
language_model (torch.nn.Module) – An RNN-based language model.
tokenizer (SentencePieceProcessor) – A SentencePiece tokenizer.
device (str) – The device to move the scorer to.
temperature (float) – Temperature factor applied to softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1. (default: 1.0)
bos_index (int) – The index of the beginning-of-sequence (bos) token.
eos_index (int) – The index of the end-of-sequence (eos) token.
pad_index (int) – The index of the padding token.
Note
This class is intended to be used with a pretrained RNNLM model. Please see: https://huggingface.co/speechbrain/asr-crdnn-rnnlm-librispeech
By default, this model uses a SentencePiece tokenizer.
Example
>>> import torch >>> from sentencepiece import SentencePieceProcessor >>> from speechbrain.lobes.models.RNNLM import RNNLM >>> from speechbrain.utils.parameter_transfer import Pretrainer >>> source = "speechbrain/asr-crdnn-rnnlm-librispeech" >>> lm_model_path = source + "/lm.ckpt" >>> tokenizer_path = source + "/tokenizer.ckpt" >>> # define your tokenizer and RNNLM from the HF hub >>> tokenizer = SentencePieceProcessor() >>> lm_model = RNNLM( ... output_neurons = 1000, ... embedding_dim = 128, ... activation = torch.nn.LeakyReLU, ... dropout = 0.0, ... rnn_layers = 2, ... rnn_neurons = 2048, ... dnn_blocks = 1, ... dnn_neurons = 512, ... return_hidden = True, ... ) >>> pretrainer = Pretrainer( ... collect_in = getfixture("tmp_path"), ... loadables = { ... "lm" : lm_model, ... "tokenizer" : tokenizer, ... }, ... paths = { ... "lm" : lm_model_path, ... "tokenizer" : tokenizer_path, ... }) >>> _ = pretrainer.collect_files() >>> pretrainer.load_collected() >>> from speechbrain.decoders.scorer import RNNLMRescorer, RescorerBuilder >>> rnnlm_rescorer = RNNLMRescorer( ... language_model = lm_model, ... tokenizer = tokenizer, ... temperature = 1.0, ... bos_index = 0, ... eos_index = 0, ... pad_index = 0, ... ) >>> # Define a rescorer builder >>> rescorer = RescorerBuilder( ... rescorers=[rnnlm_rescorer], ... weights={"rnnlm":1.0} ... ) >>> # topk hyps >>> topk_hyps = [["HELLO", "HE LLO", "H E L L O"]] >>> topk_scores = [[-2, -2, -2]] >>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores) >>> # NOTE: the returned hypotheses are already sorted by score. >>> rescored_hyps [['HELLO', 'H E L L O', 'HE LLO']] >>> # NOTE: as we are returning log-probs, the more it is closer to 0, the better. >>> rescored_scores [[-17.863974571228027, -25.12890625, -26.075977325439453]]
- normalize_text(text)[source]
This method should implement the normalization of the text before scoring.
Defaults to uppercasing the text because the (current) language models are trained on LibriSpeech, which is all uppercase.
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device provided in the constructor.
- Parameters:
device (str) – The device to move the scorer to.
- class speechbrain.decoders.scorer.TransformerLMRescorer(language_model, tokenizer, device='cuda', temperature=1.0, bos_index=0, eos_index=0, pad_index=0)[source]
Bases:
BaseRescorerInterface
A wrapper of TransformerLM based on the BaseRescorerInterface.
- Parameters:
language_model (torch.nn.Module) – A Transformer-based language model.
tokenizer (SentencePieceProcessor) – A SentencePiece tokenizer.
device (str) – The device to move the scorer to.
temperature (float) – Temperature factor applied to softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1. (default: 1.0)
bos_index (int) – The index of the beginning-of-sequence (bos) token.
eos_index (int) – The index of the end-of-sequence (eos) token.
pad_index (int) – The index of the padding token.
Note
This class is intended to be used with a pretrained TransformerLM model. Please see: https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech
By default, this model uses a SentencePiece tokenizer.
Example
>>> import torch >>> from sentencepiece import SentencePieceProcessor >>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM >>> from speechbrain.utils.parameter_transfer import Pretrainer >>> source = "speechbrain/asr-transformer-transformerlm-librispeech" >>> lm_model_path = source + "/lm.ckpt" >>> tokenizer_path = source + "/tokenizer.ckpt" >>> tokenizer = SentencePieceProcessor() >>> lm_model = TransformerLM( ... vocab=5000, ... d_model=768, ... nhead=12, ... num_encoder_layers=12, ... num_decoder_layers=0, ... d_ffn=3072, ... dropout=0.0, ... activation=torch.nn.GELU, ... normalize_before=False, ... ) >>> pretrainer = Pretrainer( ... collect_in = getfixture("tmp_path"), ... loadables={ ... "lm": lm_model, ... "tokenizer": tokenizer, ... }, ... paths={ ... "lm": lm_model_path, ... "tokenizer": tokenizer_path, ... } ... ) >>> _ = pretrainer.collect_files() >>> pretrainer.load_collected() >>> from speechbrain.decoders.scorer import TransformerLMRescorer, RescorerBuilder >>> transformerlm_rescorer = TransformerLMRescorer( ... language_model=lm_model, ... tokenizer=tokenizer, ... temperature=1.0, ... bos_index=1, ... eos_index=2, ... pad_index=0, ... ) >>> rescorer = RescorerBuilder( ... rescorers=[transformerlm_rescorer], ... weights={"transformerlm": 1.0} ... ) >>> topk_hyps = [["HELLO", "HE LLO", "H E L L O"]] >>> topk_scores = [[-2, -2, -2]] >>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores) >>> # NOTE: the returned hypotheses are already sorted by score. >>> rescored_hyps [["HELLO", "HE L L O", "HE LLO"]] >>> # NOTE: as we are returning log-probs, the more it is closer to 0, the better. >>> rescored_scores [[-17.863974571228027, -25.12890625, -26.075977325439453]]
- normalize_text(text)[source]
This method should implement the normalization of the text before scoring.
Defaults to uppercasing the text because the language models are trained on LibriSpeech.
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device provided in the constructor.
This method is dynamically called in the recipes when the stage is equal to TEST.
- Parameters:
device (str) – The device to move the scorer to.
- class speechbrain.decoders.scorer.HuggingFaceLMRescorer(model_name, device='cuda')[source]
Bases:
BaseRescorerInterface
A wrapper of HuggingFace's TransformerLM based on the BaseRescorerInterface.
- Parameters:
model_name (str) – The name (or path) of the HuggingFace pretrained language model to load.
device (str) – The device to move the scorer to.
Example
>>> from speechbrain.decoders.scorer import HuggingFaceLMRescorer, RescorerBuilder >>> source = "gpt2-medium" >>> huggingfacelm_rescorer = HuggingFaceLMRescorer( ... model_name=source, ... ) >>> rescorer = RescorerBuilder( ... rescorers=[huggingfacelm_rescorer], ... weights={"huggingfacelm": 1.0} ... ) >>> topk_hyps = [["Hello everyone.", "Hell o every one.", "Hello every one"]] >>> topk_scores = [[-2, -2, -2]] >>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores) >>> # NOTE: the returned hypotheses are already sorted by score. >>> rescored_hyps [['Hello everyone.', 'Hello every one', 'Hell o every one.']] >>> # NOTE: as we are returning log-probs, the more it is closer to 0, the better. >>> rescored_scores [[-20.03631591796875, -27.615638732910156, -42.662353515625]]
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device provided in the constructor.
This method is dynamically called in the recipes when the stage is equal to TEST.
- Parameters:
device (str) – The device to move the scorer to.
- normalize_text(text)[source]
This method should implement the normalization of the text before scoring.
- class speechbrain.decoders.scorer.RescorerBuilder(weights={}, rescorers=[])[source]
Bases:
object
Builds rescorer instance for beamsearch.
The RescorerBuilder class is responsible for building a rescorer instance for beam search. It takes weights and rescorer instances. It combines the rescorers based on the weights specified and provides methods for rescoring text.
This is the class to be used for building rescorer instances for beam search.
- Parameters: