speechbrain.nnet.schedulers module

Schedulers for updating hyperparameters (such as learning rate).

Authors
  • Mirco Ravanelli 2020

  • Peter Plantinga 2020

  • Loren Lugosch 2020

Summary

Classes:

CyclicCosineScheduler

This is an implementation of the Cyclic-Cosine learning rate scheduler with warmup.

CyclicLRScheduler

This implements a cyclical learning rate policy (CLR).

LinearScheduler

Scheduler with linear annealing technique.

NewBobScheduler

Scheduler with new-bob technique, used for LR annealing.

NoamScheduler

This is an implementation of the transformer’s learning rate scheduler with warmup.

ReduceLROnPlateau

Learning rate scheduler which decreases the learning rate if the loss function of interest gets stuck on a plateau, or starts to increase.

StepScheduler

Learning rate scheduler with step annealing technique.

Functions:

update_learning_rate

Change the learning rate value within an optimizer.

Reference

speechbrain.nnet.schedulers.update_learning_rate(optimizer, new_lr, param_group=None)

Change the learning rate value within an optimizer.

Parameters
  • optimizer (torch.optim object) – Updates the learning rate for this optimizer.

  • new_lr (float) – The new value to use for the learning rate.

  • param_group (list of int) – The param group indices to update. If not provided, all groups are updated.

Example

>>> from torch.optim import SGD
>>> from speechbrain.nnet.linear import Linear
>>> model = Linear(n_neurons=10, input_size=10)
>>> optimizer = SGD(model.parameters(), lr=0.1)
>>> update_learning_rate(optimizer, 0.2)
>>> optimizer.param_groups[0]["lr"]
0.2
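The update boils down to writing new_lr into the "lr" entry of each selected parameter group. The sketch below shows this on plain dicts standing in for optimizer.param_groups (torch.optim optimizers keep per-group settings as dicts in that list); set_group_lrs is a hypothetical helper for illustration, not the SpeechBrain implementation.

```python
from typing import Dict, List, Optional


def set_group_lrs(
    param_groups: List[Dict[str, float]],
    new_lr: float,
    param_group: Optional[List[int]] = None,
) -> None:
    """Hypothetical helper: write new_lr into the "lr" key of selected groups."""
    # With no indices given, every parameter group is updated.
    indices = range(len(param_groups)) if param_group is None else param_group
    for i in indices:
        param_groups[i]["lr"] = new_lr


# Plain dicts standing in for optimizer.param_groups.
groups = [{"lr": 0.1}, {"lr": 0.01}]
set_group_lrs(groups, 0.2, param_group=[1])  # only group 1 changes
print(groups)  # [{'lr': 0.1}, {'lr': 0.2}]
set_group_lrs(groups, 0.05)  # all groups change
print(groups)  # [{'lr': 0.05}, {'lr': 0.05}]
```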
class speechbrain.nnet.schedulers.NewBobScheduler(initial_value, annealing_factor=0.5, improvement_threshold=0.0025, patient=0)

Bases: object

Scheduler with new-bob technique, used for LR annealing.

The learning rate is annealed based on the validation performance. In particular, if (past_loss - current_loss) / past_loss < improvement_threshold, then lr = lr * annealing_factor.

Parameters
  • initial_value (float) – The initial hyperparameter value.

  • annealing_factor (float) – The factor by which the value is multiplied when it is annealed in the new-bob strategy.

  • improvement_threshold (float) – The minimum relative improvement between consecutive losses; a smaller improvement triggers annealing in the new-bob strategy.

  • patient (int) – The number of times the annealing condition is tolerated before the learning rate is actually reduced.

Example

>>> scheduler = NewBobScheduler(initial_value=1.0)
>>> scheduler(metric_value=10.0)
(1.0, 1.0)
>>> scheduler(metric_value=2.0)
(1.0, 1.0)
>>> scheduler(metric_value=2.5)
(1.0, 0.5)
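To make the annealing rule concrete, here is a plain-Python sketch that reproduces the example above. NewBobSketch is hypothetical, and the handling of patient (a counter of tolerated triggers that is reset after each reduction) is an assumption about how the description above is realised.

```python
class NewBobSketch:
    """Hypothetical re-implementation of the new-bob annealing rule."""

    def __init__(self, initial_value, annealing_factor=0.5,
                 improvement_threshold=0.0025, patient=0):
        self.value = initial_value
        self.annealing_factor = annealing_factor
        self.improvement_threshold = improvement_threshold
        self.patient = patient
        self.current_patient = patient  # assumed: triggers still tolerated
        self.prev_metric = None

    def __call__(self, metric_value):
        old_value = new_value = self.value
        if self.prev_metric is not None:
            # Relative improvement over the previous metric (guard zero division).
            if self.prev_metric == 0:
                improvement = 0.0
            else:
                improvement = (self.prev_metric - metric_value) / self.prev_metric
            if improvement < self.improvement_threshold:
                if self.current_patient == 0:
                    new_value *= self.annealing_factor
                    self.current_patient = self.patient  # assumed reset
                else:
                    self.current_patient -= 1
        self.prev_metric = metric_value
        self.value = new_value
        return old_value, new_value


scheduler = NewBobSketch(initial_value=1.0)
print([scheduler(m) for m in (10.0, 2.0, 2.5)])
# [(1.0, 1.0), (1.0, 1.0), (1.0, 0.5)]
```

With patient=1, the first loss increase is tolerated and the value is halved only on the second one.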
__call__(metric_value)

Returns the current and new value for the hyperparameter.

Parameters

metric_value (float) – The metric (e.g. the validation loss) used to determine whether to change the hyperparameter value.

save(path)
load(path, end_of_epoch=False, device=None)
class speechbrain.nnet.schedulers.LinearScheduler(initial_value, final_value, epoch_count)

Bases: object

Scheduler with linear annealing technique.

The learning rate linearly decays over the specified number of epochs.

Parameters
  • initial_value (float) – The value upon initialization.

  • final_value (float) – The value used when the epoch count reaches epoch_count - 1.

  • epoch_count (int) – Number of epochs.

Example

>>> scheduler = LinearScheduler(1.0, 0.0, 4)
>>> scheduler(current_epoch=1)
(1.0, 0.666...)
>>> scheduler(current_epoch=2)
(0.666..., 0.333...)
>>> scheduler(current_epoch=3)
(0.333..., 0.0)
>>> scheduler(current_epoch=4)
(0.0, 0.0)
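A minimal sketch of the linear schedule, assuming the values are precomputed on an evenly spaced grid of epoch_count points and that indices past the end are clamped to the final value; this reproduces the example output above. linear_values and linear_call are hypothetical helpers.

```python
def linear_values(initial_value, final_value, epoch_count):
    """Evenly spaced values from initial_value to final_value, both included."""
    if epoch_count == 1:
        return [float(initial_value)]
    step = (final_value - initial_value) / (epoch_count - 1)
    return [initial_value + i * step for i in range(epoch_count)]


def linear_call(values, current_epoch):
    """Return (current, next) values, clamping indices to the table."""
    old_index = max(0, current_epoch - 1)
    index = min(current_epoch, len(values) - 1)
    return values[old_index], values[index]


values = linear_values(1.0, 0.0, 4)
for epoch in range(1, 5):
    current, nxt = linear_call(values, epoch)
    print(epoch, round(current, 3), round(nxt, 3))
# 1 1.0 0.667
# 2 0.667 0.333
# 3 0.333 0.0
# 4 0.0 0.0
```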
__call__(current_epoch)

Returns the current and new value for the hyperparameter.

Parameters

current_epoch (int) – Number of times the dataset has been iterated.

class speechbrain.nnet.schedulers.StepScheduler(initial_value, decay_factor=0.5, decay_drop=2)

Bases: object

Learning rate scheduler with step annealing technique.

The hyperparameter’s value decays over the epochs by the selected decay_factor:

value = initial_value * decay_factor ^ floor((1 + epoch) / decay_drop)

Parameters
  • initial_value (float) – Initial value for the hyperparameter being updated.

  • decay_factor (float) – Factor by which the value is multiplied at each decay step.

  • decay_drop (float) – Number of epochs between decay steps (the decay of the hyperparameter value is slower with higher decay_drop values).

Example

>>> scheduler = StepScheduler(initial_value=1.0)
>>> scheduler(current_epoch=1)
(1.0, 0.5)
>>> scheduler(current_epoch=2)
(0.5, 0.5)
>>> scheduler(current_epoch=3)
(0.5, 0.25)
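The formula above can be checked with a few lines of plain Python. step_value and step_call are hypothetical helpers; the assumption that the returned "current" value is the formula evaluated at current_epoch - 1 is what reproduces the example output. The last two prints show the effect of decay_drop: a larger value spaces the decay steps further apart, so the value decays more slowly.

```python
import math


def step_value(initial_value, epoch, decay_factor=0.5, decay_drop=2):
    """value = initial_value * decay_factor ** floor((1 + epoch) / decay_drop)"""
    return initial_value * decay_factor ** math.floor((1 + epoch) / decay_drop)


def step_call(initial_value, current_epoch, decay_factor=0.5, decay_drop=2):
    """Return (current, next): the formula at current_epoch - 1 and current_epoch."""
    current = step_value(initial_value, current_epoch - 1, decay_factor, decay_drop)
    nxt = step_value(initial_value, current_epoch, decay_factor, decay_drop)
    return current, nxt


print([step_call(1.0, e) for e in (1, 2, 3)])
# [(1.0, 0.5), (0.5, 0.5), (0.5, 0.25)]

# A larger decay_drop spaces the decay steps further apart.
print([step_value(1.0, e, decay_drop=2) for e in range(6)])
# [1.0, 0.5, 0.5, 0.25, 0.25, 0.125]
print([step_value(1.0, e, decay_drop=4) for e in range(6)])
# [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]
```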
__call__(current_epoch)

Returns the current and new hyperparameter values.

Parameters

current_epoch (int) – Number of times the dataset has been iterated.

class speechbrain.nnet.schedulers.NoamScheduler(lr_initial, n_warmup_steps, model_size=None)

Bases: object

This is an implementation of the transformer’s learning rate scheduler with warmup. Reference: https://arxiv.org/abs/1706.03762

Note: this scheduler anneals the lr at each update of the model’s weights, and n_steps must be saved for restarting.

Parameters
  • lr_initial (float) – Initial learning rate (i.e. the lr used at epoch 0).

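  • n_warmup_steps (int) – Number of warm-up steps.

  • model_size (int, optional) – Model dimension. If given, the learning rate is scaled by model_size ** -0.5; otherwise it is scaled by n_warmup_steps ** 0.5, so that the peak learning rate equals lr_initial.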

Example

>>> import torch
>>> from speechbrain.nnet.linear import Linear
>>> inp_tensor = torch.rand([1, 660, 3])
>>> model = Linear(input_size=3, n_neurons=4)
>>> optim = torch.optim.Adam(model.parameters(), lr=1)
>>> output = model(inp_tensor)
>>> scheduler = NoamScheduler(optim.param_groups[0]["lr"], 3)
>>> curr_lr, next_lr = scheduler(optim)
>>> optim.param_groups[0]["lr"]
0.33333333333333337
>>> curr_lr, next_lr = scheduler(optim)
>>> optim.param_groups[0]["lr"]
0.6666666666666667
>>> curr_lr, next_lr = scheduler(optim)
>>> optim.param_groups[0]["lr"]
1.0
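The values in the example above can be reproduced without PyTorch. The sketch assumes the Noam rule from the Transformer paper, lr = lr_initial * normalize * min(n_steps ** -0.5, n_steps * n_warmup_steps ** -1.5), where n_steps is 1 on the first call, normalize is n_warmup_steps ** 0.5 when model_size is not given (so the peak at n_steps == n_warmup_steps equals lr_initial), and model_size ** -0.5 otherwise. noam_lr is a hypothetical helper and the normalisation is an assumption that happens to match the printed values.

```python
def noam_lr(lr_initial, n_steps, n_warmup_steps, model_size=None):
    """Noam schedule: linear warm-up, then inverse square-root decay."""
    # Assumed normalisation: sqrt(n_warmup_steps) by default, which makes the
    # peak (reached at n_steps == n_warmup_steps) equal to lr_initial.
    if model_size is None:
        normalize = n_warmup_steps ** 0.5
    else:
        normalize = model_size ** -0.5
    return lr_initial * normalize * min(n_steps ** -0.5,
                                        n_steps * n_warmup_steps ** -1.5)


print([round(noam_lr(1.0, step, 3), 3) for step in range(1, 6)])
# [0.333, 0.667, 1.0, 0.866, 0.775]
```

The first n_warmup_steps updates rise linearly; afterwards the rate decays as 1 / sqrt(n_steps).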
__call__(opt)
Parameters

opt (optimizer) – The optimizer to update using this scheduler.

Returns

  • current_lr (float) – The learning rate before the update.

  • lr (float) – The learning rate after the update.

save(path)
load(path, end_of_epoch=False, device=None)
class speechbrain.nnet.schedulers.CyclicCosineScheduler(n_warmup_steps, lr_initial=None, total_steps=100000)

Bases: object

This is an implementation of the Cyclic-Cosine learning rate scheduler with warmup.

Reference: https://openreview.net/pdf?id=BJYwwY9ll

Note: this scheduler anneals the lr at each update of the model’s weights, and n_steps must be saved for restarting.

Parameters
  • lr_initial (float) – Initial learning rate (i.e. the lr used at epoch 0).

  • n_warmup_steps (int) – Number of warm up steps.

  • total_steps (int) – Total number of updating steps.

Example

>>> import torch
>>> from speechbrain.nnet.linear import Linear
>>> inp_tensor = torch.rand([1, 660, 3])
>>> model = Linear(input_size=3, n_neurons=4)
>>> optim = torch.optim.Adam(model.parameters(), lr=1)
>>> output = model(inp_tensor)
>>> scheduler = CyclicCosineScheduler(3, optim.param_groups[0]["lr"])
>>> curr_lr, next_lr = scheduler(optim)
>>> optim.param_groups[0]["lr"]
0.9999999990130395
>>> curr_lr, next_lr = scheduler(optim)
>>> optim.param_groups[0]["lr"]
0.9999999997532598
>>> curr_lr, next_lr = scheduler(optim)
>>> optim.param_groups[0]["lr"]
1.0
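A stand-alone sketch of the cosine factor, assuming the learning rate is lr_initial * 0.5 * (1 + cos(pi * (n_steps - n_warmup_steps) / total_steps)) with n_steps = 1 on the first call; under that assumption the three values in the example above are reproduced. cyclic_cosine_lr is a hypothetical helper, not the library's method.

```python
import math


def cyclic_cosine_lr(lr_initial, n_steps, n_warmup_steps, total_steps=100000):
    """Assumed rule: lr_initial times a cosine factor that lies in [0, 1]."""
    # The factor is 1 when n_steps == n_warmup_steps and 0 total_steps later.
    angle = math.pi * (n_steps - n_warmup_steps) / total_steps
    return lr_initial * 0.5 * (math.cos(angle) + 1)


for step in (1, 2, 3):
    # Approximately 0.999999999013, 0.999999999753 and exactly 1.0.
    print(step, cyclic_cosine_lr(1.0, step, n_warmup_steps=3))
print(cyclic_cosine_lr(1.0, 3 + 100000, n_warmup_steps=3))  # 0.0
```

Because cosine is periodic, the factor climbs back to 1 after another total_steps updates, which gives the schedule its cyclic shape.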
__call__(opt)
Parameters

opt (optimizer) – The optimizer to update using this scheduler.

Returns

  • current_lr (float) – The learning rate before the update.

  • lr (float) – The learning rate after the update.

save(path)
load(path, end_of_epoch=False, device=None)
class speechbrain.nnet.schedulers.ReduceLROnPlateau(lr_min=1e-08, factor=0.5, patience=2, dont_halve_until_epoch=65)

Bases: object

Learning rate scheduler which decreases the learning rate if the loss function of interest gets stuck on a plateau, or starts to increase. The difference from NewBobScheduler is that this one keeps a memory of the last step at which an improvement was observed and compares against that particular loss value, as opposed to the most recent loss.

Parameters
  • lr_min (float) – The minimum allowable learning rate.

  • factor (float) – Factor with which to reduce the learning rate.

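  • patience (int) – How many epochs without improvement to wait before reducing the learning rate.

  • dont_halve_until_epoch (int) – The learning rate is never reduced up to and including this epoch.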

Example

>>> import torch
>>> from torch.optim import Adam
>>> from speechbrain.nnet.linear import Linear
>>> inp_tensor = torch.rand([1, 660, 3])
>>> model = Linear(n_neurons=10, input_size=3)
>>> optim = Adam(lr=1.0, params=model.parameters())
>>> output = model(inp_tensor)
>>> scheduler = ReduceLROnPlateau(lr_min=0.25, factor=0.5, patience=2, dont_halve_until_epoch=1)
>>> curr_lr, next_lr = scheduler([optim], current_epoch=1, current_loss=10.0)
>>> curr_lr, next_lr = scheduler([optim], current_epoch=2, current_loss=11.0)
>>> curr_lr, next_lr = scheduler([optim], current_epoch=3, current_loss=13.0)
>>> curr_lr, next_lr = scheduler([optim], current_epoch=4, current_loss=14.0)
>>> next_lr
0.5
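A stand-alone sketch of the comparison rule, assuming (a) the reference loss (the "anchor") is the loss at the most recent improvement, (b) patience counts epochs without improvement since the last change, (c) epochs up to dont_halve_until_epoch never reduce the rate, and (d) lr_min acts as a floor. PlateauSketch is hypothetical: unlike the documented __call__, which takes a list of optimizers, it works on plain floats and the caller feeds the returned rate back in. The first four epochs mirror the example above.

```python
class PlateauSketch:
    """Hypothetical re-implementation of the anchor-based plateau rule."""

    def __init__(self, lr_min=1e-08, factor=0.5, patience=2,
                 dont_halve_until_epoch=65):
        self.lr_min = lr_min
        self.factor = factor
        self.patience = patience
        self.dont_halve_until_epoch = dont_halve_until_epoch
        self.anchor = float("inf")  # loss at the last observed improvement
        self.patience_counter = 0

    def step(self, current_lr, current_epoch, current_loss):
        if current_epoch <= self.dont_halve_until_epoch:
            next_lr = current_lr
            self.anchor = current_loss  # early epochs only track the loss
        elif current_loss <= self.anchor:
            next_lr = current_lr
            self.anchor = current_loss  # improvement: move the anchor
            self.patience_counter = 0
        elif self.patience_counter < self.patience:
            next_lr = current_lr
            self.patience_counter += 1  # no improvement yet, keep waiting
        else:
            next_lr = current_lr * self.factor
            self.patience_counter = 0
        return current_lr, max(next_lr, self.lr_min)  # enforce the floor


sched = PlateauSketch(lr_min=0.25, factor=0.5, patience=2, dont_halve_until_epoch=1)
lr = 1.0
history = []
losses = [10.0, 11.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
for epoch, loss in enumerate(losses, start=1):
    _, lr = sched.step(lr, epoch, loss)
    history.append(lr)
print(history)
# [1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]
```

The last reduction would give 0.125, but lr_min keeps the rate at 0.25.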
__call__(optim_list, current_epoch, current_loss)
Parameters
  • optim_list (list of optimizers) – The optimizers to update using this scheduler.

  • current_epoch (int) – Number of times the dataset has been iterated.

  • current_loss (float) – The loss value used to determine whether to change the learning rate.

Returns

  • current_lr (float) – The learning rate before the update.

  • next_lr (float) – The learning rate after the update.

save(path)
load(path, end_of_epoch=False, device=None)
class speechbrain.nnet.schedulers.CyclicLRScheduler(base_lr=0.001, max_lr=0.006, step_size=2000.0, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle')

Bases: object

This implements a cyclical learning rate policy (CLR). The method cycles the learning rate between two boundaries with some constant frequency, as detailed in this paper (https://arxiv.org/abs/1506.01186). The amplitude of the cycle can be scaled on a per-iteration or per-cycle basis.

This class has three built-in policies, as put forth in the paper:

  • “triangular”: A basic triangular cycle with no amplitude scaling.

  • “triangular2”: A basic triangular cycle that halves the initial amplitude each cycle.

  • “exp_range”: A cycle that scales the initial amplitude by gamma**(cycle iterations) at each cycle iteration.

For more detail, please see the reference paper.

Parameters
  • base_lr (float) – Initial learning rate, which is the lower boundary of the cycle.

  • max_lr (float) – Upper boundary of the cycle. Functionally, it defines the cycle amplitude (max_lr - base_lr). The lr at any point in the cycle is the sum of base_lr and some scaling of the amplitude; therefore max_lr may not actually be reached, depending on the scaling function.

  • step_size (int) – Number of training iterations per half cycle. The authors suggest setting step_size to 2-8 times the number of training iterations in an epoch.

  • mode (str) – One of {triangular, triangular2, exp_range}. Default: ‘triangular’. Values correspond to the policies detailed above. If scale_fn is not None, this argument is ignored.

  • gamma (float) – Constant in the ‘exp_range’ scaling function: gamma**(cycle iterations).

  • scale_fn (lambda function) – Custom scaling policy defined by a single-argument lambda function, where 0 <= scale_fn(x) <= 1 for all x >= 0. If set, the mode parameter is ignored.

  • scale_mode (str) – One of {‘cycle’, ‘iterations’}. Defines whether scale_fn is evaluated on the cycle number or on cycle iterations (training iterations since the start of the cycle). Default is ‘cycle’.

Example

>>> import torch
>>> from speechbrain.nnet.linear import Linear
>>> inp_tensor = torch.rand([1, 660, 3])
>>> model = Linear(input_size=3, n_neurons=4)
>>> optim = torch.optim.Adam(model.parameters(), lr=1)
>>> output = model(inp_tensor)
>>> scheduler = CyclicLRScheduler(base_lr=0.1, max_lr=0.3, step_size=2)
>>> scheduler.on_batch_end(optim)
>>> optim.param_groups[0]["lr"]
0.2
>>> scheduler.on_batch_end(optim)
>>> optim.param_groups[0]["lr"]
0.3
>>> scheduler.on_batch_end(optim)
>>> optim.param_groups[0]["lr"]
0.2
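The triangular policies can be sketched directly from the formula in the CLR paper. This assumes the iteration counter is 1 after the first on_batch_end call, which reproduces the 0.2, 0.3, 0.2 sequence in the example above; triangular_clr is a hypothetical helper, and the exp_range policy and custom scale_fn are deliberately left out.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size, mode="triangular"):
    """Learning rate after `iteration` batches for the triangular CLR policies."""
    cycle = math.floor(1 + iteration / (2 * step_size))  # 1-based cycle index
    x = abs(iteration / step_size - 2 * cycle + 1)  # 0 at the peak, 1 at the ends
    if mode == "triangular":
        scale = 1.0
    elif mode == "triangular2":
        scale = 1 / 2.0 ** (cycle - 1)  # halve the amplitude every cycle
    else:
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x) * scale


print([round(triangular_clr(i, 0.1, 0.3, 2), 10) for i in range(1, 4)])
# [0.2, 0.3, 0.2]
print([round(triangular_clr(i, 0.1, 0.3, 2, "triangular2"), 10) for i in (2, 6)])
# [0.3, 0.2]
```

With triangular2, the second peak (iteration 6) only reaches base_lr plus half the original amplitude.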
clr(clr_iterations)
on_batch_end(opt)
Parameters

opt (optimizer) – The optimizer to update using this scheduler.

save(path)
load(path, end_of_epoch=False, device=None)