Optimization

Optimizers
FirstOrderOptimizer

class FirstOrderOptimizer

An abstract base class for first-order gradient-based optimizers. Any derived class must implement the step() function.

Example usage:

  SGDOptimizer optimizer(model.parameters(), 1e-1);
  auto loss = model(data);
  loss.backward();
  optimizer.step();
  optimizer.zeroGrad();

Subclassed by fl::AdadeltaOptimizer, fl::AdagradOptimizer, fl::AdamOptimizer, fl::AMSgradOptimizer, fl::NAGOptimizer, fl::NovogradOptimizer, fl::RMSPropOptimizer, fl::SGDOptimizer
Public Functions
FirstOrderOptimizer(const std::vector<Variable>& parameters, double learningRate)

  The FirstOrderOptimizer base class constructor.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
virtual void step() = 0
double getLr() const

  Get the learning rate.
void setLr(double lr)

  Set the learning rate.
virtual void zeroGrad()

  Zero the gradients for all the parameters being optimized. Typically this will be called after every call to step().
virtual std::string prettyString() const = 0

  Generates a stringified representation of the optimizer.

  Returns: a string containing the optimizer label.
virtual ~FirstOrderOptimizer()
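A minimal sketch of a training loop that uses the base-class interface above to decay the learning rate on a fixed schedule. Only the optimizer calls are from this API; model, criterion, input, target, and the decay schedule are hypothetical placeholders.

  fl::SGDOptimizer optimizer(model.parameters(), 0.1);
  std::cout << optimizer.prettyString() << std::endl;

  for (int epoch = 0; epoch < 30; ++epoch) {
    // Halve the learning rate every 10 epochs (an assumed schedule).
    if (epoch > 0 && epoch % 10 == 0) {
      optimizer.setLr(optimizer.getLr() * 0.5);
    }
    auto loss = criterion(model(input), target);
    loss.backward();
    optimizer.step();     // apply the gradient update
    optimizer.zeroGrad(); // clear gradients before the next iteration
  }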
AdamOptimizer

class AdamOptimizer : public fl::FirstOrderOptimizer

An implementation of the Adam optimizer. For more details see the paper Adam: A Method for Stochastic Optimization.
Public Functions
AdamOptimizer(const std::vector<Variable>& parameters, float learningRate, float beta1 = 0.9, float beta2 = 0.999, float epsilon = 1e-8, float weightDecay = 0)

  Construct an Adam optimizer.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
    beta1: Adam hyperparameter \( \beta_1 \).
    beta2: Adam hyperparameter \( \beta_2 \).
    epsilon: A small value used for numerical stability.
    weightDecay: The amount of L2 weight decay to use for all the parameters.
void step()
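A short sketch of constructing and stepping an Adam optimizer with explicit hyperparameters; model, criterion, input, and target are hypothetical placeholders.

  fl::AdamOptimizer optimizer(
      model.parameters(),
      1e-3,   // learningRate
      0.9,    // beta1
      0.999,  // beta2
      1e-8,   // epsilon
      1e-5);  // weightDecay (L2)

  auto loss = criterion(model(input), target);
  loss.backward();
  optimizer.step();
  optimizer.zeroGrad();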
RMSPropOptimizer

class RMSPropOptimizer : public fl::FirstOrderOptimizer

An implementation of the RMSProp optimizer. For more details see Geoff Hinton’s lecture slides and https://arxiv.org/pdf/1308.0850v5.pdf.
Public Functions
RMSPropOptimizer(const std::vector<Variable>& parameters, float learningRate, float rho = 0.99, float epsilon = 1e-8, float weightDecay = 0, bool use_first = false)

  Construct an RMSProp optimizer.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
    rho: The weight in the term \( \rho * m + (1 - \rho) * g^2 \).
    epsilon: A small value used for numerical stability.
    weightDecay: The amount of L2 weight decay to use for all the parameters.
    use_first: Use the first moment in the update. When true, keep a running mean of the gradient and subtract it from the running mean of the squared gradients.
void step()
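A sketch of constructing an RMSProp optimizer that also tracks the first moment (use_first = true); the remaining arguments are the documented defaults, and model is a hypothetical placeholder.

  fl::RMSPropOptimizer optimizer(
      model.parameters(),
      1e-3,  // learningRate
      0.99,  // rho
      1e-8,  // epsilon
      0.0,   // weightDecay
      true); // use_first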
SGDOptimizer

class SGDOptimizer : public fl::FirstOrderOptimizer

A Stochastic Gradient Descent (SGD) optimizer. At its most basic this implements the update

\[ w = w - lr * g \]

When momentum is used the update becomes

\[\begin{split} v &= \rho * v + g \\ w &= w - lr * v \end{split}\]

Reference for SGD and momentum: http://cs231n.github.io/neural-networks-3/#sgd
Public Functions
SGDOptimizer(const std::vector<Variable>& parameters, float learningRate, float momentum = 0, float weightDecay = 0, bool useNesterov = false)

  SGDOptimizer constructor.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
    momentum: The momentum.
    weightDecay: The amount of L2 weight decay to use for all the parameters.
    useNesterov: Whether or not to use Nesterov-style momentum.
void step()
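A sketch of SGD with momentum matching the update rule above, using Nesterov-style momentum and a small amount of L2 weight decay; model is a hypothetical placeholder.

  fl::SGDOptimizer optimizer(
      model.parameters(),
      0.1,    // learningRate
      0.9,    // momentum (rho in the update above)
      1e-4,   // weightDecay
      true);  // useNesterov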