Optimization¶

Optimizers¶

FirstOrderOptimizer¶

class FirstOrderOptimizer

An abstract base class for first-order gradient-based optimizers.

Any derived class must implement the step() function. Example usage:

SGDOptimizer optimizer(model.parameters(), 1e-1);
auto loss = model(data);
loss.backward();
optimizer.step();


Public Functions

FirstOrderOptimizer(const std::vector<Variable> &parameters, double learningRate)

The FirstOrderOptimizer base class constructor.

Parameters
• parameters: The parameters to optimize, e.g. from model.parameters().

• learningRate: The learning rate.

virtual void step() = 0

Apply a single optimization step to all parameters. Must be implemented by derived classes.
double getLr() const

Get the learning rate.

void setLr(double lr)

Set the learning rate.

virtual void zeroGrad()

Zero the gradients for all the parameters being optimized.

Typically this will be called after every call to step().

virtual std::string prettyString() const = 0

Generates a stringified representation of the optimizer.

Return

a string containing the optimizer label

virtual ~FirstOrderOptimizer()

AdamOptimizer¶

class AdamOptimizer : public fl::FirstOrderOptimizer

An implementation of the Adam optimizer.

For more details see the paper Adam: A Method for Stochastic Optimization.

Public Functions

AdamOptimizer(const std::vector<Variable> &parameters, float learningRate, float beta1 = 0.9, float beta2 = 0.999, float epsilon = 1e-8, float weightDecay = 0)

Parameters
• parameters: The parameters from e.g. model.parameters().

• learningRate: The learning rate.

• beta1: Adam hyperparameter $$\beta_1$$.

• beta2: Adam hyperparameter $$\beta_2$$.

• epsilon: A small value used for numerical stability.

• weightDecay: The amount of L2 weight decay to use for all the parameters.

void step()
std::string prettyString() const

Generates a stringified representation of the optimizer.

Return

a string containing the optimizer label

RMSPropOptimizer¶

class RMSPropOptimizer : public fl::FirstOrderOptimizer

An implementation of the RMSProp optimizer.

For more details see Geoff Hinton’s lecture slides and https://arxiv.org/pdf/1308.0850v5.pdf.

Public Functions

RMSPropOptimizer(const std::vector<Variable> &parameters, float learningRate, float rho = 0.99, float epsilon = 1e-8, float weightDecay = 0, bool use_first = false)

Construct an RMSProp optimizer.

Parameters
• parameters: The parameters from e.g. model.parameters().

• learningRate: The learning rate.

• rho: The weight in the term $$\rho \cdot m + (1 - \rho) \cdot g^2$$.

• epsilon: A small value used for numerical stability.

• weightDecay: The amount of L2 weight decay to use for all the parameters.

• use_first: Whether to use the first moment in the update. When true, additionally keep a running mean of the gradient and subtract it from the running mean of the squared gradients.

void step()
std::string prettyString() const

Generates a stringified representation of the optimizer.

Return

a string containing the optimizer label

SGDOptimizer¶

class SGDOptimizer : public fl::FirstOrderOptimizer

A Stochastic Gradient Descent (SGD) optimizer.

At its most basic this implements the update

$w = w - lr \cdot g$

When momentum is used the update becomes

$\begin{split} v &= \rho \cdot v + g \\ w &= w - lr \cdot v \end{split}$

Reference for SGD and Momentum: http://cs231n.github.io/neural-networks-3/#sgd

Public Functions

SGDOptimizer(const std::vector<Variable> &parameters, float learningRate, float momentum = 0, float weightDecay = 0, bool useNesterov = false)

SGDOptimizer constructor.

Parameters
• parameters: The parameters to optimize, e.g. from model.parameters().

• learningRate: The learning rate.

• momentum: The momentum.

• weightDecay: The amount of L2 weight decay to use for all the parameters.

• useNesterov: Whether or not to use Nesterov-style momentum.

void step()
std::string prettyString() const

Generates a stringified representation of the optimizer.

Return

a string containing the optimizer label