Optimization¶
Optimizers¶
FirstOrderOptimizer¶
class FirstOrderOptimizer¶
An abstract base class for first-order gradient-based optimizers.
Any derived class must implement the step() function. Example usage:
SGDOptimizer optimizer(model.parameters(), 1e-1);
auto loss = model(data);
loss.backward();
optimizer.step();
optimizer.zeroGrad();
Subclassed by fl::AMSgradOptimizer, fl::AdadeltaOptimizer, fl::AdagradOptimizer, fl::AdamOptimizer, fl::NAGOptimizer, fl::NovogradOptimizer, fl::RMSPropOptimizer, fl::SGDOptimizer
Public Functions
FirstOrderOptimizer(const std::vector<Variable> &parameters, double learningRate)¶
The FirstOrderOptimizer base class constructor.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
virtual void step() = 0¶
inline double getLr() const¶
Get the learning rate.
inline void setLr(double lr)¶
Set the learning rate.
virtual void zeroGrad()¶
Zero the gradients for all the parameters being optimized.
Typically this will be called after every call to step().
virtual std::string prettyString() const = 0¶
Generates a stringified representation of the optimizer.
- Returns:
a string containing the optimizer label
virtual ~FirstOrderOptimizer() = default¶
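The sketch below illustrates the subclassing contract: derive from FirstOrderOptimizer and implement step() and prettyString(). It is a minimal sketch rather than library code; it assumes the base class exposes the optimized parameters to derived classes through a protected member (called parameters_ here) and that gradients are read through Variable accessors such as grad() — check the headers for the actual member and accessor names.
// Minimal sketch of a custom optimizer built on the FirstOrderOptimizer
// interface. `parameters_` and the Variable accessors used below are
// assumptions about the implementation; verify against the headers.
class PlainSGD : public fl::FirstOrderOptimizer {
 public:
  PlainSGD(const std::vector<fl::Variable>& parameters, double learningRate)
      : fl::FirstOrderOptimizer(parameters, learningRate) {}

  void step() override {
    for (auto& w : parameters_) {
      if (!w.isGradAvailable()) {
        continue; // parameter received no gradient this iteration
      }
      // w = w - lr * g, applied in place to the parameter's tensor
      w.tensor() -= getLr() * w.grad().tensor();
    }
  }

  std::string prettyString() const override {
    return "PlainSGD";
  }
};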
AdamOptimizer¶
class AdamOptimizer : public fl::FirstOrderOptimizer¶
An implementation of the Adam optimizer.
For more details see the paper Adam: A Method for Stochastic Optimization.
Public Functions
AdamOptimizer(const std::vector<Variable> &parameters, float learningRate, float beta1 = 0.9, float beta2 = 0.999, float epsilon = 1e-8, float weightDecay = 0)¶
Construct an Adam optimizer.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
beta1 – Adam hyperparameter \( \beta_1 \).
beta2 – Adam hyperparameter \( \beta_2 \).
epsilon – A small value used for numerical stability.
weightDecay – The amount of L2 weight decay to use for all the parameters.
virtual void step() override¶
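For reference, the update from the cited paper is sketched below for a parameter w with gradient g at step t; the implementation may differ in details such as where \( \epsilon \) and weight decay are applied.
\[\begin{split} m_t &= \beta_1 * m_{t-1} + (1 - \beta_1) * g_t \\ v_t &= \beta_2 * v_{t-1} + (1 - \beta_2) * g_t^2 \\ \hat{m}_t &= m_t / (1 - \beta_1^t) \\ \hat{v}_t &= v_t / (1 - \beta_2^t) \\ w &= w - lr * \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) \end{split}\]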
RMSPropOptimizer¶
class RMSPropOptimizer : public fl::FirstOrderOptimizer¶
An implementation of the RMSProp optimizer.
For more details see Geoff Hinton’s lecture slides and https://arxiv.org/pdf/1308.0850v5.pdf.
Public Functions
RMSPropOptimizer(const std::vector<Variable> &parameters, float learningRate, float rho = 0.99, float epsilon = 1e-8, float weightDecay = 0, bool use_first = false)¶
Construct an RMSProp optimizer.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
rho – The weight in the term \( \rho * m + (1 - \rho) * g^2 \).
epsilon – A small value used for numerical stability.
weightDecay – The amount of L2 weight decay to use for all the parameters.
use_first – Use the first moment in the update. When true, keep a running mean of the gradient and subtract it from the running mean of the squared gradients.
virtual void step() override¶
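For reference, with use_first = false the update this class describes is, in sketch form (the exact placement of \( \epsilon \) may differ in the implementation):
\[\begin{split} m &= \rho * m + (1 - \rho) * g^2 \\ w &= w - lr * g / (\sqrt{m} + \epsilon) \end{split}\]
With use_first = true, a running mean \( \mu = \rho * \mu + (1 - \rho) * g \) is also kept and the denominator becomes \( \sqrt{m - \mu^2} + \epsilon \), the centered variant from the linked Graves paper.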
SGDOptimizer¶
class SGDOptimizer : public fl::FirstOrderOptimizer¶
A Stochastic Gradient Descent (SGD) optimizer.
At its most basic this implements the update
\[ w = w - lr * g \]
When momentum is used the update becomes
\[\begin{split} v &= \rho * v + g \\ w &= w - lr * v \end{split}\]
Reference for SGD and Momentum: http://cs231n.github.io/neural-networks-3/#sgd
Public Functions
SGDOptimizer(const std::vector<Variable> &parameters, float learningRate, float momentum = 0, float weightDecay = 0, bool useNesterov = false)¶
SGDOptimizer constructor.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
momentum – The momentum.
weightDecay – The amount of L2 weight decay to use for all the parameters.
useNesterov – Whether or not to use Nesterov-style momentum.
virtual void step() override¶
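A minimal usage sketch, mirroring the FirstOrderOptimizer example above; model, dataset, criterion, and the batch fields are placeholders rather than part of this API.
// Sketch: one pass over a dataset with SGD + Nesterov momentum.
// `model`, `dataset`, and `criterion` are placeholders, not library symbols.
fl::SGDOptimizer optimizer(
    model.parameters(),
    0.1,   // learningRate
    0.9,   // momentum
    1e-4,  // weightDecay
    true); // useNesterov
for (auto& batch : dataset) {
  auto loss = criterion(model(batch.input), batch.target);
  loss.backward();
  optimizer.step();
  optimizer.zeroGrad();
}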