Optimization

Optimizers
FirstOrderOptimizer

class FirstOrderOptimizer

An abstract base class for first-order gradient-based optimizers. Any derived class must implement the step() function.

Example usage:

  SGDOptimizer optimizer(model.parameters(), 1e-1);
  auto loss = model(data);
  loss.backward();
  optimizer.step();
  optimizer.zeroGrad();

Subclassed by fl::AdadeltaOptimizer, fl::AdagradOptimizer, fl::AdamOptimizer, fl::AMSgradOptimizer, fl::NAGOptimizer, fl::NovogradOptimizer, fl::RMSPropOptimizer, fl::SGDOptimizer
Public Functions
FirstOrderOptimizer(const std::vector<Variable>& parameters, double learningRate)

  The FirstOrderOptimizer base class constructor.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
virtual void step() = 0
double getLr() const

  Get the learning rate.
void setLr(double lr)

  Set the learning rate.
virtual void zeroGrad()

  Zero the gradients for all the parameters being optimized. Typically this will be called after every call to step().
virtual std::string prettyString() const = 0

  Generates a stringified representation of the optimizer.

  Returns: a string containing the optimizer label.
virtual ~FirstOrderOptimizer()
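A minimal sketch of a training loop that uses the base-class interface above to decay the learning rate on a fixed schedule. Only the optimizer calls are from this API; model, criterion, input, target, and the decay schedule are hypothetical placeholders.

  fl::SGDOptimizer optimizer(model.parameters(), 0.1);
  std::cout << optimizer.prettyString() << std::endl;

  for (int epoch = 0; epoch < 30; ++epoch) {
    // Halve the learning rate every 10 epochs (an assumed schedule).
    if (epoch > 0 && epoch % 10 == 0) {
      optimizer.setLr(optimizer.getLr() * 0.5);
    }
    auto loss = criterion(model(input), target);
    loss.backward();
    optimizer.step();     // apply the gradient update
    optimizer.zeroGrad(); // clear gradients before the next iteration
  }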
AdamOptimizer

class AdamOptimizer : public fl::FirstOrderOptimizer

An implementation of the Adam optimizer. For more details see the paper Adam: A Method for Stochastic Optimization.
Public Functions
AdamOptimizer(const std::vector<Variable>& parameters, float learningRate, float beta1 = 0.9, float beta2 = 0.999, float epsilon = 1e-8, float weightDecay = 0)

  Construct an Adam optimizer.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
    beta1: Adam hyperparameter \( \beta_1 \).
    beta2: Adam hyperparameter \( \beta_2 \).
    epsilon: A small value used for numerical stability.
    weightDecay: The amount of L2 weight decay to use for all the parameters.
void step()
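A short sketch of constructing and stepping an Adam optimizer with explicit hyperparameters; model, criterion, input, and target are hypothetical placeholders.

  fl::AdamOptimizer optimizer(
      model.parameters(),
      1e-3,   // learningRate
      0.9,    // beta1
      0.999,  // beta2
      1e-8,   // epsilon
      1e-5);  // weightDecay (L2)

  auto loss = criterion(model(input), target);
  loss.backward();
  optimizer.step();
  optimizer.zeroGrad();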
RMSPropOptimizer

class RMSPropOptimizer : public fl::FirstOrderOptimizer

An implementation of the RMSProp optimizer. For more details see Geoff Hinton’s lecture slides and https://arxiv.org/pdf/1308.0850v5.pdf.
Public Functions
RMSPropOptimizer(const std::vector<Variable>& parameters, float learningRate, float rho = 0.99, float epsilon = 1e-8, float weightDecay = 0, bool use_first = false)

  Construct an RMSProp optimizer.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
    rho: The weight in the term \( \rho * m + (1 - \rho) * g^2 \).
    epsilon: A small value used for numerical stability.
    weightDecay: The amount of L2 weight decay to use for all the parameters.
    use_first: Use the first moment in the update. When true, keep a running mean of the gradient and subtract it from the running mean of the squared gradients.
void step()
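A sketch of constructing an RMSProp optimizer that also tracks the first moment (use_first = true); the remaining arguments are the documented defaults, and model is a hypothetical placeholder.

  fl::RMSPropOptimizer optimizer(
      model.parameters(),
      1e-3,  // learningRate
      0.99,  // rho
      1e-8,  // epsilon
      0.0,   // weightDecay
      true); // use_first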
SGDOptimizer

class SGDOptimizer : public fl::FirstOrderOptimizer

A Stochastic Gradient Descent (SGD) optimizer. At its most basic this implements the update

\[ w = w - lr * g \]

When momentum is used the update becomes

\[\begin{split} v &= \rho * v + g \\ w &= w - lr * v \end{split}\]

Reference for SGD and momentum: http://cs231n.github.io/neural-networks-3/#sgd
Public Functions
SGDOptimizer(const std::vector<Variable>& parameters, float learningRate, float momentum = 0, float weightDecay = 0, bool useNesterov = false)

  SGDOptimizer constructor.

  Parameters:
    parameters: The parameters, e.g. from model.parameters().
    learningRate: The learning rate.
    momentum: The momentum.
    weightDecay: The amount of L2 weight decay to use for all the parameters.
    useNesterov: Whether or not to use Nesterov-style momentum.
void step()
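A sketch of SGD with momentum matching the update rule above, using Nesterov-style momentum and a small amount of L2 weight decay; model is a hypothetical placeholder.

  fl::SGDOptimizer optimizer(
      model.parameters(),
      0.1,    // learningRate
      0.9,    // momentum (rho in the update above)
      1e-4,   // weightDecay
      true);  // useNesterov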