Optimization
Optimizers
FirstOrderOptimizer
-
class FirstOrderOptimizer
An abstract base class for first-order gradient-based optimizers.
Any derived class must implement the step() and prettyString() functions, which are both pure virtual. Example usage:
    SGDOptimizer optimizer(model.parameters(), 1e-1);
    auto loss = model(data);
    loss.backward();
    optimizer.step();
    optimizer.zeroGrad();
Subclassed by fl::AdadeltaOptimizer, fl::AdagradOptimizer, fl::AdamOptimizer, fl::AMSgradOptimizer, fl::NAGOptimizer, fl::NovogradOptimizer, fl::RMSPropOptimizer, fl::SGDOptimizer
Public Functions
-
FirstOrderOptimizer(const std::vector<Variable> &parameters, double learningRate)
The FirstOrderOptimizer base class constructor.
- Parameters
parameters: The parameters from e.g. model.parameters().
learningRate: The learning rate.
-
virtual void step() = 0
-
double getLr() const
Get the learning rate.
-
void setLr(double lr)
Set the learning rate.
-
virtual void zeroGrad()
Zero the gradients for all the parameters being optimized.
Typically this will be called after every call to step().
-
virtual std::string prettyString() const = 0
Generates a stringified representation of the optimizer.
- Return
A string containing the optimizer label.
-
virtual ~FirstOrderOptimizer()
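The interface above can be illustrated with a minimal, self-contained sketch of the same shape: an abstract base with pure virtual step() and prettyString(), plus a toy derived class. The names ToyParam, ToyOptimizerBase, and ToyGradientDescent are hypothetical stand-ins and not flashlight types; real code would derive from FirstOrderOptimizer and operate on Variable.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parameter with an accumulated gradient.
struct ToyParam {
  double value = 0.0;
  double grad = 0.0;
};

// Hypothetical base class that mirrors the documented interface.
class ToyOptimizerBase {
 public:
  ToyOptimizerBase(std::vector<ToyParam*> parameters, double learningRate)
      : parameters_(std::move(parameters)), lr_(learningRate) {}
  virtual ~ToyOptimizerBase() = default;

  // Derived classes must provide both of these.
  virtual void step() = 0;
  virtual std::string prettyString() const = 0;

  double getLr() const { return lr_; }
  void setLr(double lr) { lr_ = lr; }

  // Reset every gradient to zero, typically after each step().
  virtual void zeroGrad() {
    for (ToyParam* p : parameters_) p->grad = 0.0;
  }

 protected:
  std::vector<ToyParam*> parameters_;
  double lr_;
};

// Toy derived optimizer: plain gradient descent, w = w - lr * g.
class ToyGradientDescent : public ToyOptimizerBase {
 public:
  using ToyOptimizerBase::ToyOptimizerBase;

  void step() override {
    for (ToyParam* p : parameters_) p->value -= lr_ * p->grad;
  }

  std::string prettyString() const override { return "ToyGradientDescent"; }
};
```

A class that omits either override stays abstract and cannot be instantiated, which is how the compiler enforces the contract.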
-
AdamOptimizer
-
class AdamOptimizer : public fl::FirstOrderOptimizer
An implementation of the Adam optimizer.
For more details see the paper Adam: A Method for Stochastic Optimization.
Public Functions
-
AdamOptimizer(const std::vector<Variable> &parameters, float learningRate, float beta1 = 0.9, float beta2 = 0.999, float epsilon = 1e-8, float weightDecay = 0)
Construct an Adam optimizer.
- Parameters
parameters: The parameters from e.g. model.parameters().
learningRate: The learning rate.
beta1: Adam hyperparameter \( \beta_1 \).
beta2: Adam hyperparameter \( \beta_2 \).
epsilon: A small value used for numerical stability.
weightDecay: The amount of L2 weight decay to use for all the parameters.
-
void step()
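To show how beta1, beta2, and epsilon enter the update, here is a rough sketch of the bias-corrected Adam rule from the referenced paper, written over plain std::vector<double>. AdamSketch is a hypothetical name and not part of the library; weight decay is left out, and the library's internal step() may differ in detail.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical state holder for an Adam-style update; not a flashlight type.
struct AdamSketch {
  double lr;
  double beta1 = 0.9;
  double beta2 = 0.999;
  double epsilon = 1e-8;
  int count = 0;          // number of steps taken so far
  std::vector<double> m;  // running mean of the gradients
  std::vector<double> v;  // running mean of the squared gradients

  AdamSketch(std::size_t n, double learningRate)
      : lr(learningRate), m(n, 0.0), v(n, 0.0) {}

  // Apply one update to weights w given gradients g (same length as w).
  void step(std::vector<double>& w, const std::vector<double>& g) {
    ++count;
    // Bias-correction factors compensate for the zero-initialized averages.
    const double c1 = 1.0 - std::pow(beta1, count);
    const double c2 = 1.0 - std::pow(beta2, count);
    for (std::size_t i = 0; i < w.size(); ++i) {
      m[i] = beta1 * m[i] + (1.0 - beta1) * g[i];
      v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i];
      const double mHat = m[i] / c1;
      const double vHat = v[i] / c2;
      w[i] -= lr * mHat / (std::sqrt(vHat) + epsilon);
    }
  }
};
```

With this formulation the very first step moves each weight by roughly the learning rate, whatever the gradient's magnitude, because the bias-corrected ratio is close to the sign of the gradient.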
-
RMSPropOptimizer
-
class RMSPropOptimizer : public fl::FirstOrderOptimizer
An implementation of the RMSProp optimizer.
For more details see Geoff Hinton’s lecture slides and https://arxiv.org/pdf/1308.0850v5.pdf.
Public Functions
-
RMSPropOptimizer(const std::vector<Variable> &parameters, float learningRate, float rho = 0.99, float epsilon = 1e-8, float weightDecay = 0, bool use_first = false)
Construct an RMSProp optimizer.
- Parameters
parameters: The parameters from e.g. model.parameters().
learningRate: The learning rate.
rho: The weight in the term \( \rho * m + (1 - \rho) * g^2 \).
epsilon: A small value used for numerical stability.
weightDecay: The amount of L2 weight decay to use for all the parameters.
use_first: Use the first moment in the update. When true, keep a running mean of the gradient and subtract it from the running mean of the squared gradients.
-
void step()
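The roles of rho and use_first can be sketched as follows. This is an illustration under stated assumptions, not the library's code: RMSPropSketch is a hypothetical name, weight decay is omitted, epsilon is added outside the square root, and with the first moment enabled the sketch subtracts the square of the running gradient mean (a variance-style estimate, as in the referenced arXiv paper).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical state holder for an RMSProp-style update; not a flashlight type.
struct RMSPropSketch {
  double lr;
  double rho = 0.99;
  double epsilon = 1e-8;
  bool useFirst = false;
  std::vector<double> first;   // running mean of the gradients
  std::vector<double> second;  // running mean of the squared gradients

  RMSPropSketch(std::size_t n, double learningRate, bool useFirstMoment = false)
      : lr(learningRate),
        useFirst(useFirstMoment),
        first(n, 0.0),
        second(n, 0.0) {}

  // Apply one update to weights w given gradients g (same length as w).
  void step(std::vector<double>& w, const std::vector<double>& g) {
    for (std::size_t i = 0; i < w.size(); ++i) {
      second[i] = rho * second[i] + (1.0 - rho) * g[i] * g[i];
      double denomSq = second[i];
      if (useFirst) {
        first[i] = rho * first[i] + (1.0 - rho) * g[i];
        // Assumption: subtract the squared mean; clamp against round-off.
        denomSq = std::max(0.0, second[i] - first[i] * first[i]);
      }
      w[i] -= lr * g[i] / (std::sqrt(denomSq) + epsilon);
    }
  }
};
```

Under these assumptions, enabling the first moment shrinks the denominator whenever the gradient has a consistent sign, so the step becomes slightly larger than in the plain variant.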
-
SGDOptimizer
-
class SGDOptimizer : public fl::FirstOrderOptimizer
A Stochastic Gradient Descent (SGD) optimizer.
At its most basic this implements the update
\[ w = w - lr * g \]
When momentum is used the update becomes
\[\begin{split} v &= \rho * v + g \\ w &= w - lr * v \end{split}\]
Reference for SGD and Momentum: http://cs231n.github.io/neural-networks-3/#sgd
Public Functions
-
SGDOptimizer(const std::vector<Variable> &parameters, float learningRate, float momentum = 0, float weightDecay = 0, bool useNesterov = false)
SGDOptimizer constructor.
- Parameters
parameters: The parameters from e.g. model.parameters().
learningRate: The learning rate.
momentum: The momentum.
weightDecay: The amount of L2 weight decay to use for all the parameters.
useNesterov: Whether or not to use Nesterov-style momentum.
-
void step()
-