Optimization¶
Optimizers¶
FirstOrderOptimizer¶
class FirstOrderOptimizer¶
An abstract base class for first-order gradient-based optimizers.
Any derived class must implement the step() function. Example usage:
SGDOptimizer optimizer(model.parameters(), 1e-1);
auto loss = model(data);
loss.backward();
optimizer.step();
optimizer.zeroGrad();
Subclassed by fl::AMSgradOptimizer, fl::AdadeltaOptimizer, fl::AdagradOptimizer, fl::AdamOptimizer, fl::NAGOptimizer, fl::NovogradOptimizer, fl::RMSPropOptimizer, fl::SGDOptimizer
Public Functions
FirstOrderOptimizer(const std::vector<Variable> &parameters, double learningRate)¶
The FirstOrderOptimizer base class constructor.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
virtual void step() = 0¶
inline double getLr() const¶
Get the learning rate.
inline void setLr(double lr)¶
Set the learning rate.
virtual void zeroGrad()¶
Zero the gradients for all the parameters being optimized.
Typically this will be called after every call to step().
virtual std::string prettyString() const = 0¶
Generates a stringified representation of the optimizer.
- Returns:
a string containing the optimizer label
virtual ~FirstOrderOptimizer() = default¶
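The sketch below illustrates the subclassing contract: derive from FirstOrderOptimizer and implement step() and prettyString(). It is a minimal sketch rather than library code; it assumes the base class exposes the optimized parameters to derived classes through a protected member (called parameters_ here) and that gradients are read through Variable accessors such as grad() — check the headers for the actual member and accessor names.
// Minimal sketch of a custom optimizer built on the FirstOrderOptimizer
// interface. `parameters_` and the Variable accessors used below are
// assumptions about the implementation; verify against the headers.
class PlainSGD : public fl::FirstOrderOptimizer {
 public:
  PlainSGD(const std::vector<fl::Variable>& parameters, double learningRate)
      : fl::FirstOrderOptimizer(parameters, learningRate) {}

  void step() override {
    for (auto& w : parameters_) {
      if (!w.isGradAvailable()) {
        continue; // parameter received no gradient this iteration
      }
      // w = w - lr * g, applied in place to the parameter's tensor
      w.tensor() -= getLr() * w.grad().tensor();
    }
  }

  std::string prettyString() const override {
    return "PlainSGD";
  }
};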
AdamOptimizer¶
class AdamOptimizer : public fl::FirstOrderOptimizer¶
An implementation of the Adam optimizer.
For more details see the paper Adam: A Method for Stochastic Optimization.
Public Functions
AdamOptimizer(const std::vector<Variable> &parameters, float learningRate, float beta1 = 0.9, float beta2 = 0.999, float epsilon = 1e-8, float weightDecay = 0)¶
Construct an Adam optimizer.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
beta1 – Adam hyperparameter \( \beta_1 \).
beta2 – Adam hyperparameter \( \beta_2 \).
epsilon – A small value used for numerical stability.
weightDecay – The amount of L2 weight decay to use for all the parameters.
virtual void step() override¶
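For reference, the update from the cited paper is sketched below for a parameter w with gradient g at step t; the implementation may differ in details such as where \( \epsilon \) and weight decay are applied.
\[\begin{split} m_t &= \beta_1 * m_{t-1} + (1 - \beta_1) * g_t \\ v_t &= \beta_2 * v_{t-1} + (1 - \beta_2) * g_t^2 \\ \hat{m}_t &= m_t / (1 - \beta_1^t) \\ \hat{v}_t &= v_t / (1 - \beta_2^t) \\ w &= w - lr * \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) \end{split}\]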
RMSPropOptimizer¶
class RMSPropOptimizer : public fl::FirstOrderOptimizer¶
An implementation of the RMSProp optimizer.
For more details see Geoff Hinton’s lecture slides and https://arxiv.org/pdf/1308.0850v5.pdf.
Public Functions
RMSPropOptimizer(const std::vector<Variable> &parameters, float learningRate, float rho = 0.99, float epsilon = 1e-8, float weightDecay = 0, bool use_first = false)¶
Construct an RMSProp optimizer.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
rho – The weight in the term \( \rho * m + (1 - \rho) * g^2 \).
epsilon – A small value used for numerical stability.
weightDecay – The amount of L2 weight decay to use for all the parameters.
use_first – Use the first moment in the update. When true, keep a running mean of the gradient and subtract it from the running mean of the squared gradients.
virtual void step() override¶
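For reference, with use_first = false the update this class describes is, in sketch form (the exact placement of \( \epsilon \) may differ in the implementation):
\[\begin{split} m &= \rho * m + (1 - \rho) * g^2 \\ w &= w - lr * g / (\sqrt{m} + \epsilon) \end{split}\]
With use_first = true, a running mean \( \mu = \rho * \mu + (1 - \rho) * g \) is also kept and the denominator becomes \( \sqrt{m - \mu^2} + \epsilon \), the centered variant from the linked Graves paper.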
SGDOptimizer¶
class SGDOptimizer : public fl::FirstOrderOptimizer¶
A Stochastic Gradient Descent (SGD) optimizer.
At its most basic this implements the update
\[ w = w - lr * g \]
When momentum is used the update becomes
\[\begin{split} v &= \rho * v + g \\ w &= w - lr * v \end{split}\]
Reference for SGD and Momentum: http://cs231n.github.io/neural-networks-3/#sgd
Public Functions
SGDOptimizer(const std::vector<Variable> &parameters, float learningRate, float momentum = 0, float weightDecay = 0, bool useNesterov = false)¶
SGDOptimizer constructor.
- Parameters:
parameters – The parameters from e.g. model.parameters().
learningRate – The learning rate.
momentum – The momentum.
weightDecay – The amount of L2 weight decay to use for all the parameters.
useNesterov – Whether or not to use Nesterov-style momentum.
virtual void step() override¶
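A minimal usage sketch, mirroring the FirstOrderOptimizer example above; model, dataset, criterion, and the batch fields are placeholders rather than part of this API.
// Sketch: one pass over a dataset with SGD + Nesterov momentum.
// `model`, `dataset`, and `criterion` are placeholders, not library symbols.
fl::SGDOptimizer optimizer(
    model.parameters(),
    0.1,   // learningRate
    0.9,   // momentum
    1e-4,  // weightDecay
    true); // useNesterov
for (auto& batch : dataset) {
  auto loss = criterion(model(batch.input), batch.target);
  loss.backward();
  optimizer.step();
  optimizer.zeroGrad();
}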