Modules

Containers

Module

class Module
An abstract computation unit capable of forward computation. Also contains a collection of parameters that can be mutated, and will be serialized and deserialized with the module.

Subclassed by fl::BinaryModule, fl::Container, fl::RNN, fl::UnaryModule, fl::WeightNorm

Public Functions

std::vector<Variable> params() const
Gets the parameters of the module.
Returns: the module's parameters as a vector of Variable
virtual void train()
Switches the module to training mode. Changes all parameters so that gradient calculation will be enabled for any calls to forward.

virtual void eval()
Switches the module to evaluation mode. Changes all parameters so that gradient calculation will be disabled for any calls to forward.

Variable param(int position) const
Returns a module parameter given a particular position.
Returns: a Variable tensor for the parameter at the requested position
Parameters:
position: the index of the requested parameter in params_

virtual void setParams(const Variable& var, int position)
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.
Parameters:
var: the new replacement Variable
position: the index of the parameter to be replaced in params_

void zeroGrad()
Clears references to gradient Variables for all parameters in the module.

virtual std::vector<Variable> forward(const std::vector<Variable>& inputs) = 0
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

std::vector<Variable> operator()(const std::vector<Variable>& inputs)
Overload for forward computation for the module.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

virtual std::string prettyString() const = 0
Generates a stringified representation of the module.
Returns: a string containing the module label

virtual ~Module()

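As a short sketch of this API in use (assuming the fl::Linear module documented below, with hypothetical shapes):

fl::Linear fc(512, 10);                       // a concrete Module
fc.train();                                   // enable gradient calculation
auto out = fc(fl::input(af::randu(512, 8)));  // forward via operator()
fc.eval();                                    // disable gradients for inference
std::cout << fc.params().size() << std::endl; // 2: weight and bias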

class UnaryModule : public fl::Module
An extension of Module which supports forward computation only on a single Variable, with a single Variable as output. For example, the Sigmoid module can be derived from UnaryModule.

Subclassed by fl::AdaptiveSoftMax, fl::BatchNorm, fl::Conv2D, fl::Dropout, fl::ELU, fl::Embedding, fl::GatedLinearUnit, fl::HardTanh, fl::LayerNorm, fl::LeakyReLU, fl::Linear, fl::Log, fl::LogSoftmax, fl::Padding, fl::Pool2D, fl::PReLU, fl::ReLU, fl::ReLU6, fl::Reorder, fl::Sigmoid, fl::Tanh, fl::ThresholdReLU, fl::Transform, fl::View
Public Functions

UnaryModule()

std::vector<Variable> forward(const std::vector<Variable>& inputs)
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

virtual ~UnaryModule()
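As a sketch, a hypothetical custom layer can subclass UnaryModule by overriding the single-Variable forward (assumed here to be the pure virtual the class exposes, per its description) and prettyString:

// A hypothetical elementwise squaring layer.
class Square : public fl::UnaryModule {
 public:
  fl::Variable forward(const fl::Variable& input) override {
    return input * input; // autograd tracks this through Variable
  }
  std::string prettyString() const override {
    return "Square";
  }
};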


class BinaryModule : public fl::Module
An extension of Module which supports forward computation only on a pair of Variables, with a single Variable as output. For example, the BinaryCrossEntropy loss can be derived from BinaryModule.

Subclassed by fl::AdaptiveSoftMaxLoss, fl::BinaryCrossEntropy, fl::CategoricalCrossEntropy, fl::MeanAbsoluteError, fl::MeanSquaredError

Public Functions

BinaryModule()

std::vector<Variable> forward(const std::vector<Variable>& inputs)
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

virtual ~BinaryModule()

Container

class Container : public fl::Module
A computation unit capable of forward computation that contains a collection of multiple Modules and their respective parameters.

Subclassed by fl::PositionEmbedding, fl::Residual, fl::Sequential, fl::Transformer

Public Functions

template <typename T>
void add(const T& module)
Adds a module to a Container by making a copy of the underlying module. Note that parameters are still shared, due to Variable's copy semantics.
Parameters:
module: the module to add

template <typename T>
void add(std::shared_ptr<T> module)
Adds a module to modules_, and adds its parameters to the container's params_.
Parameters:
module: the module to add

ModulePtr module(int id) const
Returns a pointer to the module at the specified index in the container's modules_.
Returns: a pointer to the requested module
Parameters:
id: the index of the module to return

std::vector<ModulePtr> modules() const
Returns pointers to each Module in the Container.
Returns: an ordered vector of pointers for each module

void setParams(const Variable& var, int position)
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.
Parameters:
var: the new replacement Variable
position: the index of the parameter to be replaced in params_

Sequential

class Sequential : public fl::Container
A Container representing an ordered sequence of modules, which is capable of forward computation through each of its modules, in order.

Usage:

Sequential mySequential;
// Assume we've defined and implemented three modules: mod1, mod2, mod3
mySequential.add(mod1);
mySequential.add(mod2);
mySequential.add(mod3);
// Performing forward computation will forward through each `Module` in order
auto result = mySequential.forward(myInput);
// We can also inspect internal state
assert(mySequential.modules().size() == 3); // true
assert(
    mod1.params().size() + mod2.params().size() + mod3.params().size() ==
    mySequential.params().size()); // true
Public Functions

Sequential()

std::vector<Variable> forward(const std::vector<Variable>& input)
Performs forward computation for the Sequential, calling forward, in order, for each Module, and feeding the result as input to the next Module.

std::string prettyString() const
Generates a stringified representation of the Sequential by concatenating string representations for each contained Module.
Returns: a string containing the module label

Layers

Activations

class Sigmoid : public fl::UnaryModule
Applies the sigmoid function elementwise to a Variable:
\[\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}\]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class Tanh : public fl::UnaryModule
Applies the hyperbolic tangent function elementwise to a Variable:
\[\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class HardTanh : public fl::UnaryModule
Applies the hardtanh function elementwise to a Variable:
\[\begin{split}\text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\ -1 & \text{ if } x < -1 \\ x & \text{ otherwise } \end{cases}\end{split}\]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class ReLU : public fl::UnaryModule
Applies the rectified linear unit function elementwise to a Variable:
\[ \text{ReLU}(x) = \max(0, x) \]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class LeakyReLU : public fl::UnaryModule
Applies the leaky rectified linear unit function from Maas et al (2013): Rectifier Nonlinearities Improve Neural Network Acoustic Models.

Applied elementwise to a Variable:
\[\begin{split} \text{LeakyReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{slope} \times x, & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{slope}\) is a constant by which the input will be multiplied if less than zero.

class PReLU : public fl::UnaryModule
Applies the parameterized rectified linear unit function from He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Applied elementwise to a Variable, given some input size:
\[\begin{split} \text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{value} \times x, & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{value}\) is a learned parameter whose initialization can be tuned.

class ELU : public fl::UnaryModule
Applies the exponential linear unit function from Clevert et al (2015): Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).

Applied elementwise to a Variable:
\[\begin{split} \text{ELU}(x) = \begin{cases} x & \text{ if } x \geq 0 \\ \alpha \times (e^x - 1) & \text{ otherwise } \end{cases} \end{split}\]
where \(\alpha\) is a tunable parameter.

class ThresholdReLU : public fl::UnaryModule
Applies the threshold rectified linear unit from Konda et al (2015): Zero-bias autoencoders and the benefits of co-adapting features.

Applied elementwise to a Variable:
\[\begin{split} \text{ThresholdReLU}(x) = \begin{cases} x & \text{ if } x > \text{threshold} \\ 0 & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{threshold}\) is a tunable parameter.

class GatedLinearUnit : public fl::UnaryModule
Creates a Gated Linear Unit from Dauphin et al (2017): Language Modeling with Gated Convolutional Networks.

\[\text{GLU}(x) = x_i \otimes \sigma(x_j)\]
where \(\otimes\) denotes the elementwise product, \(x_i\) is the first half of the input, \(x_j\) is the second half, and \(\sigma(x)\) is the sigmoid function.

class LogSoftmax : public fl::UnaryModule
Applies the log softmax function to a tensor:
\[ \text{LogSoftmax}(x_i) = \log{\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)} \]

class Log : public fl::UnaryModule
Applies the natural logarithm elementwise to a Variable.

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

BatchNorm

class BatchNorm : public fl::UnaryModule
Applies Batch Normalization on a given input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

The operation implemented is:
\[ out(x) = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} \times \gamma + \beta \]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated over the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for affine transformation.

Public Functions

BatchNorm(int featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)
Constructs a BatchNorm module.
Parameters:
featAxis: the axis over which normalization is performed
featSize: the size of the dimension along featAxis
momentum: an exponential average factor used to compute the running mean and variance: \[ runningMean = runningMean \times (1 - momentum) + newMean \times momentum \] If < 0, a cumulative moving average is used.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.

BatchNorm(const std::vector<int>& featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)
Constructs a BatchNorm module.
Parameters:
featAxis: the axes over which normalization is performed
featSize: total dimension along featAxis. For example, to perform Temporal Batch Normalization on input of size [ \(L\), \(C\), \(N\)], use featAxis = {1}, featSize = \(C\). To perform normalization per activation on input of size [ \(W\), \(H\), \(C\), \(N\)], use featAxis = {0, 1, 2}, featSize = \(W \times H \times C\).
momentum: an exponential average factor used to compute the running mean and variance: \[ runningMean = runningMean \times (1 - momentum) + newMean \times momentum \] If < 0, a cumulative moving average is used.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.
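For instance, a sketch normalizing the channel dimension of a [W, H, C, N] activation (hypothetical sizes):

fl::BatchNorm bn(2 /* featAxis: channel axis */, 32 /* featSize = C */);
auto normalized = bn(fl::input(af::randu(7, 7, 32, 16)));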

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Conv2D

class Conv2D : public fl::UnaryModule
Applies a 2D convolution over a 4D input along its first two dimensions.

This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C_{in}\), \(N\)] where \(C_{in}\) is the number of input channels, and generates an output of shape [ \(X_{out}\), \(Y_{out}\), \(C_{out}\), \(N\)] where \(C_{out}\) is the number of output channels,
\[X_{out} = \frac{X_{in} + 2 \times X_{pad} - (1 + (X_{filter} - 1) \times X_{dilation})}{X_{stride}} + 1\]
\[Y_{out} = \frac{Y_{in} + 2 \times Y_{pad} - (1 + (Y_{filter} - 1) \times Y_{dilation})}{Y_{stride}} + 1\]

Two modes for zero-padding are supported:
AF_PADDING_NONE: no padding
AF_PADDING_SAME: \(X_{pad}\) and \(Y_{pad}\) are dynamically chosen so that
\[X_{out} = \lceil{\frac{X_{in}}{X_{stride}}}\rceil, \quad Y_{out} = \lceil{\frac{Y_{in}}{Y_{stride}}}\rceil\]

Subclassed by fl::AsymmetricConv1D

Public Functions

Conv2D(int n_in, int n_out, int wx, int wy, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, bool bias = true, int groups = 1)
Constructs a Conv2D module.
Parameters:
n_in: \(C_{in}\), the number of channels in the input
n_out: \(C_{out}\), the number of channels in the output
wx: the size of the first dimension of the convolving kernel
wy: the size of the second dimension of the convolving kernel
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
bias: a boolean value that controls whether to add a learnable bias to the output
groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group
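A sketch of a typical construction (the SAME enumerator is assumed from the fl::PaddingMode mentioned above; shapes are hypothetical):

// 3x3 kernels, 16 input channels, 32 output channels, stride 1, SAME padding
fl::Conv2D conv(16, 32, 3, 3, 1, 1, fl::PaddingMode::SAME, fl::PaddingMode::SAME);
auto out = conv(fl::input(af::randu(28, 28, 16, 4)));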

Conv2D(const Variable& w, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)
Constructs a Conv2D module with a kernel Variable tensor. No bias term will be applied to the output.
Parameters:
w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group.

Conv2D(const Variable& w, const Variable& b, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)
Constructs a Conv2D module with a kernel Variable tensor and a bias Variable tensor.
Parameters:
w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].
b: the bias Variable tensor. The shape should be [ \(1\), \(1\), \(C_{out}\), \(1\)].
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group.

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Dropout

class Dropout : public fl::UnaryModule
Implements Dropout normalization, as given by Hinton et al (2012): Improving neural networks by preventing co-adaptation of feature detectors. Effectively regularizes by randomly zeroing out values in the input based on a given ratio.

All values that are not zeroed out are scaled by a factor of \(\frac{1}{1 - p}\). Thus, with the same network, evaluating the module at test time gives the identity.

Embedding

class Embedding : public fl::UnaryModule
Looks up embeddings from a learnable dictionary of fixed size.

This layer expects as input a list of indices with at most three dimensions, [ \(B_1\), \(B_2\) (optional), \(B_3\) (optional)], and generates an output from lookup of shape [embedding_dim, \(B_1\), \(B_2\) (optional), \(B_3\) (optional)].

Public Functions

Embedding(int embedding_dim, int num_embeddings)
Constructs an Embedding module.
Parameters:
embedding_dim: the size of each embedding vector
num_embeddings: the size of the dictionary of embeddings
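A sketch of a lookup (hypothetical sizes; the index tensor is wrapped as a Variable):

fl::Embedding emb(128 /* embedding_dim */, 10000 /* num_embeddings */);
auto idxs = fl::noGrad(af::constant(42, 20, 4)); // 20 indices, batch of 4
auto vecs = emb(idxs);                           // [128, 20, 4]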

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

LayerNorm

class LayerNorm : public fl::UnaryModule
Applies Layer Normalization on a given input as described in the paper Layer Normalization.

The operation implemented is:
\[ out(x) = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} \times \gamma + \beta \]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated along the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for affine transformation.

Public Functions

LayerNorm(int axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)
Constructs a LayerNorm module.
Parameters:
axis: the axis along which normalization is computed. Usually set as the feature axis.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
axisSize: the size of the features specified by axis, used to perform the elementwise affine transform. If the feature size is variable, use kLnVariableAxisSize, which uses a singleton weight and bias and tiles them dynamically according to the given input.

LayerNorm(const std::vector<int>& axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)
Constructs a LayerNorm module.
Parameters:
axis: the axes along which normalization is computed. Usually set as the feature axes.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
axisSize: the size of the features specified by axis, used to perform the elementwise affine transform. If the feature size is variable, use kLnVariableAxisSize, which uses a singleton weight and bias and tiles them dynamically according to the given input.
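For example, a sketch normalizing over the feature axis of a [C, N] input (hypothetical sizes):

fl::LayerNorm ln(0 /* axis */, 1e-5, true /* affine */, 256 /* axisSize */);
auto out = ln(fl::input(af::randu(256, 8)));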

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Linear

class Linear : public fl::UnaryModule
Applies a linear transformation to the input: \(y = Wx + b\).

This layer takes an input of shape [input_size, *, *, *] and transforms it to an output of shape [output_size, *, *, *].

Public Functions

Linear(int input_size, int output_size, bool bias = true)
Constructs a Linear module from the input and output sample sizes.
Parameters:
input_size: the size of each input sample
output_size: the size of each output sample
bias: a boolean value that controls whether the layer will include a bias term \(b\)

Linear(const Variable& w)
Constructs a Linear module from the weight parameter \(w\). The layer will not include the bias term \(b\) in this case.
Parameters:
w: the 2D Variable tensor for the weight \(w\). The shape should be [output_size, input_size].

Linear(const Variable& w, const Variable& b)
Constructs a Linear module from the weight parameter \(w\) and the bias parameter \(b\).
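A sketch combining the weight-only constructor with an initializer (see Initialization below; shapes follow the documentation above):

// W: [output_size, input_size] = [10, 512]; no bias term
auto w = fl::glorotUniform(512 /* input_size */, 10 /* output_size */);
fl::Linear fc(w);
auto y = fc(fl::input(af::randu(512, 32))); // y: [10, 32]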

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Padding

class Padding : public fl::UnaryModule
Adds padding of value val to the input, before and after each dimension \(i\), with sizes specified by the tuple padi.

Pool2D

class Pool2D : public fl::UnaryModule
A 2D pooling layer.

This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C\), \(N\)]. Pooling (max or average) is performed over the first and second dimensions of the input. Thus the output will be of shape [ \(X_{out}\), \(Y_{out}\), \(C\), \(N\)].
Reorder

class Reorder : public fl::UnaryModule
Reorders the data according to the specified dimensions. The order of the data may change and is guaranteed to be contiguous in memory.

// A layer which transposes a matrix
auto transposeLayer = Reorder(1, 0);

auto var = Variable(af::array(1, 2, 3, 4), false);
// Make the last dimension the first dimension
var = Reorder(3, 0, 1, 2)(var);
// Dims will be {4, 1, 2, 3}
std::cout << var.dims() << std::endl;
RNN

class RNN : public fl::Module
A recurrent neural network (RNN) layer.

The RNN layer supports several cell types. The most basic RNN (e.g. an Elman network) computes the following function:
\[ h_t = \sigma(W x_t + U h_{t-1} + b) \]
If the RNN mode is RELU, then \(\sigma\) will be a ReLU. If the RNN mode is TANH, then it will be a Tanh function.

Gated Recurrent Units (GRU) are supported. For details see the original GRU paper or the Wikipedia page.

LSTM cells are also supported (LSTM). The LSTM cell uses a forget gate and does not have peephole connections. For details see the original paper Long Short-Term Memory or the Wikipedia page.

The input to the RNN is expected to be of shape [ \(X_{in}\), \(N\), \(T\)] where \(N\) is the batch size and \(T\) is the sequence length.

The output of the RNN will be of shape [ \(X_{out}\), \(N\), \(T\)]. Here \(X_{out}\) will be hidden_size if the RNN is unidirectional, and twice the hidden_size if the RNN is bidirectional.

In addition, the RNN supports including the hidden state and the cell state as input and output. When these are input as the empty Variable, they are assumed to be zero.

Public Functions

RNN(int input_size, int hidden_size, int num_layers, RnnMode mode, bool bidirectional = false, float drop_prob = 0.0)
Constructs an RNN layer.
Parameters:
input_size: the dimension of the input (e.g. \(X_{in}\))
hidden_size: the hidden dimension of the RNN
num_layers: the number of recurrent layers
mode: the RNN mode to use. Can be any of: RELU, TANH, LSTM, GRU
bidirectional: whether or not the RNN is bidirectional. If true, the output dimension will be doubled.
drop_prob: the probability of dropout after each RNN layer except the last layer
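A sketch constructing and running a bidirectional LSTM (hypothetical sizes):

fl::RNN lstm(80 /* input_size */, 256 /* hidden_size */, 2 /* num_layers */,
             fl::RnnMode::LSTM, true /* bidirectional */, 0.2 /* drop_prob */);
auto input = fl::input(af::randu(80, 4, 100)); // [X_in, N, T]
auto output = lstm.forward(input);             // [512, 4, 100]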

std::vector<Variable> forward(const std::vector<Variable>& inputs)
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

Variable forward(const Variable& input)
Forwards the RNN layer.
Returns: a single output Variable with shape [ \(X_{out}\), \(N\), \(T\)]
Parameters:
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]

std::tuple<Variable, Variable> forward(const Variable& input, const Variable& hidden_state)
Forwards the RNN layer.
Returns: a tuple of output Variables
Parameters:
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]
hidden_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, then the hidden state is assumed zero.

std::tuple<Variable, Variable, Variable> forward(const Variable& input, const Variable& hidden_state, const Variable& cell_state)
Forwards the RNN layer.
Returns: a tuple of output Variables
Parameters:
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]
hidden_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, then the hidden state is assumed zero.
cell_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, then the cell state is assumed zero.

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Transform

class Transform : public fl::UnaryModule
Applies a transformation on the input specified by a lambda function. For example, to add a \(1 + \log(x)\) layer to a container:

model.add(Transform([](const Variable& in) { return 1 + afnet::log(in); }));

Note that this module cannot be serialized.

View

class View : public fl::UnaryModule
Modifies the dimensions of a Variable and rearranges its elements without modifying the order of elements in the underlying af::array.

When specifying the number of elements in the array:
If -1 is specified on a particular axis, that axis will be assigned a dimension based on the number of total elements in the tensor. Only one axis value can be -1.
If 0 is specified on a particular axis, that axis will have the same dimension as does the input tensor. For example: given an input tensor with shape (10, 20, 30, 40) and a View with shape (-1, 0, 100), the output tensor will have shape (120, 20, 100).
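A sketch flattening feature maps while preserving the batch dimension (View is assumed to take an af::dim4 of target dimensions):

// Flatten [7, 7, 32, 16] to [7 * 7 * 32, 1, 1, 16]: -1 is inferred, 0 keeps
// the input's dimension on that axis.
fl::View flatten(af::dim4(-1, 1, 1, 0));
auto flat = flatten(fl::input(af::randu(7, 7, 32, 16)));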
WeightNorm

class WeightNorm : public fl::Module
A weight normalization layer.

This layer wraps a given module to create a weight-normalized implementation of the module. WeightNorm currently supports Linear and Conv2D. For example:

WeightNorm wn(Linear(128, 128), 0);

For more details, see Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.
Losses

AdaptiveSoftMaxLoss

class AdaptiveSoftMaxLoss : public fl::BinaryModule
An efficient approximation of the softmax function and negative log-likelihood loss. Computes the Adaptive Softmax, as given by Grave et al (2017): Efficient softmax approximation for GPUs. Efficient when the number of classes over which the softmax is being computed is very high and the label distribution is highly imbalanced.

Adaptive softmax buckets the inputs based on their frequency, where clusters may each contain a different number of targets. For each minibatch, only clusters for which at least one target is present are evaluated. The forward pass for low-frequency inputs is approximated with lower-rank matrices so as to speed up computation.

Public Functions

Creates an AdaptiveSoftMaxLoss with the given parameters.
Parameters:
reduction: the reduction mode; see documentation on ReduceMode for available options

Variable forward(const Variable& inputs, const Variable& targets)
Computes the categorical cross entropy loss for some input and target tensors (uses the adaptive softmax function to do this efficiently).

void setParams(const Variable& var, int position)
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.
Parameters:
var: the new replacement Variable
position: the index of the parameter to be replaced in params_

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

BinaryCrossEntropy

class BinaryCrossEntropy : public fl::BinaryModule
Computes the binary cross entropy loss between an input tensor \(x\) and a target tensor \(y\).

The binary cross entropy loss is:
\[ B(x, y) = \frac{1}{n} \sum_{i = 0}^n -\left( w_i \times (y_i \times \log(x_i) + (1 - y_i) \times \log(1 - x_i)) \right) \]
where \(w\) is an optional weight parameter for rescaling.

Both the inputs and the targets are expected to be between 0 and 1.

Public Functions

Variable forward(const Variable& inputs, const Variable& targets, const Variable& weights)
Performs forward loss computation with an additional weight tensor.
Parameters:
inputs: a tensor with the predicted values
targets: a tensor with the target values
weights: a rescaling weight given to the loss of each element

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label
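A sketch of the weighted overload (hypothetical shapes; predictions are assumed to already lie in (0, 1)):

fl::BinaryCrossEntropy bce;
auto preds   = fl::sigmoid(fl::input(af::randu(1, 8)));
auto targets = fl::noGrad(af::round(af::randu(1, 8))); // 0/1 labels
auto weights = fl::noGrad(af::constant(0.5, 1, 8));
auto loss = bce.forward(preds, targets, weights);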

CategoricalCrossEntropy

class CategoricalCrossEntropy : public fl::BinaryModule
Computes the categorical cross entropy loss between an input and a target tensor.

The input is expected to contain log probabilities (which can be accomplished via LogSoftmax). The targets should contain the index of the ground truth class for each input example.

In the batch case, for the output loss tensor \(\{l_1,...,l_N\}^\top\), put \(l_n = -x_{n, y_n}\) (consider only the log probability of the correct class). Then reduce via:
\[ \mathcal{L}(x, y) = \sum_{i = 1}^N l_i \]
if using a sum reduction, or
\[ \mathcal{L}(x, y) = \frac{1}{N} \sum_{i = 1}^N l_i \]
if using a mean reduction. If using no reduction ('none'), the result will be reshaped to the target dimensions, giving a loss for each example. See ReduceMode.
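A sketch pairing LogSoftmax outputs with the loss (hypothetical shapes; targets hold class indices, and the s32 index type is an assumption):

fl::CategoricalCrossEntropy criterion;
auto logProbs = fl::logSoftmax(fl::input(af::randu(10, 4)), 0); // [C, N]
auto targets  = fl::noGrad(af::constant(3, 4, s32));            // [N]
auto loss = criterion.forward(logProbs, targets);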
MeanAbsoluteError

class MeanAbsoluteError : public fl::BinaryModule
Computes the mean absolute error (equivalent to the \(L_1\) loss):
\[ \mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 0}^n \left| x_i - y_i \right| \]
for input tensor \(x\) and target tensor \(y\), each of which contains \(n\) elements.

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

MeanSquaredError

class MeanSquaredError : public fl::BinaryModule
Computes the mean squared error between elements across two tensors:
\[ \mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 0}^n \left( x_i - y_i \right)^2 \]
for input tensor \(x\) and target tensor \(y\), each of which contains \(n\) elements.

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Initialization
Functions for initializing tensors. Provides facilities for creating a fl::Variable tensor of different types and initializations vis-à-vis probability distributions, constants, and the identity. Additionally wraps common tensors as integrated into modules.

namespace fl
Functions

Variable input(const af::array& arr)
Constructs a Variable with gradient calculation disabled, from a given array.
Returns: a Variable from the given array with gradient calculation disabled
Parameters:
arr: an af::array to be used

Variable noGrad(const af::array& arr)
See fl::input above.
Returns: a Variable from the given array with gradient calculation disabled
Parameters:
arr: an af::array to be used

Variable param(const af::array& arr)
Constructs a Variable with gradient calculation enabled, from a given array.
Returns: a Variable from the given array with gradient calculation enabled
Parameters:
arr: an af::array to be used

Variable kaimingUniform(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are uniformly distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable kaimingUniform(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are uniformly distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Variable kaimingNormal(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are normally distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable kaimingNormal(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are normally distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Variable glorotUniform(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are uniformly distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable glorotUniform(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are uniformly distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.

Variable glorotNormal(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are normally distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable glorotNormal(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are normally distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
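As a sketch tying these initializers to module parameters (shapes follow the parameter documentation above):

// [output_size, input_size] = [256, 512] weight with gradients enabled
auto w = fl::kaimingNormal(512 /* input_size */, 256 /* output_size */);
fl::Linear fc(w); // no bias term

// Arbitrary-rank tensors can be drawn directly, e.g. a conv kernel:
auto k = fl::glorotUniform(af::dim4(3, 3, 16, 32));
fl::Conv2D conv(k);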

Utils
Utils for modules.

namespace fl
Functions

bool allParamsClose(const Module& a, const Module& b, double absTolerance = 1e-5)
Returns true if the parameters of two modules are of the same type and are elementwise equal within a given tolerance limit.
Parameters:
a, b: input Modules to compare
absTolerance: absolute tolerance allowed

int derivePadding(int inSz, int filterSz, int stride, int pad, int dilation)

af::array join(const std::vector<af::array>& inputs, double padValue = 0.0, dim_t batchDim = 1)
Packs a list of arrays (possibly of different dimensions) into a single array by padding them to the same dimensions.
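A sketch batching two variable-length sequences (hypothetical shapes; padding/stacking behavior assumed from the description above):

// Two [feature, time] arrays of different lengths
std::vector<af::array> seqs = {af::randu(80, 50), af::randu(80, 30)};
// Zero-pad to a common size and stack along dimension 2
auto batch = fl::join(seqs, 0.0, 2); // [80, 50, 2]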

DistributedUtils

namespace fl
Functions
Registers a module for all-reduce synchronization with a gradient hook on its parameter Variables.
Parameters:
[in] module: a module whose parameter gradients will be synchronized
[in] a: a Reducer instance to which gradients will be immediately added when available
Traverses the network and averages its parameters with all-reduce.
Parameters:
module: a module whose parameters will be synchronized
Traverses the network and synchronizes the gradients of its parameters with all-reduce.
Parameters:
module: a module whose parameter gradients will be synchronized
scale: scale gradients after all-reduce by this factor