Modules

Containers

Module

class Module
An abstract computation unit capable of forward computation. Also contains a collection of parameters that can be mutated, and will be serialized and deserialized with the module.

Subclassed by fl::BinaryModule, fl::Container, fl::RNN, fl::UnaryModule, fl::WeightNorm

Public Functions

std::vector<Variable> params() const
Gets the parameters of the module.
Returns: the module's parameters as a vector of Variable
virtual void train()
Switches the module to training mode. Changes all parameters so that gradient calculation will be enabled for any calls to forward.

virtual void eval()
Switches the module to evaluation mode. Changes all parameters so that gradient calculation will be disabled for any calls to forward.

Variable param(int position) const
Returns a module parameter given a particular position.
Returns: a Variable tensor for the parameter at the requested position
Parameters:
position: the index of the requested parameter in params_

virtual void setParams(const Variable& var, int position)
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.
Parameters:
var: the new replacement Variable
position: the index of the parameter to be replaced in params_

void zeroGrad()
Clears references to gradient Variables for all parameters in the module.

virtual std::vector<Variable> forward(const std::vector<Variable>& inputs) = 0
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

std::vector<Variable> operator()(const std::vector<Variable>& inputs)
Overload for forward computation for the module.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

virtual std::string prettyString() const = 0
Generates a stringified representation of the module.
Returns: a string containing the module label

virtual ~Module()

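As a short sketch of this API in use (assuming the fl::Linear module documented below, with hypothetical shapes):

fl::Linear fc(512, 10);                       // a concrete Module
fc.train();                                   // enable gradient calculation
auto out = fc(fl::input(af::randu(512, 8)));  // forward via operator()
fc.eval();                                    // disable gradients for inference
std::cout << fc.params().size() << std::endl; // 2: weight and bias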

class UnaryModule : public fl::Module
An extension of Module which supports forward computation only on a single Variable, with a single Variable as output. For example, the Sigmoid module can be derived from UnaryModule.

Subclassed by fl::AdaptiveSoftMax, fl::BatchNorm, fl::Conv2D, fl::Dropout, fl::ELU, fl::Embedding, fl::GatedLinearUnit, fl::HardTanh, fl::LayerNorm, fl::LeakyReLU, fl::Linear, fl::Log, fl::LogSoftmax, fl::Padding, fl::Pool2D, fl::PReLU, fl::ReLU, fl::ReLU6, fl::Reorder, fl::Sigmoid, fl::Tanh, fl::ThresholdReLU, fl::Transform, fl::View
Public Functions

UnaryModule()

std::vector<Variable> forward(const std::vector<Variable>& inputs)
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

virtual ~UnaryModule()
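As a sketch, a hypothetical custom layer can subclass UnaryModule by overriding the single-Variable forward (assumed here to be the pure virtual the class exposes, per its description) and prettyString:

// A hypothetical elementwise squaring layer.
class Square : public fl::UnaryModule {
 public:
  fl::Variable forward(const fl::Variable& input) override {
    return input * input; // autograd tracks this through Variable
  }
  std::string prettyString() const override {
    return "Square";
  }
};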


class BinaryModule : public fl::Module
An extension of Module which supports forward computation only on a pair of Variables, with a single Variable as output. For example, the BinaryCrossEntropy loss can be derived from BinaryModule.

Subclassed by fl::AdaptiveSoftMaxLoss, fl::BinaryCrossEntropy, fl::CategoricalCrossEntropy, fl::MeanAbsoluteError, fl::MeanSquaredError

Public Functions

BinaryModule()

std::vector<Variable> forward(const std::vector<Variable>& inputs)
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

virtual ~BinaryModule()

Container

class Container : public fl::Module
A computation unit capable of forward computation that contains a collection of multiple Modules and their respective parameters.

Subclassed by fl::PositionEmbedding, fl::Residual, fl::Sequential, fl::Transformer

Public Functions

template <typename T>
void add(const T& module)
Adds a module to a Container by making a copy of the underlying module. Note that parameters are still shared, due to Variable's copy semantics.
Parameters:
module: the module to add

template <typename T>
void add(std::shared_ptr<T> module)
Adds a module to modules_, and adds its parameters to the container's params_.
Parameters:
module: the module to add

ModulePtr module(int id) const
Returns a pointer to the module at the specified index in the container's modules_.
Returns: a pointer to the requested module
Parameters:
id: the index of the module to return

std::vector<ModulePtr> modules() const
Returns pointers to each Module in the Container.
Returns: an ordered vector of pointers for each module

void setParams(const Variable& var, int position)
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.
Parameters:
var: the new replacement Variable
position: the index of the parameter to be replaced in params_

Sequential

class Sequential : public fl::Container
A Container representing an ordered sequence of modules, which is capable of forward computation through each of its modules, in order.

Usage:

Sequential mySequential;
// Assume we've defined and implemented three modules: mod1, mod2, mod3
mySequential.add(mod1);
mySequential.add(mod2);
mySequential.add(mod3);
// Performing forward computation will forward through each `Module` in order
auto result = mySequential.forward(myInput);
// We can also inspect internal state
assert(mySequential.modules().size() == 3); // true
assert(
    mod1.params().size() + mod2.params().size() + mod3.params().size() ==
    mySequential.params().size()); // true
Public Functions

Sequential()

std::vector<Variable> forward(const std::vector<Variable>& input)
Performs forward computation for the Sequential, calling forward, in order, for each Module, and feeding the result as input to the next Module.

std::string prettyString() const
Generates a stringified representation of the Sequential by concatenating string representations for each contained Module.
Returns: a string containing the module label

Layers

Activations

class Sigmoid : public fl::UnaryModule
Applies the sigmoid function elementwise to a Variable:
\[\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}\]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class Tanh : public fl::UnaryModule
Applies the hyperbolic tangent function elementwise to a Variable:
\[\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class HardTanh : public fl::UnaryModule
Applies the hardtanh function elementwise to a Variable:
\[\begin{split}\text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\ -1 & \text{ if } x < -1 \\ x & \text{ otherwise } \end{cases}\end{split}\]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class ReLU : public fl::UnaryModule
Applies the rectified linear unit function elementwise to a Variable:
\[ \text{ReLU}(x) = \max(0, x) \]

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label


class LeakyReLU : public fl::UnaryModule
Applies the leaky rectified linear unit function from Maas et al (2013): Rectifier Nonlinearities Improve Neural Network Acoustic Models.

Applied elementwise to a Variable:
\[\begin{split} \text{LeakyReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{slope} \times x, & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{slope}\) is a constant by which the input will be multiplied if less than zero.

class PReLU : public fl::UnaryModule
Applies the parameterized rectified linear unit function from He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Applied elementwise to a Variable, given some input size:
\[\begin{split} \text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{value} \times x, & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{value}\) is a learned parameter whose initialization can be tuned.

class ELU : public fl::UnaryModule
Applies the exponential linear unit function from Clevert et al (2015): Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).

Applied elementwise to a Variable:
\[\begin{split} \text{ELU}(x) = \begin{cases} x & \text{ if } x \geq 0 \\ \alpha \times (e^x - 1) & \text{ otherwise } \end{cases} \end{split}\]
where \(\alpha\) is a tunable parameter.

class ThresholdReLU : public fl::UnaryModule
Applies the threshold rectified linear unit from Konda et al (2015): Zero-bias autoencoders and the benefits of co-adapting features.

Applied elementwise to a Variable:
\[\begin{split} \text{ThresholdReLU}(x) = \begin{cases} x & \text{ if } x > \text{threshold} \\ 0 & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{threshold}\) is a tunable parameter.

class GatedLinearUnit : public fl::UnaryModule
Creates a Gated Linear Unit from Dauphin et al (2017): Language Modeling with Gated Convolutional Networks.

\[\text{GLU}(x) = x_i \otimes \sigma(x_j)\]
where \(\otimes\) denotes the elementwise product, \(x_i\) is the first half of the input, \(x_j\) is the second half, and \(\sigma(x)\) is the sigmoid function.

class LogSoftmax : public fl::UnaryModule
Applies the log softmax function to a tensor:
\[ \text{LogSoftmax}(x_i) = \log{\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)} \]

class Log : public fl::UnaryModule
Applies the natural logarithm elementwise to a Variable.

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

BatchNorm

class BatchNorm : public fl::UnaryModule
Applies Batch Normalization on a given input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

The operation implemented is:
\[ out(x) = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} \times \gamma + \beta \]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated over the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for affine transformation.

Public Functions

BatchNorm(int featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)
Constructs a BatchNorm module.
Parameters:
featAxis: the axis over which normalization is performed
featSize: the size of the dimension along featAxis
momentum: an exponential average factor used to compute the running mean and variance: \[ runningMean = runningMean \times (1 - momentum) + newMean \times momentum \] If < 0, a cumulative moving average is used.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.

BatchNorm(const std::vector<int>& featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)
Constructs a BatchNorm module.
Parameters:
featAxis: the axes over which normalization is performed
featSize: total dimension along featAxis. For example, to perform Temporal Batch Normalization on input of size [ \(L\), \(C\), \(N\)], use featAxis = {1}, featSize = \(C\). To perform normalization per activation on input of size [ \(W\), \(H\), \(C\), \(N\)], use featAxis = {0, 1, 2}, featSize = \(W \times H \times C\).
momentum: an exponential average factor used to compute the running mean and variance: \[ runningMean = runningMean \times (1 - momentum) + newMean \times momentum \] If < 0, a cumulative moving average is used.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.
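For instance, a sketch normalizing the channel dimension of a [W, H, C, N] activation (hypothetical sizes):

fl::BatchNorm bn(2 /* featAxis: channel axis */, 32 /* featSize = C */);
auto normalized = bn(fl::input(af::randu(7, 7, 32, 16)));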

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Conv2D

class Conv2D : public fl::UnaryModule
Applies a 2D convolution over a 4D input along its first two dimensions.

This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C_{in}\), \(N\)] where \(C_{in}\) is the number of input channels, and generates an output of shape [ \(X_{out}\), \(Y_{out}\), \(C_{out}\), \(N\)] where \(C_{out}\) is the number of output channels,
\[X_{out} = \frac{X_{in} + 2 \times X_{pad} - (1 + (X_{filter} - 1) \times X_{dilation})}{X_{stride}} + 1\]
\[Y_{out} = \frac{Y_{in} + 2 \times Y_{pad} - (1 + (Y_{filter} - 1) \times Y_{dilation})}{Y_{stride}} + 1\]

Two modes for zero-padding are supported:
AF_PADDING_NONE: no padding
AF_PADDING_SAME: \(X_{pad}\) and \(Y_{pad}\) are dynamically chosen so that
\[X_{out} = \lceil{\frac{X_{in}}{X_{stride}}}\rceil, \quad Y_{out} = \lceil{\frac{Y_{in}}{Y_{stride}}}\rceil\]

Subclassed by fl::AsymmetricConv1D

Public Functions

Conv2D(int n_in, int n_out, int wx, int wy, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, bool bias = true, int groups = 1)
Constructs a Conv2D module.
Parameters:
n_in: \(C_{in}\), the number of channels in the input
n_out: \(C_{out}\), the number of channels in the output
wx: the size of the first dimension of the convolving kernel
wy: the size of the second dimension of the convolving kernel
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
bias: a boolean value that controls whether to add a learnable bias to the output
groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group
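A sketch of a typical construction (the SAME enumerator is assumed from the fl::PaddingMode mentioned above; shapes are hypothetical):

// 3x3 kernels, 16 input channels, 32 output channels, stride 1, SAME padding
fl::Conv2D conv(16, 32, 3, 3, 1, 1, fl::PaddingMode::SAME, fl::PaddingMode::SAME);
auto out = conv(fl::input(af::randu(28, 28, 16, 4)));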

Conv2D(const Variable& w, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)
Constructs a Conv2D module with a kernel Variable tensor. No bias term will be applied to the output.
Parameters:
w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group.

Conv2D(const Variable& w, const Variable& b, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)
Constructs a Conv2D module with a kernel Variable tensor and a bias Variable tensor.
Parameters:
w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].
b: the bias Variable tensor. The shape should be [ \(1\), \(1\), \(C_{out}\), \(1\)].
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group.

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Dropout

class Dropout : public fl::UnaryModule
Implements Dropout normalization, as given by Hinton et al (2012): Improving neural networks by preventing co-adaptation of feature detectors. Effectively regularizes by randomly zeroing out values in the input based on a given ratio.

All values that are not zeroed out are scaled by a factor of \(\frac{1}{1 - p}\). Thus, with the same network, evaluating the module at test time gives the identity.

Embedding

class Embedding : public fl::UnaryModule
Looks up embeddings from a learnable dictionary of fixed size.

This layer expects as input a list of indices with at most three dimensions, [ \(B_1\), \(B_2\) (optional), \(B_3\) (optional)], and generates an output from lookup of shape [embedding_dim, \(B_1\), \(B_2\) (optional), \(B_3\) (optional)].

Public Functions

Embedding(int embedding_dim, int num_embeddings)
Constructs an Embedding module.
Parameters:
embedding_dim: the size of each embedding vector
num_embeddings: the size of the dictionary of embeddings
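A sketch of a lookup (hypothetical sizes; the index tensor is wrapped as a Variable):

fl::Embedding emb(128 /* embedding_dim */, 10000 /* num_embeddings */);
auto idxs = fl::noGrad(af::constant(42, 20, 4)); // 20 indices, batch of 4
auto vecs = emb(idxs);                           // [128, 20, 4]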

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

LayerNorm

class LayerNorm : public fl::UnaryModule
Applies Layer Normalization on a given input as described in the paper Layer Normalization.

The operation implemented is:
\[ out(x) = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} \times \gamma + \beta \]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated along the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for affine transformation.

Public Functions

LayerNorm(int axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)
Constructs a LayerNorm module.
Parameters:
axis: the axis along which normalization is computed. Usually set as the feature axis.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
axisSize: the size of the features specified by axis, used to perform the elementwise affine transform. If the feature size is variable, use kLnVariableAxisSize, which uses a singleton weight and bias and tiles them dynamically according to the given input.

LayerNorm(const std::vector<int>& axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)
Constructs a LayerNorm module.
Parameters:
axis: the axes along which normalization is computed. Usually set as the feature axes.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
axisSize: the size of the features specified by axis, used to perform the elementwise affine transform. If the feature size is variable, use kLnVariableAxisSize, which uses a singleton weight and bias and tiles them dynamically according to the given input.
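For example, a sketch normalizing over the feature axis of a [C, N] input (hypothetical sizes):

fl::LayerNorm ln(0 /* axis */, 1e-5, true /* affine */, 256 /* axisSize */);
auto out = ln(fl::input(af::randu(256, 8)));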

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Linear

class Linear : public fl::UnaryModule
Applies a linear transformation to the input: \(y = Wx + b\).

This layer takes an input of shape [input_size, *, *, *] and transforms it to an output of shape [output_size, *, *, *].

Public Functions

Linear(int input_size, int output_size, bool bias = true)
Constructs a Linear module from the input and output sample sizes.
Parameters:
input_size: the size of each input sample
output_size: the size of each output sample
bias: a boolean value that controls whether the layer will include a bias term \(b\)

Linear(const Variable& w)
Constructs a Linear module from the weight parameter \(w\). The layer will not include the bias term \(b\) in this case.
Parameters:
w: the 2D Variable tensor for the weight \(w\). The shape should be [output_size, input_size].

Linear(const Variable& w, const Variable& b)
Constructs a Linear module from the weight parameter \(w\) and the bias parameter \(b\).
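A sketch combining the weight-only constructor with an initializer (see Initialization below; shapes follow the documentation above):

// W: [output_size, input_size] = [10, 512]; no bias term
auto w = fl::glorotUniform(512 /* input_size */, 10 /* output_size */);
fl::Linear fc(w);
auto y = fc(fl::input(af::randu(512, 32))); // y: [10, 32]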

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Padding

class Padding : public fl::UnaryModule
Adds padding of value val to the input, before and after each dimension \(i\), with sizes specified by the tuple padi.

Pool2D

class Pool2D : public fl::UnaryModule
A 2D pooling layer.

This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C\), \(N\)]. Pooling (max or average) is performed over the first and second dimensions of the input. Thus the output will be of shape [ \(X_{out}\), \(Y_{out}\), \(C\), \(N\)].
Reorder

class Reorder : public fl::UnaryModule
Reorders the data according to the specified dimensions. The order of the data may change and is guaranteed to be contiguous in memory.

// A layer which transposes a matrix
auto transposeLayer = Reorder(1, 0);

auto var = Variable(af::array(1, 2, 3, 4), false);
// Make the last dimension the first dimension
var = Reorder(3, 0, 1, 2)(var);
// Dims will be {4, 1, 2, 3}
std::cout << var.dims() << std::endl;
RNN

class RNN : public fl::Module
A recurrent neural network (RNN) layer.

The RNN layer supports several cell types. The most basic RNN (e.g. an Elman network) computes the following function:
\[ h_t = \sigma(W x_t + U h_{t-1} + b) \]
If the RNN mode is RELU, then \(\sigma\) will be a ReLU. If the RNN mode is TANH, then it will be a Tanh function.

Gated Recurrent Units (GRU) are supported. For details see the original GRU paper or the Wikipedia page.

LSTM cells are also supported (LSTM). The LSTM cell uses a forget gate and does not have peephole connections. For details see the original paper Long Short-Term Memory or the Wikipedia page.

The input to the RNN is expected to be of shape [ \(X_{in}\), \(N\), \(T\)] where \(N\) is the batch size and \(T\) is the sequence length.

The output of the RNN will be of shape [ \(X_{out}\), \(N\), \(T\)]. Here \(X_{out}\) will be hidden_size if the RNN is unidirectional, and twice the hidden_size if the RNN is bidirectional.

In addition, the RNN supports including the hidden state and the cell state as input and output. When these are input as the empty Variable, they are assumed to be zero.

Public Functions

RNN(int input_size, int hidden_size, int num_layers, RnnMode mode, bool bidirectional = false, float drop_prob = 0.0)
Constructs an RNN layer.
Parameters:
input_size: the dimension of the input (e.g. \(X_{in}\))
hidden_size: the hidden dimension of the RNN
num_layers: the number of recurrent layers
mode: the RNN mode to use. Can be any of: RELU, TANH, LSTM, GRU
bidirectional: whether or not the RNN is bidirectional. If true, the output dimension will be doubled.
drop_prob: the probability of dropout after each RNN layer except the last layer
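A sketch constructing and running a bidirectional LSTM (hypothetical sizes):

fl::RNN lstm(80 /* input_size */, 256 /* hidden_size */, 2 /* num_layers */,
             fl::RnnMode::LSTM, true /* bidirectional */, 0.2 /* drop_prob */);
auto input = fl::input(af::randu(80, 4, 100)); // [X_in, N, T]
auto output = lstm.forward(input);             // [512, 4, 100]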

std::vector<Variable> forward(const std::vector<Variable>& inputs)
Performs forward computation for the module, given some inputs.
Returns: a vector of Variable tensors containing the result of the forward computation
Parameters:
inputs: the values on which to perform forward computation

Variable forward(const Variable& input)
Forwards the RNN layer.
Returns: a single output Variable with shape [ \(X_{out}\), \(N\), \(T\)]
Parameters:
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]

std::tuple<Variable, Variable> forward(const Variable& input, const Variable& hidden_state)
Forwards the RNN layer.
Returns: a tuple of output Variables
Parameters:
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]
hidden_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, then the hidden state is assumed zero.

std::tuple<Variable, Variable, Variable> forward(const Variable& input, const Variable& hidden_state, const Variable& cell_state)
Forwards the RNN layer.
Returns: a tuple of output Variables
Parameters:
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]
hidden_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, then the hidden state is assumed zero.
cell_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, then the cell state is assumed zero.

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Transform

class Transform : public fl::UnaryModule
Applies a transformation on the input specified by a lambda function. For example, to add a \(1 + \log(x)\) layer to a container:

model.add(Transform([](const Variable& in) { return 1 + afnet::log(in); }));

Note that this module cannot be serialized.

View

class View : public fl::UnaryModule
Modifies the dimensions of a Variable and rearranges its elements without modifying the order of elements in the underlying af::array.

When specifying the number of elements in the array:
If -1 is specified on a particular axis, that axis will be assigned a dimension based on the number of total elements in the tensor. Only one axis value can be -1.
If 0 is specified on a particular axis, that axis will have the same dimension as does the input tensor. For example: given an input tensor with shape (10, 20, 30, 40) and a View with shape (-1, 0, 100), the output tensor will have shape (120, 20, 100).
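A sketch flattening feature maps while preserving the batch dimension (View is assumed to take an af::dim4 of target dimensions):

// Flatten [7, 7, 32, 16] to [7 * 7 * 32, 1, 1, 16]: -1 is inferred, 0 keeps
// the input's dimension on that axis.
fl::View flatten(af::dim4(-1, 1, 1, 0));
auto flat = flatten(fl::input(af::randu(7, 7, 32, 16)));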
WeightNorm

class WeightNorm : public fl::Module
A weight normalization layer.

This layer wraps a given module to create a weight-normalized implementation of the module. WeightNorm currently supports Linear and Conv2D. For example:

WeightNorm wn(Linear(128, 128), 0);

For more details, see Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.
Losses

AdaptiveSoftMaxLoss

class AdaptiveSoftMaxLoss : public fl::BinaryModule
An efficient approximation of the softmax function and negative log-likelihood loss. Computes the Adaptive Softmax, as given by Grave et al (2017): Efficient softmax approximation for GPUs. Efficient when the number of classes over which the softmax is being computed is very high and the label distribution is highly imbalanced.

Adaptive softmax buckets the inputs based on their frequency, where clusters may each contain a different number of targets. For each minibatch, only clusters for which at least one target is present are evaluated. The forward pass for low-frequency inputs is approximated with lower-rank matrices so as to speed up computation.

Public Functions

Creates an AdaptiveSoftMaxLoss with the given parameters.
Parameters:
reduction: the reduction mode; see documentation on ReduceMode for available options

Variable forward(const Variable& inputs, const Variable& targets)
Computes the categorical cross entropy loss for some input and target tensors (uses the adaptive softmax function to do this efficiently).

void setParams(const Variable& var, int position)
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.
Parameters:
var: the new replacement Variable
position: the index of the parameter to be replaced in params_

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

BinaryCrossEntropy

class BinaryCrossEntropy : public fl::BinaryModule
Computes the binary cross entropy loss between an input tensor \(x\) and a target tensor \(y\).

The binary cross entropy loss is:
\[ B(x, y) = \frac{1}{n} \sum_{i = 0}^n -\left( w_i \times (y_i \times \log(x_i) + (1 - y_i) \times \log(1 - x_i)) \right) \]
where \(w\) is an optional weight parameter for rescaling.

Both the inputs and the targets are expected to be between 0 and 1.

Public Functions

Variable forward(const Variable& inputs, const Variable& targets, const Variable& weights)
Performs forward loss computation with an additional weight tensor.
Parameters:
inputs: a tensor with the predicted values
targets: a tensor with the target values
weights: a rescaling weight given to the loss of each element

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label
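A sketch of the weighted overload (hypothetical shapes; predictions are assumed to already lie in (0, 1)):

fl::BinaryCrossEntropy bce;
auto preds   = fl::sigmoid(fl::input(af::randu(1, 8)));
auto targets = fl::noGrad(af::round(af::randu(1, 8))); // 0/1 labels
auto weights = fl::noGrad(af::constant(0.5, 1, 8));
auto loss = bce.forward(preds, targets, weights);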

CategoricalCrossEntropy

class CategoricalCrossEntropy : public fl::BinaryModule
Computes the categorical cross entropy loss between an input and a target tensor.

The input is expected to contain log probabilities (which can be accomplished via LogSoftmax). The targets should contain the index of the ground truth class for each input example.

In the batch case, for the output loss tensor \(\{l_1,...,l_N\}^\top\), put \(l_n = -x_{n, y_n}\) (consider only the log probability of the correct class). Then reduce via:
\[ \mathcal{L}(x, y) = \sum_{i = 1}^N l_i \]
if using a sum reduction, or
\[ \mathcal{L}(x, y) = \frac{1}{N} \sum_{i = 1}^N l_i \]
if using a mean reduction. If using no reduction ('none'), the result will be reshaped to the target dimensions, giving a loss for each example. See ReduceMode.
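A sketch pairing LogSoftmax outputs with the loss (hypothetical shapes; targets hold class indices, and the s32 index type is an assumption):

fl::CategoricalCrossEntropy criterion;
auto logProbs = fl::logSoftmax(fl::input(af::randu(10, 4)), 0); // [C, N]
auto targets  = fl::noGrad(af::constant(3, 4, s32));            // [N]
auto loss = criterion.forward(logProbs, targets);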
MeanAbsoluteError

class MeanAbsoluteError : public fl::BinaryModule
Computes the mean absolute error (equivalent to the \(L_1\) loss):
\[ \mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 0}^n \left| x_i - y_i \right| \]
for input tensor \(x\) and target tensor \(y\), each of which contains \(n\) elements.

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

MeanSquaredError

class MeanSquaredError : public fl::BinaryModule
Computes the mean squared error between elements across two tensors:
\[ \mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 0}^n \left( x_i - y_i \right)^2 \]
for input tensor \(x\) and target tensor \(y\), each of which contains \(n\) elements.

Public Functions

std::string prettyString() const
Generates a stringified representation of the module.
Returns: a string containing the module label

Initialization
Functions for initializing tensors. Provides facilities for creating a fl::Variable tensor of different types and initializations vis-à-vis probability distributions, constants, and the identity. Additionally wraps common tensors as integrated into modules.

namespace fl
Functions

Variable input(const af::array& arr)
Constructs a Variable with gradient calculation disabled, from a given array.
Returns: a Variable from the given array with gradient calculation disabled
Parameters:
arr: an af::array to be used

Variable noGrad(const af::array& arr)
See fl::input above.
Returns: a Variable from the given array with gradient calculation disabled
Parameters:
arr: an af::array to be used

Variable param(const af::array& arr)
Constructs a Variable with gradient calculation enabled, from a given array.
Returns: a Variable from the given array with gradient calculation enabled
Parameters:
arr: an af::array to be used

Variable kaimingUniform(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are uniformly distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable kaimingUniform(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are uniformly distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Variable kaimingNormal(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are normally distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable kaimingNormal(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are normally distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Variable glorotUniform(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are uniformly distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable glorotUniform(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are uniformly distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.

Variable glorotNormal(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor with dimensions [output_size, input_size], where elements are normally distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
Returns: a Variable containing a tensor with random values distributed accordingly
Parameters:
input_size: the second dimension of the output tensor shape
output_size: the first dimension of the output tensor shape
type: the ArrayFire datatype for which to create the tensor
calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable glorotNormal(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are normally distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
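As a sketch tying these initializers to module parameters (shapes follow the parameter documentation above):

// [output_size, input_size] = [256, 512] weight with gradients enabled
auto w = fl::kaimingNormal(512 /* input_size */, 256 /* output_size */);
fl::Linear fc(w); // no bias term

// Arbitrary-rank tensors can be drawn directly, e.g. a conv kernel:
auto k = fl::glorotUniform(af::dim4(3, 3, 16, 32));
fl::Conv2D conv(k);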

Utils
Utils for modules.

namespace fl
Functions

bool allParamsClose(const Module& a, const Module& b, double absTolerance = 1e-5)
Returns true if the parameters of two modules are of the same type and are elementwise equal within a given tolerance limit.
Parameters:
a, b: input Modules to compare
absTolerance: absolute tolerance allowed

int derivePadding(int inSz, int filterSz, int stride, int pad, int dilation)

af::array join(const std::vector<af::array>& inputs, double padValue = 0.0, dim_t batchDim = 1)
Packs a list of arrays (possibly of different dimensions) into a single array by padding them to the same dimensions.
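A sketch batching two variable-length sequences (hypothetical shapes; padding/stacking behavior assumed from the description above):

// Two [feature, time] arrays of different lengths
std::vector<af::array> seqs = {af::randu(80, 50), af::randu(80, 30)};
// Zero-pad to a common size and stack along dimension 2
auto batch = fl::join(seqs, 0.0, 2); // [80, 50, 2]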

DistributedUtils

namespace fl
Functions
Registers a module for all-reduce synchronization with a gradient hook on its parameter Variables.
Parameters:
[in] module: a module whose parameter gradients will be synchronized
[in] a: a Reducer instance to which gradients will be immediately added when available
Traverses the network and averages its parameters with all-reduce.
Parameters:
module: a module whose parameters will be synchronized
Traverses the network and synchronizes the gradients of its parameters with all-reduce.
Parameters:
module: a module whose parameter gradients will be synchronized
scale: scale gradients after all-reduce by this factor