Modules

Containers

Module

class Module

An abstract computation unit capable of forward computation.

Also contains a collection of parameters that can be mutated, and will be serialized and deserialized with the module.

Subclassed by fl::BinaryModule, fl::Container, fl::RNN, fl::UnaryModule, fl::WeightNorm

Public Functions

std::vector<Variable> params() const

Gets the parameters of the module.

Return

the module's parameters as a vector of Variable

virtual void train()

Switches the module to training mode.

Changes all parameters so that gradient calculation will be enabled for any calls to forward.

virtual void eval()

Switches the module to evaluation mode.

Changes all parameters so that gradient calculation will be disabled for any calls to forward.

Variable param(int position) const

Returns a module parameter given a particular position.

Return

a Variable tensor for the parameter at the requested position

Parameters
  • position: the index of the requested parameter in params_

virtual void setParams(const Variable &var, int position)

Sets a parameter at a specified position with a new, given one.

If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.

Parameters
  • var: the new replacement Variable

  • position: The index of the parameter which will be replaced in params_

void zeroGrad()

Clears references to gradient Variables for all parameters in the module.

virtual std::vector<Variable> forward(const std::vector<Variable> &inputs) = 0

Performs forward computation for the module, given some inputs.

Return

a vector of Variable tensors containing the result of the forward computation

Parameters
  • inputs: the values to compute forward computation for the module.

std::vector<Variable> operator()(const std::vector<Variable> &inputs)

Overload for forward computation for the module.

Return

a vector of Variable tensors containing the result of the forward computation

Parameters
  • inputs: the values to compute forward computation for the module.

virtual std::string prettyString() const = 0

Generates a stringified representation of the module.

Return

a string containing the module label

virtual ~Module()

class UnaryModule : public fl::Module

An extension of Module which supports only forward computation on a single Variable with a single Variable as output.

For example, the Sigmoid module can be derived from UnaryModule.

Subclassed by fl::AdaptiveSoftMax, fl::BatchNorm, fl::Conv2D, fl::Dropout, fl::ELU, fl::Embedding, fl::GatedLinearUnit, fl::HardTanh, fl::LayerNorm, fl::LeakyReLU, fl::Linear, fl::Log, fl::LogSoftmax, fl::Padding, fl::Pool2D, fl::PReLU, fl::ReLU, fl::ReLU6, fl::Reorder, fl::Sigmoid, fl::Tanh, fl::ThresholdReLU, fl::Transform, fl::View

Public Functions

UnaryModule()
UnaryModule(const std::vector<Variable> &params)
std::vector<Variable> forward(const std::vector<Variable> &inputs)

Performs forward computation for the module, given some inputs.

Return

a vector of Variable tensors containing the result of the forward computation

Parameters
  • inputs: the values to compute forward computation for the module.

virtual Variable forward(const Variable &input) = 0
Variable operator()(const Variable &input)
virtual ~UnaryModule()

class BinaryModule : public fl::Module

An extension of Module which supports only forward computation on a pair of Variables with a single Variable as output.

For example, BinaryCrossEntropy Loss can be derived from BinaryModule.

Subclassed by fl::AdaptiveSoftMaxLoss, fl::BinaryCrossEntropy, fl::CategoricalCrossEntropy, fl::MeanAbsoluteError, fl::MeanSquaredError

Public Functions

BinaryModule()
BinaryModule(const std::vector<Variable> &params)
std::vector<Variable> forward(const std::vector<Variable> &inputs)

Performs forward computation for the module, given some inputs.

Return

a vector of Variable tensors containing the result of the forward computation

Parameters
  • inputs: the values to compute forward computation for the module.

virtual Variable forward(const Variable &input1, const Variable &input2) = 0
Variable operator()(const Variable &input1, const Variable &input2)
virtual ~BinaryModule()

Container

class Container : public fl::Module

A computation unit capable of forward computation that contains a collection of multiple Modules and their respective parameters.

Subclassed by fl::PositionEmbedding, fl::Residual, fl::Sequential, fl::Transformer

Public Functions

template<typename T>
void add(const T &module)

Adds a module to a Container by making a copy of the underlying module.

Note that parameters are still shared, due to Variable’s copy semantics.

Parameters
  • module: the module to add.

template<typename T>
void add(std::shared_ptr<T> module)

Adds a module to modules_, and adds parameters to the container’s params_.

Parameters
  • module: the module to add.

ModulePtr module(int id) const

Returns a pointer to the module at the specified index in the container’s modules_.

Return

a pointer to the requested module

Parameters
  • id: the index of the module to return

std::vector<ModulePtr> modules() const

Returns pointers to each Module in the Container.

Return

an ordered vector of pointers for each module.

void train()

Switches all modules in the Container into train mode.

See Module.

void eval()

Switches all modules in the Container into eval mode.

See Module.

void setParams(const Variable &var, int position)

Sets a parameter at a specified position with a new, given one.

If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.

Parameters
  • var: the new replacement Variable

  • position: The index of the parameter which will be replaced in params_

Sequential

class Sequential : public fl::Container

A Container representing an ordered sequence of modules, which is capable of forward computation through each of its modules, in order.

Usage:

Sequential mySequential;
// Assume we've defined and implemented three modules: mod1, mod2, mod3
mySequential.add(mod1);
mySequential.add(mod2);
mySequential.add(mod3);
// Performing forward computation will forward through each `Module` in order
auto result = mySequential.forward(myInput);
// We can also inspect internal state
assert(mySequential.modules().size() == 3); // true
assert(
    mod1.params().size() + mod2.params().size() +
        mod3.params().size() ==
    mySequential.params().size()); // true

Public Functions

Sequential()
std::vector<Variable> forward(const std::vector<Variable> &input)

Performs forward computation for the Sequential, calling forward, in order, for each Module, and feeding the result as input to the next Module.

Return

a vector of Variable tensors containing the result of the forward computation

Parameters
  • input: the value on which the Container will perform forward computation.

Variable forward(const Variable &input)
Variable operator()(const Variable &input)
std::string prettyString() const

Generates a stringified representation of the Sequential by concatenating string representations for each contained Module.

Return

a string containing the module label
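The in-order chaining that Sequential performs can be illustrated without the library. The following is a standard-library sketch, not flashlight code: `Stage` and `runSequential` are hypothetical names, and each stage stands in for a module's forward computation on a single value.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a module: maps one value to another.
using Stage = std::function<double(double)>;

// Feeds the input through each stage in insertion order, passing each
// result on as the input of the next stage.
double runSequential(const std::vector<Stage>& stages, double input) {
  double value = input;
  for (const auto& stage : stages) {
    value = stage(value);
  }
  return value;
}
```

With the stages "add one" followed by "double", an input of 3 would pass through as 3, 4, 8. An empty list of stages returns the input unchanged.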

Layers

Activations

class Sigmoid : public fl::UnaryModule

Applies the sigmoid function element-wise to a Variable:

\[\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}\]

Public Functions

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

class Tanh : public fl::UnaryModule

Applies the hyperbolic tangent function element-wise to a Variable:

\[\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]

Public Functions

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

class HardTanh : public fl::UnaryModule

Applies the hard-tanh function element-wise to a Variable:

\[\begin{split}\text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\ -1 & \text{ if } x < -1 \\ x & \text{ otherwise } \\ \end{cases} \end{split}\]

Public Functions

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label
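The HardTanh definition above amounts to clamping each element to the interval [-1, 1]. A scalar sketch using only the standard library (the name `hardTanh` is hypothetical and this is not the library's implementation):

```cpp
#include <algorithm>
#include <cassert>

// Clamps x to [-1, 1]: 1 if x > 1, -1 if x < -1, x otherwise.
double hardTanh(double x) {
  return std::max(-1.0, std::min(1.0, x));
}
```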

class ReLU : public fl::UnaryModule

Applies the rectified linear unit function element-wise to a Variable:

\[ ReLU(x) = \max(0, x) \]

Public Functions

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

class LeakyReLU : public fl::UnaryModule

Applies the leaky rectified linear unit function from Maas et al (2013), Rectifier Nonlinearities Improve Neural Network Acoustic Models.

Applied element-wise to a Variable:

\[\begin{split} \text{LeakyRELU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{slope} \times x, & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{slope}\) is a constant by which the input will be multiplied if less than zero.

class PReLU : public fl::UnaryModule

Applies the parameterized rectified linear unit function from He et al (2015), Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Applied element-wise to a Variable, given some input size:

\[\begin{split} \text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{value} \times x, & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{value}\) is a learned parameter whose initialization can be tuned.

class ELU : public fl::UnaryModule

Applies the exponential linear unit function from Clevert et al (2015): Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).

Applied element-wise to a Variable:

\[\begin{split} \text{ELU}(x) = \begin{cases} x & \text{ if } x \geq 0 \\ \alpha \times (e^x - 1) & \text{ otherwise } \end{cases} \end{split}\]
where \(\alpha\) is a tunable parameter.

class ThresholdReLU : public fl::UnaryModule

Applies the threshold rectified linear unit from Konda et al (2015): Zero-bias autoencoders and the benefits of co-adapting features.

Applied element-wise to a Variable:

\[\begin{split} \text{ThresholdReLU}(x) = \begin{cases} x & \text{ if } x > \text{threshold} \\ 0 & \text{ otherwise } \end{cases} \end{split}\]
where \(\text{threshold}\) is a tunable parameter.
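The piecewise definitions of LeakyReLU, PReLU, ELU and ThresholdReLU can be compared side by side as scalar functions. This is a standard-library sketch with hypothetical function names. It operates on a single value and does not model how the library stores or learns the parameters.

```cpp
#include <cassert>
#include <cmath>

// x for x >= 0, slope * x otherwise. PReLU has the same form, with the
// multiplier being a learned value instead of a fixed constant.
double leakyRelu(double x, double slope) {
  return x >= 0.0 ? x : slope * x;
}

// x for x >= 0, alpha * (e^x - 1) otherwise.
double elu(double x, double alpha) {
  return x >= 0.0 ? x : alpha * (std::exp(x) - 1.0);
}

// x for x strictly greater than the threshold, 0 otherwise.
double thresholdRelu(double x, double threshold) {
  return x > threshold ? x : 0.0;
}
```

Note that ThresholdReLU uses a strict comparison, so an input exactly equal to the threshold maps to 0.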

class GatedLinearUnit : public fl::UnaryModule

Creates a Gated Linear Unit from Dauphin et al (2017): Language Modeling with Gated Convolutional Networks.

\[\text{GLU}(x) = x_i \otimes \sigma(x_j)\]
where \(\otimes\) denotes the element-wise product, \(x_i\) is the first half of the input, \(x_j\) is the second half, and \(\sigma(x)\) is the sigmoid function.
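A minimal sketch of the gating formula on a flat vector, assuming the split is taken over the whole vector (the name `gatedLinearUnit` is hypothetical, and the sketch does not model splitting along a chosen tensor dimension):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Splits x into a first half a and a second half b, then returns
// a[i] * sigmoid(b[i]) for each position i. The output has half the length.
std::vector<double> gatedLinearUnit(const std::vector<double>& x) {
  assert(x.size() % 2 == 0);
  const std::size_t half = x.size() / 2;
  std::vector<double> out(half);
  for (std::size_t i = 0; i < half; ++i) {
    const double gate = 1.0 / (1.0 + std::exp(-x[half + i]));
    out[i] = x[i] * gate;
  }
  return out;
}
```

For the input {2, 4, 0, 0}, both gates are sigmoid(0) = 0.5, so the result is {1, 2}.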

class LogSoftmax : public fl::UnaryModule

Applies the log softmax function to a tensor:

\[ \text{LogSoftmax}(x_i) = \log{\left (\frac{e^{x_i} }{ \sum_j e^{x_j}} \right)} \]
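The formula can be evaluated on a flat vector as in the sketch below. Subtracting the maximum before exponentiating is a common numerical-stability technique chosen for this sketch. It is not a statement about how the library computes the result, and `logSoftmax` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Computes x_i - log(sum_j exp(x_j)) for every element of x.
std::vector<double> logSoftmax(const std::vector<double>& x) {
  assert(!x.empty());
  const double maxVal = *std::max_element(x.begin(), x.end());
  double sumExp = 0.0;
  for (double v : x) {
    sumExp += std::exp(v - maxVal);  // shifted to avoid overflow
  }
  const double logSumExp = maxVal + std::log(sumExp);
  std::vector<double> out;
  out.reserve(x.size());
  for (double v : x) {
    out.push_back(v - logSumExp);
  }
  return out;
}
```

Exponentiating the outputs yields probabilities that sum to 1, which is a quick sanity check for any implementation.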

class Log : public fl::UnaryModule

Applies the natural logarithm element-wise to a Variable.

Public Functions

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

BatchNorm

class BatchNorm : public fl::UnaryModule

Applies Batch Normalization on a given input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

The operation implemented is:

\[out(x) = \frac{x - E[x]}{\sqrt{Var[x]+\epsilon}} \times \gamma + \beta \]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated over the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for affine transformation.
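The operation can be traced on a single feature's values with the sketch below. It uses only the standard library, the name `normalize` is hypothetical, and it assumes the population variance (dividing by the number of elements), which the text above does not specify.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalizes x to zero mean and unit variance, then applies the affine
// transform: (x - mean) / sqrt(var + eps) * gamma + beta.
// Assumption of this sketch: var is the population variance.
std::vector<double> normalize(const std::vector<double>& x, double gamma,
                              double beta, double eps) {
  assert(!x.empty());
  const double n = static_cast<double>(x.size());
  double mean = 0.0;
  for (double v : x) mean += v;
  mean /= n;
  double var = 0.0;
  for (double v : x) var += (v - mean) * (v - mean);
  var /= n;
  const double invStd = 1.0 / std::sqrt(var + eps);
  std::vector<double> out;
  out.reserve(x.size());
  for (double v : x) out.push_back((v - mean) * invStd * gamma + beta);
  return out;
}
```

For x = {1, 3} with eps = 0, the mean is 2 and the variance is 1, so the normalized values are {-1, 1}. With gamma = 2 and beta = 1 the output is {-1, 3}.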

Public Functions

BatchNorm(int featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)

Constructs a BatchNorm module.

Parameters
  • featAxis: the axis over which normalization is performed

  • featSize: the size of the dimension along featAxis

  • momentum: an exponential average factor used to compute running mean and variance.

    \[ runningMean = runningMean \times (1-momentum) + newMean \times momentum \]
    If < 0, cumulative moving average is used.

  • eps: \(\epsilon\)

  • affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1, 0 respectively if set to false, or initialized as learnable parameters if set to true.

  • trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.

BatchNorm(const std::vector<int> &featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)

Constructs a BatchNorm module.

Parameters
  • featAxis: the axes over which normalization is performed

  • featSize: total dimension along featAxis. For example, to perform Temporal Batch Normalization on input of size [ \(L\), \(C\), \(N\)], use featAxis = {1}, featSize = \(C\). To perform normalization per activation on input of size [ \(W\), \(H\), \(C\), \(N\)], use featAxis = {0, 1, 2}, featSize = \(W \times H \times C\).

  • momentum: an exponential average factor used to compute running mean and variance.

    \[ runningMean = runningMean \times (1-momentum) + newMean \times momentum \]
    If < 0, cumulative moving average is used.

  • eps: \(\epsilon\)

  • affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1, 0 respectively if set to false, or initialized as learnable parameters if set to true.

  • trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.
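The momentum update for the running mean can be sketched as follows. `RunningMean` is a hypothetical helper, not part of the library. For negative momentum, the sketch assumes the cumulative moving average weights the n-th batch by 1/n, which is one common definition and may differ from the library's exact behavior.

```cpp
#include <cassert>

// Tracks a running mean across batches.
struct RunningMean {
  double value = 0.0;
  long count = 0;  // number of batches seen so far

  void update(double batchMean, double momentum) {
    ++count;
    // Assumption: a negative momentum selects a cumulative moving average,
    // where the n-th batch is weighted by 1 / n.
    const double factor =
        momentum < 0.0 ? 1.0 / static_cast<double>(count) : momentum;
    // runningMean = runningMean * (1 - momentum) + newMean * momentum
    value = value * (1.0 - factor) + batchMean * factor;
  }
};
```

With momentum 0.1 and an initial value of 0, a first batch mean of 10 moves the running mean to 1. In cumulative mode, batch means of 2 and then 4 give a running mean of 3, the plain average of the two.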

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

Conv2D

class Conv2D : public fl::UnaryModule

Applies a 2D convolution over a 4D input along its first two dimensions.

This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C_{in}\), \(N\)] where \(C_{in}\) is the number of input channels, and generates an output of shape [ \(X_{out}\), \(Y_{out}\), \(C_{out}\), \(N\)] where \(C_{out}\) is the number of output channels,

\[X_{out} = \frac{X_{in} + 2 \times X_{pad} - (1 + (X_{filter} - 1) \times X_{dilation})}{X_{stride}} + 1\]
\[Y_{out} = \frac{Y_{in} + 2 \times Y_{pad} - (1 + (Y_{filter} - 1) \times Y_{dilation})}{Y_{stride}} + 1\]

Two modes for zero-padding are supported:

  • AF_PADDING_NONE: no padding

  • AF_PADDING_SAME: \(X_{pad}\) and \(Y_{pad}\) are dynamically chosen so that

    \[X_{out} = \lceil{\frac{X_{in}}{X_{stride}}}\rceil, Y_{out} = \lceil{\frac{Y_{in}}{Y_{stride}}}\rceil\]
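The output-size formulas can be checked with a small helper applied once per spatial dimension. This is a sketch with hypothetical function names. It assumes the division rounds down when the numerator is not a multiple of the stride, which the formulas above do not state explicitly.

```cpp
#include <cassert>

// Output size along one dimension for explicit zero-padding:
// (in + 2 * pad - (1 + (filter - 1) * dilation)) / stride + 1.
// Assumption: integer division (rounding down) for the stride division.
int convOutputSize(int in, int filter, int stride, int pad, int dilation) {
  assert(in > 0 && filter > 0 && stride > 0 && pad >= 0 && dilation > 0);
  const int effectiveFilter = 1 + (filter - 1) * dilation;
  assert(in + 2 * pad >= effectiveFilter);
  return (in + 2 * pad - effectiveFilter) / stride + 1;
}

// Output size along one dimension for "same" padding: ceil(in / stride).
int samePaddingOutputSize(int in, int stride) {
  assert(in > 0 && stride > 0);
  return (in + stride - 1) / stride;
}
```

For example, a size-32 input with a size-3 filter, stride 1, padding 1 and dilation 1 keeps its size of 32, while the same filter with stride 2 and no padding yields 15.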

Subclassed by fl::AsymmetricConv1D

Public Functions

Conv2D(int n_in, int n_out, int wx, int wy, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, bool bias = true, int groups = 1)

Constructs a Conv2D module.

Parameters
  • n_in: \(C_{in}\), the number of channels in the input

  • n_out: \(C_{out}\), the number of channels in the output

  • wx: the size of the first dimension of the convolving kernel

  • wy: the size of the second dimension of the convolving kernel

  • sx: the stride of the convolution along the first dimension

  • sy: the stride of the convolution along the second dimension

  • px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode

  • py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode

  • dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.

  • dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.

  • bias: a boolean value that controls whether to add a learnable bias to the output

  • groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will only be connected to the input channels in the i-th group.

Conv2D(const Variable &w, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)

Constructs a Conv2D module with a kernel Variable tensor.

No bias term will be applied to the output.

Parameters
  • w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].

  • sx: the stride of the convolution along the first dimension

  • sy: the stride of the convolution along the second dimension

  • px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode

  • py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode

  • dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.

  • dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.

  • groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will only be connected to the input channels in the i-th group.

Conv2D(const Variable &w, const Variable &b, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)

Constructs a Conv2D module with a kernel Variable tensor and a bias Variable tensor.

Parameters
  • w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].

  • b: the bias Variable tensor. The shape should be [ \(1\), \(1\), \(C_{out}\), \(1\)].

  • sx: the stride of the convolution along the first dimension

  • sy: the stride of the convolution along the second dimension

  • px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode

  • py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode

  • dx: dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.

  • dy: dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.

  • groups: the number of groups that the input and output channels are divided into for restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will only be connected to the input channels in the i-th group.

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

Dropout

class Dropout : public fl::UnaryModule

Implements Dropout, as given by Hinton et al (2012): Improving neural networks by preventing co-adaptation of feature detectors.

Effectively regularizes by randomly zeroing out values in the input based on a given ratio.

All values that are not zeroed out are scaled by a factor of \(\frac{1}{1 - p}\). Thus, with the same network, at test time, evaluating the module gives the identity.
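The zeroing and rescaling can be sketched on a flat vector as below. This is a standard-library illustration with a hypothetical `dropout` function. It takes the random generator as an argument so that results are reproducible, and it does not reflect how the library draws its random mask.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Training-time dropout: each element is zeroed with probability p, and
// every surviving element is scaled by 1 / (1 - p).
std::vector<double> dropout(const std::vector<double>& x, double p,
                            std::mt19937& rng) {
  assert(p >= 0.0 && p < 1.0);
  std::bernoulli_distribution drop(p);
  const double scale = 1.0 / (1.0 - p);
  std::vector<double> out;
  out.reserve(x.size());
  for (double v : x) {
    out.push_back(drop(rng) ? 0.0 : v * scale);
  }
  return out;
}
```

With p = 0.5, every output element is either 0 or twice its input, so the expected value of each element matches the input. That is why no rescaling is needed when the module acts as the identity at test time.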

Embedding

class Embedding : public fl::UnaryModule

Looks up embeddings from a learnable dictionary of fixed size.

This layer expects as input a list of indices with at most three dimensions, [ \(B_1\), \(B_2\) (optional), \(B_3\) (optional)], and generates an output from lookup of shape [embedding_dim, \(B_1\), \(B_2\) (optional), \(B_3\) (optional)].

Public Functions

Embedding(int embedding_dim, int num_embeddings)

Constructs an Embedding module.

Parameters
  • embedding_dim: the size of each embedding vector

  • num_embeddings: the size of the dictionary of embeddings

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

LayerNorm

class LayerNorm : public fl::UnaryModule

Applies Layer Normalization on a given input as described in the paper Layer Normalization.

The operation implemented is:

\[out(x) = \frac{x - E[x]}{\sqrt{Var[x]+\epsilon}} \times \gamma + \beta \]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated along the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for affine transformation.

Public Functions

LayerNorm(int axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)

Constructs a LayerNorm module.

Parameters
  • axis: the axis along which normalization is computed. Usually set as the feature axis.

  • eps: \(\epsilon\)

  • affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1, 0 respectively if set to false, or initialized as learnable parameters if set to true.

  • axisSize: size of features specified by axis to perform elementwise affine transform. If the feat size is variable, use kLnVariableAxisSize which uses singleton weight, bias and tiles them dynamically according to the given input.

LayerNorm(const std::vector<int> &axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)

Constructs a LayerNorm module.

Parameters
  • axis: the axes along which normalization is computed. Usually set as the feature axes.

  • eps: \(\epsilon\)

  • affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1, 0 respectively if set to false, or initialized as learnable parameters if set to true.

  • axisSize: size of features specified by axis to perform elementwise affine transform. If the feat size is variable, use kLnVariableAxisSize which uses singleton weight, bias and tiles them dynamically according to the given input.

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

Linear

class Linear : public fl::UnaryModule

Applies linear transformation on input: \(y = Wx + b \).

This layer takes in an input of shape [input_size, *, *, *] and transforms it to an output of shape [output_size, *, *, *].
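For a single input sample, the transformation reduces to a matrix-vector product plus a bias. The sketch below uses nested vectors and a hypothetical `linear` function. It stores the weight as [output_size][input_size] rows and treats an empty bias vector as "no bias", both of which are choices of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y = W x + b for one sample. W has output_size rows of
// input_size entries each. An empty b means no bias term is added.
std::vector<double> linear(const std::vector<std::vector<double>>& W,
                           const std::vector<double>& x,
                           const std::vector<double>& b) {
  assert(b.empty() || b.size() == W.size());
  std::vector<double> y(W.size(), 0.0);
  for (std::size_t o = 0; o < W.size(); ++o) {
    assert(W[o].size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      y[o] += W[o][i] * x[i];
    }
    if (!b.empty()) {
      y[o] += b[o];
    }
  }
  return y;
}
```

An input of size 2 multiplied by a 3-by-2 weight produces an output of size 3, matching the change from input_size to output_size in the first dimension.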

Public Functions

Linear(int input_size, int output_size, bool bias = true)

Constructs a Linear module from the input and output sample sizes.

Parameters
  • input_size: the size of each input sample

  • output_size: the size of each output sample

  • bias: a boolean value that controls whether the layer will include a bias term \(b\).

Linear(const Variable &w)

Constructs a Linear module from the weight parameter \(w\).

The layer will not include the bias term \(b\) in this case.

Parameters
  • w: the 2D Variable tensor for the weight \(w\). The shape should be [output_size, input_size].

Linear(const Variable &w, const Variable &b)

Constructs a Linear module from the weight parameter \(w\) and the bias parameter \(b\).

Parameters
  • w: the 2D Variable tensor for the weight \(w\). The shape should be [output_size, input_size].

  • b: the 1D Variable tensor for the bias \(b\). The shape should be [output_size].

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

Padding

class Padding : public fl::UnaryModule

Adds a padding of value val before and after each dimension \(i\) of the input, with the size specified by the tuple \(pad_i\).

Pool2D

class Pool2D : public fl::UnaryModule

A 2D pooling layer.

This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C\), \(N\)]. Pooling (max or average) is performed over the first and second dimensions of the input. Thus the output will be of shape [ \(X_{out}\), \(Y_{out}\), \(C\), \(N\)].

Reorder

class Reorder : public fl::UnaryModule

Reorders the data according to the specified dimensions.

The order of the data may change and is guaranteed to be contiguous in memory.

// A layer which transposes a matrix
auto transposeLayer = Reorder(1, 0);

auto var = Variable(af::array(1, 2, 3, 4), false);

// Make the last dimension the first dimension
var = Reorder(3, 0, 1, 2)(var);
// Dims will be {4, 1, 2, 3}
std::cout << var.dims() << std::endl;

RNN

class RNN : public fl::Module

A recurrent neural network (RNN) layer.

The RNN layer supports several cell types. The most basic RNN (e.g. an Elman network) computes the following function:

\[ h_t = \sigma(W x_t + U h_{t-1} + b) \]
If the RNN mode is RELU then \(\sigma\) will be a ReLU. If the RNN mode is TANH then it will be a Tanh function.
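The recurrence can be traced with scalar weights, as in the sketch below. The names `Activation` and `elmanRnn` are hypothetical, the weights are single numbers rather than matrices, and the initial hidden state is taken to be zero. None of this reflects the library's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

enum class Activation { Relu, Tanh };

// Applies h_t = sigma(w * x_t + u * h_{t-1} + b) over a sequence, starting
// from h_0 = 0, and returns the hidden state at every time step.
std::vector<double> elmanRnn(const std::vector<double>& xs, double w, double u,
                             double b, Activation act) {
  std::vector<double> hs;
  hs.reserve(xs.size());
  double h = 0.0;
  for (double x : xs) {
    const double pre = w * x + u * h + b;
    h = (act == Activation::Relu) ? std::max(0.0, pre) : std::tanh(pre);
    hs.push_back(h);
  }
  return hs;
}
```

With w = u = 1, b = 0 and a ReLU activation, the inputs {1, 2, 3} produce the hidden states {1, 3, 6}, since each step adds the new input to the previous state.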

Gated Recurrent Units (GRU) are supported. For details see the original GRU paper or the Wikipedia page.

LSTM cells are also supported (LSTM). The LSTM cell uses a forget gate and does not have peephole connections. For details see the original paper Long Short-Term Memory or the Wikipedia page.

The input to the RNN is expected to be of shape [ \(X_{in}\), \(N\), \(T\)] where \(N\) is the batch size and \(T\) is the sequence length.

The output of the RNN will be of shape [ \(X_{out}\), \(N\), \(T\)]. Here \(X_{out}\) will be hidden_size if the RNN is unidirectional and it will be twice the hidden_size if the RNN is bidirectional.

In addition the RNN supports including the hidden state and the cell state as input and output. When these are input as the empty Variable they are assumed to be zero.

Public Functions

RNN(int input_size, int hidden_size, int num_layers, RnnMode mode, bool bidirectional = false, float drop_prob = 0.0)

Construct an RNN layer.

Parameters
  • input_size: The dimension of the input (e.g. \(X_{in}\))

  • hidden_size: The hidden dimension of the RNN.

  • num_layers: The number of recurrent layers.

  • mode: The RNN mode to use. Can be any of:

    • RELU

    • TANH

    • LSTM

    • GRU

  • bidirectional: Whether or not the RNN is bidirectional. If true the output dimension will be doubled.

  • drop_prob: The probability of dropout after each RNN layer except the last layer.

std::vector<Variable> forward(const std::vector<Variable> &inputs)

Performs forward computation for the module, given some inputs.

Return

a vector of Variable tensors containing the result of the forward computation

Parameters
  • inputs: the values to compute forward computation for the module.

Variable forward(const Variable &input)

Forward the RNN Layer.

Return

a single output Variable with shape [ \(X_{out}\), \(N\), \(T\)]

Parameters
  • input: Should be of shape [ \(X_{in}\), \(N\), \(T\)]

std::tuple<Variable, Variable> forward(const Variable &input, const Variable &hidden_state)

Forward the RNN Layer.

Return

A tuple of output Variables.

  • The first element is the output of the RNN of shape [ \(X_{out}\), \(N\), \(T\)]

  • The second element is the hidden state of the RNN of shape [ \(X_{out}\), \(N\)]

Parameters
  • input: Should be of shape [ \(X_{in}\), \(N\), \(T\)]

  • hidden_state: Should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in then the hidden state is assumed zero.

std::tuple<Variable, Variable, Variable> forward(const Variable &input, const Variable &hidden_state, const Variable &cell_state)

Forward the RNN Layer.

Return

A tuple of output Variables.

  • The first element is the output of the RNN of shape [ \(X_{out}\), \(N\), \(T\)]

  • The second element is the hidden state of the RNN of shape [ \(X_{out}\), \(N\)]

  • The third element is the cell state of the RNN of shape [ \(X_{out}\), \(N\)]

Parameters
  • input: Should be of shape [ \(X_{in}\), \(N\), \(T\)]

  • hidden_state: Should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in then the hidden state is assumed zero.

  • cell_state: Should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in then the cell state is assumed zero.

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

Transform

class Transform : public fl::UnaryModule

Applies a transformation on the input specified by a lambda function.

For example to add a \( 1 + log(x) \) layer to a container:

model.add(
  Transform([](const Variable& in) {
    return 1 + fl::log(in);
  })
);
Note this module cannot be serialized.

View

class View : public fl::UnaryModule

Modifies the dimensions of a Variable and rearranges its elements without modifying the order of elements in the underlying af::array.

When specifying the number of elements in the array:

  • If -1 is specified on a particular axis, that axis will be assigned a dimension based on the number of total elements in the tensor. Only one axis value can be -1.

  • If 0 is specified on a particular axis, that axis will have the same dimension as does the input tensor. For example: given an input tensor with shape (10, 20, 30, 40) and a View with shape (-1, 0, 100), the output tensor will have shape (120, 20, 100).
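The two rules above can be worked through with a small shape resolver. This is a sketch with a hypothetical `resolveViewShape` function operating on plain vectors of dimensions, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Resolves a requested view shape against the input shape:
//   0  copies the input dimension on that axis,
//   -1 is inferred from the total number of elements (at most one axis).
std::vector<long long> resolveViewShape(
    const std::vector<long long>& inputDims,
    const std::vector<long long>& viewDims) {
  long long total = 1;
  for (long long d : inputDims) total *= d;

  std::vector<long long> out(viewDims);
  long long known = 1;  // product of all dimensions except the inferred one
  int inferred = -1;    // index of the axis given as -1, if any
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (out[i] == 0) {
      assert(i < inputDims.size());
      out[i] = inputDims[i];
    }
    if (out[i] == -1) {
      assert(inferred == -1);  // only one axis may be -1
      inferred = static_cast<int>(i);
    } else {
      known *= out[i];
    }
  }
  if (inferred >= 0) {
    assert(known > 0 && total % known == 0);
    out[inferred] = total / known;
  }
  return out;
}
```

For the example in the text, the input holds 10 × 20 × 30 × 40 = 240000 elements. The known view dimensions multiply to 20 × 100 = 2000, so the inferred axis is 240000 / 2000 = 120.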

WeightNorm

class WeightNorm : public fl::Module

A weight normalization layer.

This layer wraps a given module to create a weight normalized implementation of the module. WeightNorm currently supports Linear and Conv2D. For example:

WeightNorm wn(Linear(128, 128), 0);

For more details see Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Losses

AdaptiveSoftMaxLoss

class AdaptiveSoftMaxLoss : public fl::BinaryModule

An efficient approximation of the softmax function and negative log-likelihood loss.

Computes the Adaptive Softmax, as given by Grave et al (2017): Efficient softmax approximation for GPUs. Efficient when the number of classes over which the softmax is being computed is very high and the label distribution is highly imbalanced.

Adaptive softmax buckets the inputs into clusters based on their frequency, where each cluster may contain a different number of targets. For each minibatch, only clusters for which at least one target is present are evaluated. The forward pass for low-frequency inputs is approximated with lower-rank matrices to speed up computation.

Public Functions

AdaptiveSoftMaxLoss(std::shared_ptr<AdaptiveSoftMax> activation, ReduceMode reduction = ReduceMode::MEAN)

Create an AdaptiveSoftMaxLoss with given parameters.

Parameters
  • reduction: the reduction mode. See the documentation on ReduceMode for available options.

Variable forward(const Variable &inputs, const Variable &targets)

Computes the categorical cross entropy loss for some input and target tensors, using the adaptive softmax function to do this efficiently.

Parameters
  • inputs: a Variable with shape [ \(C\), \(B_1\), \(B_2\), \(B_3\)] where \(C\) is the number of classes.

  • targets: an integer Variable with shape [ \(B_1\), \(B_2\), \(B_3\)]. The values must be in [ \(0\), \(C - 1\)]

void setParams(const Variable &var, int position)

Sets a parameter at a specified position with a new, given one.

If the specified position is not valid (it is negative or greater than params_.size() - 1), then an error will be thrown. A new parameter will not be created at a specified index if out of bounds.

Parameters
  • var: the new replacement Variable

  • position: The index of the parameter which will be replaced in params_

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

BinaryCrossEntropy

class BinaryCrossEntropy : public fl::BinaryModule

Computes the binary cross entropy loss between an input tensor \(x\) and a target tensor \(y\).

The binary cross entropy loss is:

\[ B(x, y) = \frac{1}{n} \sum_{i = 1}^n -\left( w_i \times (y_i \times \log(x_i) + (1 - y_i) \times \log(1 - x_i)) \right) \]
where \(w\) is an optional weight parameter for rescaling.

Both the inputs and the targets are expected to be between 0 and 1.
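The formula can be checked with a small standalone function. This is a sketch over flat `std::vector<double>` values rather than Variables; the name `binaryCrossEntropy` is hypothetical, and the sketch does no clamping, so inputs of exactly 0 or 1 are an assumption violation here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean binary cross entropy over n elements. An empty `w` means that every
// element has weight 1.
double binaryCrossEntropy(
    const std::vector<double>& x,
    const std::vector<double>& y,
    const std::vector<double>& w = {}) {
  assert(!x.empty() && x.size() == y.size());
  assert(w.empty() || w.size() == x.size());

  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double wi = w.empty() ? 1.0 : w[i];
    total += -wi * (y[i] * std::log(x[i]) + (1.0 - y[i]) * std::log(1.0 - x[i]));
  }
  return total / static_cast<double>(x.size());
}
```

For predictions of 0.5 on every element the unweighted loss is \(\log 2\) regardless of the targets, which makes a convenient sanity check.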

Public Functions

Variable forward(const Variable &inputs, const Variable &targets, const Variable &weights)

Perform forward loss computation with an additional weight tensor.

Parameters
  • inputs: a tensor with the predicted values

  • targets: a tensor with the target values

  • weights: a rescaling weight given to the loss of each element.

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

CategoricalCrossEntropy

class CategoricalCrossEntropy : public fl::BinaryModule

Computes the categorical cross entropy loss between an input and a target tensor.

The input is expected to contain log probabilities (which can be accomplished via LogSoftmax). The targets should contain the index of the ground truth class for each input example.

In the batch case, for an output loss tensor \(\{l_1,...,l_N\}^\top\), put \(l_n = -x_{n, y_n}\) (only the log probability of the correct class is considered). Then reduce via:

\[ \mathcal{L}(x, y) = \sum_{i = 1}^N l_i \]
if using a sum reduction,
\[ \mathcal{L}(x, y) = \frac{1}{N} \sum_{i = 1}^N l_i \]
if using a mean reduction. If using no reduction (‘none’), the result will be reshaped to the target dimensions, giving a loss for each example. See ReduceMode.
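The three reductions can be sketched in plain C++. This is an illustration only: `Reduce` is a hypothetical stand-in for ReduceMode, `logProbs[n][c]` is assumed to hold the log probability of class `c` for example `n`, and the result is returned as a vector so that the unreduced case fits the same signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library's ReduceMode.
enum class Reduce { None, Mean, Sum };

// Picks l_n = -logProbs[n][targets[n]] for every example, then reduces.
// Returns N values for Reduce::None and a single value otherwise.
std::vector<double> categoricalCrossEntropy(
    const std::vector<std::vector<double>>& logProbs,
    const std::vector<int>& targets,
    Reduce mode) {
  assert(!targets.empty() && logProbs.size() == targets.size());

  std::vector<double> losses(targets.size());
  double sum = 0.0;
  for (std::size_t n = 0; n < targets.size(); ++n) {
    const int y = targets[n];
    assert(y >= 0 && static_cast<std::size_t>(y) < logProbs[n].size());
    losses[n] = -logProbs[n][y];
    sum += losses[n];
  }

  if (mode == Reduce::None) {
    return losses;
  }
  if (mode == Reduce::Sum) {
    return {sum};
  }
  return {sum / static_cast<double>(targets.size())};
}
```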

MeanAbsoluteError

class MeanAbsoluteError : public fl::BinaryModule

Computes the mean absolute error (equivalent to the \(L_1\) loss):

\[ \mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 1}^n \left| x_i - y_i \right| \]
for input tensor \(x\) and target tensor \(y\) each of which contain \(n\) elements.
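As a worked instance of the formula, the following sketch computes the mean absolute error over flat vectors. The name `meanAbsoluteError` is hypothetical and the code is independent of the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean of |x_i - y_i| over all n elements.
double meanAbsoluteError(const std::vector<double>& x, const std::vector<double>& y) {
  assert(!x.empty() && x.size() == y.size());
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    total += std::abs(x[i] - y[i]);
  }
  return total / static_cast<double>(x.size());
}
```

For x = (1, 2, 3, 4) and y = (2, 2, 5, 4) the absolute differences are 1, 0, 2, 0, so the loss is 3 / 4 = 0.75.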

Public Functions

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

MeanSquaredError

class MeanSquaredError : public fl::BinaryModule

Computes the mean squared error between elements across two tensors:

\[ \mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 1}^n \left( x_i - y_i \right)^2 \]
for input tensor \(x\) and target tensor \(y\) each of which contain \(n\) elements.
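The same kind of standalone sketch works for the squared error. The name `meanSquaredError` is hypothetical, and the function operates on flat vectors rather than Variables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of (x_i - y_i)^2 over all n elements.
double meanSquaredError(const std::vector<double>& x, const std::vector<double>& y) {
  assert(!x.empty() && x.size() == y.size());
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double diff = x[i] - y[i];
    total += diff * diff;
  }
  return total / static_cast<double>(x.size());
}
```

For x = (1, 2, 3, 4) and y = (2, 2, 5, 4) the squared differences are 1, 0, 4, 0, so the loss is 5 / 4 = 1.25.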

Public Functions

std::string prettyString() const

Generates a stringified representation of the module.

Return

a string containing the module label

Initialization

Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.

This source code is licensed under the BSD-style license found in the LICENSE file in the root directory of this source tree.

Functions for initializing tensors.

Provides facilities for creating fl::Variable tensors of different types, initialized from probability distributions, constants, or the identity. Additionally wraps common tensors for use in modules.

namespace fl

Logging is a light, multi-level, compile-time filterable logging infrastructure that is similar to glog in output format. It defines two logging macros, one for any logging and the other for more verbose logging. A compile-time filter is applied separately to each of the two.

Output format: LMMDD HH:MM:SS.uuuuuu tid filename:##] Log message …
  • L: log level (Fatal, Critical, Error, Warning, Info)

  • MMDD: month, day

  • HH:MM:SS.uuuuuu: time (24-hour format) with microseconds

  • tid: thread ID

  • filename:##: the basename of the source file and the line number of the LOG message

LOG use example: LOG(INFO) << "foo bar n=" << 42;
Output example: I0206 10:42:21.047293 87072 Logging.h:15 foo bar n=42
Note that LOG(level) only prints when level is <= the value set with Logging::setMaxLoggingLevel(level).

VLOG use example: VLOG(1) << "foo bar n=" << 42;
Output example: vlog(1)0206 10:42:21.005439 87072 Logging.h:23 foo bar n=42
Note that VLOG(level) only prints when level is <= the value set with VerboseLogging::setMaxLoggingLevel(level).

The configurable memory allocator is obtained by calling: std::unique_ptr<MemoryAllocator> CreateMemoryAllocator(config). Config defines a set of allocators assembled in a CompositeMemoryAllocator.

Functions

Variable input(const af::array &arr)

Constructs a Variable with gradient calculation disabled, from a given array.

Return

a Variable from the given array with gradient calculation disabled

Parameters
  • arr: an af::array to be used

Variable noGrad(const af::array &arr)

See fl::input above.

Return

a Variable from the given array with gradient calculation disabled

Parameters
  • arr: an af::array to be used

Variable param(const af::array &arr)

Constructs a Variable with gradient calculation enabled, from a given array.

Return

a Variable from the given array with gradient calculation enabled

Parameters
  • arr: an af::array to be used

Variable kaimingUniform(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor with dimensions [output_size, input_size] where elements are uniformly distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • input_size: the second dimension for the output tensor shape

  • output_size: the first dimension of the output tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable kaimingUniform(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are uniformly distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • dims: an ArrayFire tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable kaimingNormal(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor with dimensions [output_size, input_size] where elements are normally distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • input_size: the second dimension for the output tensor shape

  • output_size: the first dimension of the output tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable kaimingNormal(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are normally distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • dims: an ArrayFire tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled
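The scale used by this family of initializers can be sketched with the standard library alone. The constants below follow the paper's derivation for ReLU units (variance 2 / fan-in); whether the library applies exactly this gain is an assumption, and the names `kaimingUniformLimit`, `kaimingNormalStd` and `sampleKaimingUniform` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Bound of U(-limit, limit), assuming the ReLU gain sqrt(2):
// limit = sqrt(6 / fanIn), which gives variance limit^2 / 3 = 2 / fanIn.
double kaimingUniformLimit(int fanIn) {
  assert(fanIn > 0);
  return std::sqrt(6.0 / fanIn);
}

// Standard deviation of the zero-mean normal variant under the same
// assumption: std = sqrt(2 / fanIn).
double kaimingNormalStd(int fanIn) {
  assert(fanIn > 0);
  return std::sqrt(2.0 / fanIn);
}

// Draws `count` values uniformly from [-limit, limit) with a fixed seed.
std::vector<double> sampleKaimingUniform(int fanIn, std::size_t count, unsigned seed = 0) {
  const double limit = kaimingUniformLimit(fanIn);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(-limit, limit);
  std::vector<double> out(count);
  for (double& v : out) {
    v = dist(gen);
  }
  return out;
}
```

Both variants target the same variance, so the choice between them changes only the shape of the distribution, not its scale.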

Variable glorotUniform(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor with dimensions [output_size, input_size] where elements are uniformly distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • input_size: the second dimension for the output tensor shape

  • output_size: the first dimension of the output tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable glorotUniform(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are uniformly distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • dims: an ArrayFire tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable glorotNormal(int input_size, int output_size, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor with dimensions [output_size, input_size] where elements are normally distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • input_size: the second dimension for the output tensor shape

  • output_size: the first dimension of the output tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled

Variable glorotNormal(af::dim4 dims, af::dtype type = f32, bool calc_grad = true)

Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are normally distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.

Return

A Variable containing a tensor with random values distributed accordingly.

Parameters
  • dims: an ArrayFire tensor shape

  • type: the ArrayFire datatype for which to create the tensor

  • calc_grad: flag denoting whether gradient calculation on the resulting Variable should be enabled
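The Glorot scale depends on both fan-in and fan-out. The sketch below uses the formulas from the cited paper and is independent of the library; the names are hypothetical, and mapping `input_size` to fan-in and `output_size` to fan-out for the two-argument overloads is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Bound of U(-limit, limit): limit = sqrt(6 / (fanIn + fanOut)).
double glorotUniformLimit(int fanIn, int fanOut) {
  assert(fanIn > 0 && fanOut > 0);
  return std::sqrt(6.0 / (fanIn + fanOut));
}

// Standard deviation of the zero-mean normal variant:
// std = sqrt(2 / (fanIn + fanOut)).
double glorotNormalStd(int fanIn, int fanOut) {
  assert(fanIn > 0 && fanOut > 0);
  return std::sqrt(2.0 / (fanIn + fanOut));
}

// Draws `count` values from N(0, std^2) with a fixed seed.
std::vector<double> sampleGlorotNormal(
    int fanIn, int fanOut, std::size_t count, unsigned seed = 0) {
  std::mt19937 gen(seed);
  std::normal_distribution<double> dist(0.0, glorotNormalStd(fanIn, fanOut));
  std::vector<double> out(count);
  for (double& v : out) {
    v = dist(gen);
  }
  return out;
}
```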

Utils

Utils for modules.

namespace fl

Functions

bool allParamsClose(const Module &a, const Module &b, double absTolerance = 1e-5)

Returns true if the parameters of two modules are of the same type and are element-wise equal within the given tolerance.

Parameters
  • a, b: the input Modules to compare

  • absTolerance: absolute tolerance allowed
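The comparison can be illustrated on plain vectors. In this sketch each parameter is modelled as a `std::vector<double>` (the alias `Param` and the function `allClose` are hypothetical), the type check is not modelled, and treating a difference exactly equal to the tolerance as close is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one module parameter.
using Param = std::vector<double>;

// True if both parameter lists have the same layout and every pair of
// elements differs by at most absTolerance.
bool allClose(
    const std::vector<Param>& a,
    const std::vector<Param>& b,
    double absTolerance = 1e-5) {
  assert(absTolerance >= 0.0);
  if (a.size() != b.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i].size() != b[i].size()) {
      return false;
    }
    for (std::size_t j = 0; j < a[i].size(); ++j) {
      if (std::abs(a[i][j] - b[i][j]) > absTolerance) {
        return false;
      }
    }
  }
  return true;
}
```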

int derivePadding(int inSz, int filterSz, int stride, int pad, int dilation)

af::array join(const std::vector<af::array> &inputs, double padValue = 0.0, dim_t batchDim = -1)

Packs a list of arrays (possibly of different dimensions) into a single array by padding them to the same dimensions.
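The padding idea can be sketched for the one-dimensional case. This is not the library function: `joinPadded` is a hypothetical name, inputs are plain vectors, each input becomes one row of the result, and the `batchDim` argument is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Pads every input to the length of the longest one with `padValue` and
// stacks the results as rows of a batch.
std::vector<std::vector<double>> joinPadded(
    const std::vector<std::vector<double>>& inputs, double padValue = 0.0) {
  std::size_t maxLen = 0;
  for (const auto& in : inputs) {
    maxLen = std::max(maxLen, in.size());
  }

  std::vector<std::vector<double>> batch;
  batch.reserve(inputs.size());
  for (const auto& in : inputs) {
    assert(in.size() <= maxLen);  // holds by construction of maxLen
    std::vector<double> row(maxLen, padValue);
    std::copy(in.begin(), in.end(), row.begin());
    batch.push_back(std::move(row));
  }
  return batch;
}
```

Joining (1, 2, 3) and (4) with the default pad value yields the rows (1, 2, 3) and (4, 0, 0).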

DistributedUtils

namespace fl

Functions

void distributeModuleGrads(std::shared_ptr<const Module> module, std::shared_ptr<Reducer> reducer = std::make_shared<InlineReducer>(1.0 / getWorldSize()))

Registers a module for allreduce synchronization with a gradient hook on its parameter Variables.

Parameters
  • [in] module: a module whose parameter gradients will be synchronized

  • [in] reducer: a Reducer instance to which gradients will be immediately added when available

void allReduceParameters(std::shared_ptr<const Module> module)

Traverses the network and averages its parameters with allreduce.

Parameters
  • module: a module whose parameters will be synchronized

void allReduceGradients(std::shared_ptr<const Module> module, double scale = 1.0)

Traverses the network and synchronizes the gradients of its parameters with allreduce.

Parameters
  • module: a module whose parameter gradients will be synchronized

  • scale: scale gradients after allreduce by this factor
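The effect of a sum allreduce followed by scaling can be simulated in a single process. The sketch below is an assumption-laden illustration, not the distributed implementation: each inner vector stands for the gradient held by one worker, and `allReduceSumScaled` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replaces every worker's gradient with the element-wise sum over all
// workers, multiplied by `scale`.
void allReduceSumScaled(std::vector<std::vector<double>>& perWorker, double scale = 1.0) {
  if (perWorker.empty()) {
    return;
  }
  const std::size_t n = perWorker.front().size();
  std::vector<double> total(n, 0.0);
  for (const auto& grad : perWorker) {
    assert(grad.size() == n && "all workers must hold same-shaped gradients");
    for (std::size_t i = 0; i < n; ++i) {
      total[i] += grad[i];
    }
  }
  for (double& t : total) {
    t *= scale;
  }
  for (auto& grad : perWorker) {
    grad = total;
  }
}
```

Passing a scale of 1.0 divided by the number of workers turns the sum into an average, which matches the default factor of the InlineReducer shown in distributeModuleGrads above.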