Modules¶
Containers¶
Module¶
-
class Module¶
An abstract computation unit capable of forward computation. Also contains a collection of parameters that can be mutated, and which will be serialized and deserialized with the module.
Subclassed by fl::BinaryModule, fl::Container, fl::Identity, fl::PrecisionCast, fl::RNN, fl::UnaryModule, fl::WeightNorm
Public Functions
-
std::vector<Variable> params() const¶
Gets the parameters of the module.
- Return
the module's parameters as a vector of Variable
-
virtual void train()¶
Switches the module to training mode. Changes all parameters so that gradient calculation will be enabled for any calls to forward.
-
virtual void eval()¶
Switches the module to evaluation mode. Changes all parameters so that gradient calculation will be disabled for any calls to forward.
-
Variable param(int position) const¶
Returns a module parameter given a particular position.
- Return
a Variable tensor for the parameter at the requested position
- Parameters
position: the index of the requested parameter in params_
-
virtual void setParams(const Variable &var, int position)¶
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), an error will be thrown. A new parameter will not be created if the specified index is out of bounds.
- Parameters
var: the new replacement Variable
position: the index of the parameter to be replaced in params_
-
void zeroGrad()¶
Clears references to gradient Variables for all parameters in the module.
-
virtual std::vector<Variable> forward(const std::vector<Variable> &inputs) = 0¶
Performs forward computation for the module, given some inputs.
- Return
a vector of Variable tensors containing the result of the forward computation
- Parameters
inputs: the input values for the module's forward computation
-
std::vector<Variable> operator()(const std::vector<Variable> &inputs)¶
An overload for forward computation for the module.
- Return
a vector of Variable tensors containing the result of the forward computation
- Parameters
inputs: the input values for the module's forward computation
-
virtual std::string prettyString() const = 0¶
Generates a stringified representation of the module.
- Return
a string containing the module label
-
virtual ~Module()¶
Protected Attributes
-
std::vector<Variable> params_¶
The module's parameters.
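For orientation, a minimal sketch of the Module API in use (Linear is described under Layers below; fl::rand is assumed here to construct a random Tensor):
// Build a module and inspect its parameters
Linear lin(4, 8);
std::cout << lin.params().size() << std::endl; // 2: weight and bias
lin.train(); // enable gradient calculation on parameters
auto out = lin(fl::input(fl::rand({4, 16})));
lin.eval(); // disable gradient calculation on parameters
lin.zeroGrad(); // clear parameter gradient references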
-
class UnaryModule : public fl::Module¶
An extension of Module which supports forward computation only on a single Variable, with a single Variable as output. For example, the Sigmoid module can be derived from UnaryModule.
Subclassed by fl::AdaptiveEmbedding, fl::AdaptiveSoftMax, fl::BatchNorm, fl::Conv2D, fl::Dropout, fl::ELU, fl::Embedding, fl::GatedLinearUnit, fl::HardTanh, fl::LayerNorm, fl::LeakyReLU, fl::Linear, fl::Log, fl::LogSoftmax, fl::Normalize, fl::Padding, fl::Pool2D, fl::PReLU, fl::RawWavSpecAugment, fl::ReLU, fl::ReLU6, fl::Reorder, fl::Sigmoid, fl::SpecAugment, fl::Swish, fl::Tanh, fl::ThresholdReLU, fl::Transform, fl::View
Public Functions
-
UnaryModule()¶
-
std::vector<Variable> forward(const std::vector<Variable> &inputs)¶
Performs forward computation for the module, given some inputs.
- Return
a vector of Variable tensors containing the result of the forward computation
- Parameters
inputs: the input values for the module's forward computation
-
virtual ~UnaryModule()¶
-
class BinaryModule : public fl::Module¶
An extension of Module which supports forward computation only on a pair of Variables, with a single Variable as output. For example, the BinaryCrossEntropy loss can be derived from BinaryModule.
Subclassed by fl::AdaptiveSoftMaxLoss, fl::BinaryCrossEntropy, fl::CategoricalCrossEntropy, fl::MeanAbsoluteError, fl::MeanSquaredError, fl::pkg::speech::ForceAlignmentCriterion, fl::pkg::speech::FullConnectionCriterion
Public Functions
-
BinaryModule()¶
-
std::vector<Variable> forward(const std::vector<Variable> &inputs)¶
Performs forward computation for the module, given some inputs.
- Return
a vector of Variable tensors containing the result of the forward computation
- Parameters
inputs: the input values for the module's forward computation
-
virtual ~BinaryModule()¶
Container¶
-
class Container : public fl::Module¶
A computation unit capable of forward computation that contains a collection of multiple Modules and their respective parameters.
Subclassed by fl::app::benchmark::AsrTransformer, fl::app::benchmark::LmTransformer, fl::Conformer, fl::pkg::speech::AttentionBase, fl::pkg::speech::SequenceCriterion, fl::pkg::vision::Detr, fl::pkg::vision::MultiheadAttention, fl::pkg::vision::PositionalEmbeddingSine, fl::pkg::vision::Resnet50Backbone, fl::pkg::vision::ResNetBlock, fl::pkg::vision::ResNetBlockFrozenBatchNorm, fl::pkg::vision::ResNetBottleneckBlock, fl::pkg::vision::ResNetBottleneckBlockFrozenBatchNorm, fl::pkg::vision::Transformer, fl::pkg::vision::TransformerBaseLayer, fl::pkg::vision::TransformerDecoder, fl::pkg::vision::TransformerDecoderLayer, fl::pkg::vision::TransformerEncoder, fl::pkg::vision::VisionTransformer, fl::pkg::vision::ViT, fl::PositionEmbedding, fl::Residual, fl::Sequential, fl::SinusoidalPositionEmbedding, fl::TDSBlock, fl::Transformer
Public Functions
-
template<typename T>
void add(const T &module)¶
Adds a module to a Container by making a copy of the underlying module. Note that parameters are still shared, due to Variable's copy semantics.
- Parameters
module: the module to add
-
template<typename T>
void add(std::shared_ptr<T> module)¶
Adds a module to modules_, and adds its parameters to the container's params_.
- Parameters
module: the module to add
-
ModulePtr module(int id) const¶
Returns a pointer to the module at the specified index in the container's modules_.
- Return
a pointer to the requested module
- Parameters
id: the index of the module to return
-
std::vector<ModulePtr> modules() const¶
Returns pointers to each Module in the Container.
- Return
an ordered vector of pointers, one for each module
-
void setParams(const Variable &var, int position)¶
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), an error will be thrown. A new parameter will not be created if the specified index is out of bounds.
- Parameters
var: the new replacement Variable
position: the index of the parameter to be replaced in params_
Sequential¶
-
class Sequential : public fl::Container¶
A Container representing an ordered sequence of modules, which is capable of forward computation through each of its modules, in order.
Usage:
Sequential mySequential;
// Assume we've defined and implemented three modules: mod1, mod2, mod3
mySequential.add(mod1);
mySequential.add(mod2);
mySequential.add(mod3);
// Performing forward computation will forward through each `Module` in order
auto result = mySequential.forward(myInput);
// We can also inspect internal state
assert(mySequential.modules().size() == 3); // true
assert(
    mod1.params().size() + mod2.params().size() + mod3.params().size() ==
    mySequential.params().size()); // true
Subclassed by fl::pkg::vision::ConvBnAct, fl::pkg::vision::ConvFrozenBatchNormActivation, fl::pkg::vision::MLP, fl::pkg::vision::ResNetBottleneckStage, fl::pkg::vision::ResNetBottleneckStageFrozenBatchNorm, fl::pkg::vision::ResNetStage, fl::pkg::vision::ResNetStageFrozenBatchNorm
Public Functions
-
Sequential()¶
-
std::vector<Variable> forward(const std::vector<Variable> &input)¶
Performs forward computation for the Sequential, calling forward, in order, on each Module and feeding the result as input to the next Module.
-
std::string prettyString() const¶
Generates a stringified representation of the Sequential by concatenating string representations of each contained Module.
- Return
a string containing the module label
Layers¶
Activations¶
-
class Sigmoid : public fl::UnaryModule¶
Applies the sigmoid function element-wise to a Variable:
\[\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}\]
-
class Tanh : public fl::UnaryModule¶
Applies the hyperbolic tangent function element-wise to a Variable:
\[\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]
-
class HardTanh : public fl::UnaryModule¶
Applies the hard-tanh function element-wise to a Variable:
\[\begin{split}\text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\ -1 & \text{ if } x < -1 \\ x & \text{ otherwise } \end{cases}\end{split}\]
-
class ReLU : public fl::UnaryModule¶
Applies the rectified linear unit function element-wise to a Variable:
\[\text{ReLU}(x) = \max(0, x)\]
-
class LeakyReLU : public fl::UnaryModule¶
Applies the leaky rectified linear unit function from Maas et al (2013), Rectifier Nonlinearities Improve Neural Network Acoustic Models.
Applied element-wise to a Variable:
\[\begin{split}\text{LeakyReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{slope} \times x, & \text{ otherwise } \end{cases}\end{split}\]
where \(\text{slope}\) is a constant by which the input will be multiplied if less than zero.
-
class PReLU : public fl::UnaryModule¶
Applies the parameterized rectified linear unit function from He et al (2015), Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Applied element-wise to a Variable, given some input size:
\[\begin{split}\text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{value} \times x, & \text{ otherwise } \end{cases}\end{split}\]
where \(\text{value}\) is a learned parameter whose initialization can be tuned.
-
class ELU : public fl::UnaryModule¶
Applies the exponential linear unit function from Clevert et al (2015): Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).
Applied element-wise to a Variable:
\[\begin{split}\text{ELU}(x) = \begin{cases} x & \text{ if } x \geq 0 \\ \alpha \times (e^x - 1) & \text{ otherwise } \end{cases}\end{split}\]
where \(\alpha\) is a tunable parameter.
-
class ThresholdReLU : public fl::UnaryModule¶
Applies the threshold rectified linear unit from Konda et al (2015): Zero-bias autoencoders and the benefits of co-adapting features.
Applied element-wise to a Variable:
\[\begin{split}\text{ThresholdReLU}(x) = \begin{cases} x & \text{ if } x > \text{threshold} \\ 0 & \text{ otherwise } \end{cases}\end{split}\]
where \(\text{threshold}\) is a tunable parameter.
-
class GatedLinearUnit : public fl::UnaryModule¶
Creates a Gated Linear Unit from Dauphin et al (2017): Language Modeling with Gated Convolutional Networks.
\[\text{GLU}(x) = x_i \otimes \sigma(x_j)\]
where \(\otimes\) denotes the element-wise product, \(x_i\) is the first half of the input, \(x_j\) is the second half, and \(\sigma(x)\) is the sigmoid function.
-
class LogSoftmax : public fl::UnaryModule¶
Applies the log softmax function to a tensor:
\[\text{LogSoftmax}(x_i) = \log{\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)}\]
-
class Log : public fl::UnaryModule¶
Applies the natural logarithm element-wise to a Variable.
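As a brief illustration, these activation modules are parameter-free and can be applied directly to a Variable (a minimal sketch; fl::rand is assumed to construct a random Tensor):
auto x = fl::input(fl::rand({10, 4}));
auto y = ReLU()(x); // element-wise max(0, x)
auto z = Sigmoid()(y); // element-wise logistic function
auto w = LogSoftmax()(z); // log-probabilities along the first axis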
BatchNorm¶
-
class BatchNorm : public fl::UnaryModule¶
Applies Batch Normalization on a given input, as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
The operation implemented is:
\[out(x) = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} \times \gamma + \beta\]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated over the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for an affine transformation.
Subclassed by fl::FrozenBatchNorm
Public Functions
-
BatchNorm(int featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)¶
Constructs a BatchNorm module.
- Parameters
featAxis: the axis over which normalization is performed
featSize: the size of the dimension along featAxis
momentum: an exponential average factor used to compute the running mean and variance: \[runningMean = runningMean \times (1 - momentum) + newMean \times momentum\] If < 0, a cumulative moving average is used.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.
-
BatchNorm(const std::vector<int> &featAxis, int featSize, double momentum = 0.1, double eps = 1e-5, bool affine = true, bool trackStats = true)¶
Constructs a BatchNorm module.
- Parameters
featAxis: the axes over which normalization is performed
featSize: the total dimension along featAxis. For example, to perform Temporal Batch Normalization on an input of size [ \(L\), \(C\), \(N\)], use featAxis = {1}, featSize = \(C\). To perform normalization per activation on an input of size [ \(W\), \(H\), \(C\), \(N\)], use featAxis = {0, 1, 2}, featSize = \(W \times H \times C\).
momentum: an exponential average factor used to compute the running mean and variance: \[runningMean = runningMean \times (1 - momentum) + newMean \times momentum\] If < 0, a cumulative moving average is used.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
trackStats: a boolean value that controls whether to track the running mean and variance while in train mode. If false, batch statistics are used to perform normalization in both train and eval mode.
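For instance, a minimal sketch of per-channel batch normalization on a [ \(W\), \(H\), \(C\), \(N\)] activation (values are illustrative; fl::rand is assumed to construct a random Tensor):
// Normalize over the channel axis (axis 2, C = 32 channels)
BatchNorm bn(2 /* featAxis */, 32 /* featSize */);
bn.train();
auto normed = bn(fl::input(fl::rand({8, 8, 32, 4})));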
Conv2D¶
-
class Conv2D : public fl::UnaryModule¶
Applies a 2D convolution over a 4D input along its first two dimensions.
This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C_{in}\), \(N\)], where \(C_{in}\) is the number of input channels, and generates an output of shape [ \(X_{out}\), \(Y_{out}\), \(C_{out}\), \(N\)], where \(C_{out}\) is the number of output channels:
\[X_{out} = \frac{X_{in} + 2 \times X_{pad} - (1 + (X_{filter} - 1) \times X_{dilation})}{X_{stride}} + 1\]
\[Y_{out} = \frac{Y_{in} + 2 \times Y_{pad} - (1 + (Y_{filter} - 1) \times Y_{dilation})}{Y_{stride}} + 1\]
Two modes for zero-padding are supported:
AF_PADDING_NONE: no padding
AF_PADDING_SAME: \(X_{pad}\) and \(Y_{pad}\) are dynamically chosen so that
\[X_{out} = \lceil{\frac{X_{in}}{X_{stride}}}\rceil, \quad Y_{out} = \lceil{\frac{Y_{in}}{Y_{stride}}}\rceil\]
Subclassed by fl::AsymmetricConv1D
Public Functions
-
Conv2D(int n_in, int n_out, int wx, int wy, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, bool bias = true, int groups = 1)¶
Constructs a Conv2D module.
- Parameters
n_in: \(C_{in}\), the number of channels in the input
n_out: \(C_{out}\), the number of channels in the output
wx: the size of the first dimension of the convolving kernel
wy: the size of the second dimension of the convolving kernel
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: the dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: the dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
bias: a boolean value that controls whether to add a learnable bias to the output
groups: the number of groups into which the input and output channels are divided, restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group.
-
Conv2D(const Variable &w, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)¶
Constructs a Conv2D module with a kernel Variable tensor. No bias term will be applied to the output.
- Parameters
w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: the dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: the dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
groups: the number of groups into which the input and output channels are divided, restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group.
-
Conv2D(const Variable &w, const Variable &b, int sx = 1, int sy = 1, detail::IntOrPadMode px = 0, detail::IntOrPadMode py = 0, int dx = 1, int dy = 1, int groups = 1)¶
Constructs a Conv2D module with a kernel Variable tensor and a bias Variable tensor.
- Parameters
w: the kernel Variable tensor. The shape should be [ \(kerneldim_0\), \(kerneldim_1\), \(C_{in}\), \(C_{out}\)].
b: the bias Variable tensor. The shape should be [ \(1\), \(1\), \(C_{out}\), \(1\)].
sx: the stride of the convolution along the first dimension
sy: the stride of the convolution along the second dimension
px: the amount of zero-padding added to both sides of the first dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
py: the amount of zero-padding added to both sides of the second dimension of the input. Accepts a non-negative integer value or an enum fl::PaddingMode
dx: the dilation of the convolution along the first kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
dy: the dilation of the convolution along the second kernel dimension. A dilation of 1 is equivalent to a standard convolution along this axis.
groups: the number of groups into which the input and output channels are divided, restricting the connectivity between input and output channels. If groups > 1, the output channels in the i-th group will be connected only to the input channels in the i-th group.
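As a brief sketch of constructing and applying a convolution (values are illustrative; fl::rand and fl::PaddingMode::SAME are assumed to be available):
// 3x3 kernel, 16 input channels, 32 output channels, stride 1,
// SAME padding so the spatial dimensions are preserved
Conv2D conv(16, 32, 3, 3, 1, 1, fl::PaddingMode::SAME, fl::PaddingMode::SAME);
auto out = conv(fl::input(fl::rand({28, 28, 16, 4}))); // -> [28, 28, 32, 4]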
Dropout¶
-
class Dropout : public fl::UnaryModule¶
Implements Dropout, as given by Hinton et al (2012): Improving neural networks by preventing co-adaptation of feature detectors. Effectively regularizes by randomly zeroing out values in the input based on a given ratio.
All values that are not zeroed out are scaled by a factor of \(\frac{1}{1 - p}\). Thus, with the same network, evaluating the module at test time gives the identity.
Embedding¶
-
class Embedding : public fl::UnaryModule¶
Looks up embeddings from a learnable dictionary of fixed size.
This layer expects as input a list of indices with at most three dimensions, [ \(B_1\), \(B_2\) (optional), \(B_3\) (optional)], and generates an output from lookup of shape [embeddingDim, \(B_1\), \(B_2\) (optional), \(B_3\) (optional)].
Public Functions
-
Embedding(int embeddingDim, int numEmbeddings)¶
Constructs an Embedding module.
- Parameters
embeddingDim: the size of each embedding vector
numEmbeddings: the size of the dictionary of embeddings
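For example, a minimal sketch of an embedding lookup (fl::full is assumed here to build an integer index tensor):
Embedding emb(256 /* embeddingDim */, 10000 /* numEmbeddings */);
// Indices of shape [B1] map to embeddings of shape [256, B1]
auto idx = fl::input(fl::full({5}, 42));
auto vecs = emb(idx);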
LayerNorm¶
-
class LayerNorm : public fl::UnaryModule¶
Applies Layer Normalization on a given input, as described in the paper Layer Normalization.
The operation implemented is:
\[out(x) = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} \times \gamma + \beta\]
where \(E[x]\) and \(Var[x]\) are the mean and variance of the input \(x\) calculated along the specified axis, \(\epsilon\) is a small value added to the variance to avoid divide-by-zero, and \(\gamma\) and \(\beta\) are learnable parameters for an affine transformation.
Public Functions
-
LayerNorm(int axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)¶
Constructs a LayerNorm module.
- Parameters
axis: the axis along which normalization is computed. Usually set to the feature axis.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
axisSize: the total size of the features specified by axis, used for the element-wise affine transform. If the feature size is variable, use kLnVariableAxisSize, which uses a singleton weight and bias and tiles them dynamically according to the given input.
-
LayerNorm(const std::vector<int> &axis, double eps = 1e-5, bool affine = true, int axisSize = kLnVariableAxisSize)¶
Constructs a LayerNorm module.
- Parameters
axis: the axes along which normalization is computed. Usually set to the feature axis.
eps: \(\epsilon\)
affine: a boolean value that controls the learning of \(\gamma\) and \(\beta\). \(\gamma\) and \(\beta\) are set to 1 and 0 respectively if set to false, or initialized as learnable parameters if set to true.
axisSize: the total size of the features specified by axis, used for the element-wise affine transform. If the feature size is variable, use kLnVariableAxisSize, which uses a singleton weight and bias and tiles them dynamically according to the given input.
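For instance, a minimal sketch of layer normalization over the feature axis (fl::rand is assumed to construct a random Tensor):
// Normalize each 512-dim feature vector along axis 0
LayerNorm ln(0 /* axis */, 1e-5, true, 512 /* axisSize */);
auto out = ln(fl::input(fl::rand({512, 10, 4})));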
Linear¶
-
class Linear : public fl::UnaryModule¶
Applies a linear transformation to the input: \(y = Wx + b\).
This layer takes an input of shape [input_size, *, *, *] and transforms it to an output of shape [output_size, *, *, *].
Public Functions
-
Linear(int input_size, int output_size, bool bias = true)¶
Constructs a Linear module from the input and output sample sizes.
- Parameters
input_size: the size of each input sample
output_size: the size of each output sample
bias: a boolean value that controls whether the layer will include a bias term \(b\)
-
Linear(const Variable &w)¶
Constructs a Linear module from the weight parameter \(w\). The layer will not include the bias term \(b\) in this case.
- Parameters
w: the 2D Variable tensor for the weight \(w\). The shape should be [output_size, input_size].
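A minimal sketch of both constructors (fl::rand is assumed to construct a random Tensor; fl::param is described under Initialization below):
Linear lin(64 /* input_size */, 32 /* output_size */);
auto out = lin(fl::input(fl::rand({64, 16}))); // [64, 16] -> [32, 16]
// A bias-free layer from an explicit [output_size, input_size] weight
Linear linNoBias(fl::param(fl::rand({32, 64})));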
Padding¶
-
class Padding : public fl::UnaryModule¶
Adds padding of value val before and after each dimension \(i\) of the input, of a size specified by the tuple padi.
Pool2D¶
-
class Pool2D : public fl::UnaryModule¶
A 2D pooling layer.
This layer expects an input of shape [ \(X_{in}\), \(Y_{in}\), \(C\), \(N\)]. Pooling (max or average) is performed over the first and second dimensions of the input, so the output will be of shape [ \(X_{out}\), \(Y_{out}\), \(C\), \(N\)].
Reorder¶
-
class Reorder : public fl::UnaryModule¶
Reorders the data according to the specified dimensions. The order of the data may change; the result is guaranteed to be contiguous in memory.
// A layer which transposes a matrix
auto transposeLayer = Reorder(1, 0);
auto var = Variable(Tensor({1, 2, 3, 4}), false);
// Make the last dimension the first dimension
var = Reorder(3, 0, 1, 2)(var);
// Dims will be {4, 1, 2, 3}
std::cout << var.shape() << std::endl;
RNN¶
-
class RNN : public fl::Module¶
A recurrent neural network (RNN) layer.
The RNN layer supports several cell types. The most basic RNN (e.g. an Elman network) computes the following function:
\[h_t = \sigma(W x_t + U h_{t-1} + b)\]
If the RNN mode is RELU, then \(\sigma\) will be a ReLU; if the RNN mode is TANH, it will be a Tanh function.
Gated Recurrent Units (GRU) are supported. For details, see the original GRU paper or the Wikipedia page.
LSTM cells are also supported (LSTM). The LSTM cell uses a forget gate and does not have peephole connections. For details, see the original paper Long Short-Term Memory or the Wikipedia page.
The input to the RNN is expected to be of shape [ \(X_{in}\), \(N\), \(T\)], where \(N\) is the batch size and \(T\) is the sequence length.
The output of the RNN will be of shape [ \(X_{out}\), \(N\), \(T\)]. Here \(X_{out}\) is hidden_size if the RNN is unidirectional, and twice hidden_size if the RNN is bidirectional.
In addition, the RNN supports taking the hidden state and the cell state as input and returning them as output. When these are passed in as empty Variables, they are assumed to be zero.
Public Functions
-
RNN(int input_size, int hidden_size, int num_layers, RnnMode mode, bool bidirectional = false, float drop_prob = 0.0)¶
Constructs an RNN layer.
- Parameters
input_size: the dimension of the input (e.g. \(X_{in}\))
hidden_size: the hidden dimension of the RNN
num_layers: the number of recurrent layers
mode: the RNN mode to use. Can be any of: RELU, TANH, LSTM, GRU
bidirectional: whether or not the RNN is bidirectional. If true, the output dimension will be doubled.
drop_prob: the probability of dropout after each RNN layer except the last layer
-
std::vector<Variable> forward(const std::vector<Variable> &inputs)¶
Performs forward computation for the module, given some inputs.
- Return
a vector of Variable tensors containing the result of the forward computation
- Parameters
inputs: the input values for the module's forward computation
-
Variable forward(const Variable &input)¶
Forwards the RNN layer.
- Return
a single output Variable with shape [ \(X_{out}\), \(N\), \(T\)]
- Parameters
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]
-
std::tuple<Variable, Variable> forward(const Variable &input, const Variable &hidden_state)¶
Forwards the RNN layer.
- Return
a tuple of output Variables
- Parameters
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]
hidden_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, the hidden state is assumed to be zero.
-
std::tuple<Variable, Variable, Variable> forward(const Variable &input, const Variable &hidden_state, const Variable &cell_state)¶
Forwards the RNN layer.
- Return
a tuple of output Variables
- Parameters
input: should be of shape [ \(X_{in}\), \(N\), \(T\)]
hidden_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, the hidden state is assumed to be zero.
cell_state: should be of shape [ \(X_{out}\), \(N\)]. If an empty Variable is passed in, the cell state is assumed to be zero.
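As an illustration, a minimal sketch of running an LSTM with zero initial state (values are illustrative; fl::rand is assumed to construct a random Tensor):
RNN lstm(80 /* input_size */, 256 /* hidden_size */, 1 /* num_layers */, RnnMode::LSTM);
auto x = fl::input(fl::rand({80, 4, 50})); // [X_in, N, T]
Variable y, h, c;
// Empty Variables are treated as zero hidden/cell state
std::tie(y, h, c) = lstm.forward(x, Variable(), Variable());
// y: [X_out, N, T] = [256, 4, 50]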
Transform¶
-
class Transform : public fl::UnaryModule¶
Applies a transformation on the input specified by a lambda function.
For example, to add a \(1 + \log(x)\) layer to a container:
model.add(Transform([](const Variable& in) { return 1 + fl::log(in); }));
Note that this module cannot be serialized.
View¶
-
class View : public fl::UnaryModule¶
Modifies the dimensions of a Variable and rearranges its elements without modifying the order of elements in the underlying Tensor.
When specifying the number of elements in the array:
If -1 is specified on a particular axis, that axis will be assigned a dimension based on the total number of elements in the tensor. Only one axis value can be -1.
If 0 is specified on a particular axis, that axis will have the same dimension as the input tensor does. For example: given an input tensor with shape (10, 20, 30, 40) and a View with shape (-1, 0, 100), the output tensor will have shape (120, 20, 100).
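A minimal sketch of the example above (a Shape-taking constructor and fl::rand are assumed to be available):
// (10, 20, 30, 40) -> (120, 20, 100): -1 is inferred, 0 keeps dimension 1
View v({-1, 0, 100});
auto out = v(fl::input(fl::rand({10, 20, 30, 40})));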
WeightNorm¶
-
class WeightNorm : public fl::Module¶
A weight normalization layer.
This layer wraps a given module to create a weight-normalized implementation of the module. WeightNorm currently supports Linear and Conv2D. For example:
WeightNorm wn(Linear(128, 128), 0);
For more details, see Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.
Losses¶
AdaptiveSoftMaxLoss¶
-
class AdaptiveSoftMaxLoss : public fl::BinaryModule¶
An efficient approximation of the softmax function and negative log-likelihood loss.
Computes the Adaptive Softmax, as given by Grave et al (2017): Efficient softmax approximation for GPUs. Efficient when the number of classes over which the softmax is being computed is very high and the label distribution is highly imbalanced.
Adaptive softmax buckets the inputs based on their frequency, where clusters may each contain a different number of targets. For each minibatch, only clusters for which at least one target is present are evaluated. Forward passes for low-frequency inputs are approximated with lower-rank matrices so as to speed up computation.
Public Functions
Public Functions
Creates an AdaptiveSoftMaxLoss with the given parameters.
- Parameters
reduction: the reduction mode; see ReduceMode for available options
ignoreIndex: a target value that is ignored and does not contribute to the loss or the input gradient. If reduce is MEAN, the loss is averaged over non-ignored targets.
-
Variable forward(const Variable &inputs, const Variable &targets)¶
Computes the categorical cross entropy loss for some input and target tensors (using the adaptive softmax function to do so efficiently).
-
void setParams(const Variable &var, int position)¶
Sets a parameter at a specified position with a new, given one. If the specified position is not valid (it is negative or greater than params_.size() - 1), an error will be thrown. A new parameter will not be created if the specified index is out of bounds.
- Parameters
var: the new replacement Variable
position: the index of the parameter to be replaced in params_
BinaryCrossEntropy¶
-
class BinaryCrossEntropy : public fl::BinaryModule¶
Computes the binary cross entropy loss between an input tensor \(x\) and a target tensor \(y\).
The binary cross entropy loss is:
\[B(x, y) = \frac{1}{n} \sum_{i = 0}^n -\left( w_i \times (y_i \times \log(x_i) + (1 - y_i) \times \log(1 - x_i)) \right)\]
where \(w\) is an optional weight parameter for rescaling. Both the inputs and the targets are expected to be between 0 and 1.
Public Functions
-
Variable forward(const Variable &inputs, const Variable &targets, const Variable &weights)¶
Performs forward loss computation with an additional weight tensor.
- Parameters
inputs: a tensor with the predicted values
targets: a tensor with the target values
weights: a rescaling weight applied to the loss of each element
CategoricalCrossEntropy¶
-
class CategoricalCrossEntropy : public fl::BinaryModule¶
Computes the categorical cross entropy loss between an input and a target tensor.
The input is expected to contain log probabilities (which can be accomplished via LogSoftmax). The targets should contain the index of the ground-truth class for each input example.
In the batch case, for the output loss tensor \(\{l_1,...,l_N\}^\top\), let \(l_n = -x_{n, y_n}\) (only the probability of the correct class is considered). Then reduce via:
\[\mathcal{L}(x, y) = \sum_{i = 1}^N l_i\]
if using a sum reduction, or
\[\mathcal{L}(x, y) = \frac{1}{N} \sum_{i = 1}^N l_i\]
if using a mean reduction. If using no reduction ('none'), the result will be reshaped to the target dimensions, giving a loss for each example. See ReduceMode.
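For illustration, a minimal sketch pairing LogSoftmax with this loss (fl::rand and fl::full are assumed tensor constructors; values are illustrative):
Sequential model;
model.add(Linear(64, 10));
model.add(LogSoftmax());
CategoricalCrossEntropy criterion;
// Log probabilities for a batch of 8 examples over 10 classes
auto logProbs = model.forward(fl::input(fl::rand({64, 8})));
auto targets = fl::input(fl::full({8}, 3)); // ground-truth class indices
auto loss = criterion.forward(logProbs, targets);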
MeanAbsoluteError¶
-
class MeanAbsoluteError : public fl::BinaryModule¶
Computes the mean absolute error (equivalent to the \(L_1\) loss):
\[\mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 0}^n \left| x_i - y_i \right|\]
for an input tensor \(x\) and a target tensor \(y\), each of which contains \(n\) elements.
MeanSquaredError¶
-
class MeanSquaredError : public fl::BinaryModule¶
Computes the mean squared error between elements across two tensors:
\[\mathcal{L}(x, y) = \frac{1}{n} \sum_{i = 0}^n \left( x_i - y_i \right)^2\]
for an input tensor \(x\) and a target tensor \(y\), each of which contains \(n\) elements.
Initialization¶
-
group nn_init_utils¶
Functions for initializing tensors.
Provides facilities for creating a fl::Variable tensor of different types and initializations vis-a-vis probability distributions, constants, and the identity. Additionally wraps common tensors for integration into modules.
Functions
-
Variable input(const Tensor &arr)¶
Constructs a Variable, with gradient calculation disabled, from a given tensor.
-
Variable param(const Tensor &arr)¶
Constructs a Variable, with gradient calculation enabled, from a given tensor.
-
Variable constant(double val, int inputSize, int outputSize, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with dimensions [inputSize, outputSize] where all elements are a constant.
- Return
a Variable containing a tensor with constant values
- Parameters
val: the value of the constant in the tensor
inputSize: the second dimension of the output tensor shape
outputSize: the first dimension of the output tensor shape
type: the datatype with which to create the tensor
calcGrad: flag denoting whether gradient calculation on the resulting Variable should be enabled
-
Variable constant(double val, const Shape &shape, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions where all elements are a constant.
-
template<typename T>
Variable scalar(T val, fl::dtype type = dtype_traits<T>::ctype, bool calcGrad = true)¶
Creates a Variable representing a scalar with a given value and type.
-
Variable identity(int inputSize, int outputSize, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing an identity tensor with dimensions [inputSize, outputSize].
- Return
a Variable containing the identity tensor
- Parameters
inputSize: the second dimension of the output tensor shape
outputSize: the first dimension of the output tensor shape
type: the datatype with which to create the tensor
calcGrad: flag denoting whether gradient calculation on the resulting Variable should be enabled
-
Variable identity(const Shape &shape, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing an identity tensor of up to rank 4 with arbitrary dimensions.
-
Variable uniform(int inputSize, int outputSize, double min = 0, double max = 1, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with dimensions [inputSize, outputSize], where elements are distributed according to a uniform distribution with parameters \(\mathcal{U}(min, max)\). See Uniform Distribution.
- Return
a Variable containing a tensor with random values distributed accordingly
- Parameters
inputSize: the second dimension of the output tensor shape
outputSize: the first dimension of the output tensor shape
min: the lower bound parameter for the uniform distribution
max: the upper bound parameter for the uniform distribution
type: the datatype with which to create the tensor
calcGrad: flag denoting whether gradient calculation on the resulting Variable should be enabled
-
Variable uniform(const Shape &shape, double min = 0, double max = 1, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are distributed according to a uniform distribution with parameters \(\mathcal{U}(min, max)\). See Uniform Distribution.
- Return
a Variable containing a tensor with random values distributed accordingly
-
Variable normal(int inputSize, int outputSize, double stdv = 1, double mean = 0, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with dimensions [inputSize, outputSize], where elements are distributed according to a normal distribution with parameters \(\mathcal{N}(\mu, \sigma^2)\). See Normal Distribution.
- Return
a Variable containing a tensor with random values distributed accordingly
- Parameters
inputSize: the second dimension of the output tensor shape
outputSize: the first dimension of the output tensor shape
stdv: the standard deviation by which to parameterize the distribution
mean: the mean by which to parameterize the distribution
type: the datatype with which to create the tensor
calcGrad: flag denoting whether gradient calculation on the resulting Variable should be enabled
-
Variable normal(const Shape &shape, double stdv = 1, double mean = 0, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor of up to rank 4 with arbitrary dimensions, where elements are distributed according to a normal distribution with parameters \(\mathcal{N}(\mu, \sigma^2)\). See Normal Distribution.
- Return
a Variable containing a tensor with random values distributed accordingly
-
Variable kaimingUniform(const Shape &shape, int fanIn, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with the given input dimensions, where elements are uniformly distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
- Return
a Variable containing a tensor with random values distributed accordingly
-
Variable kaimingNormal(const Shape &shape, int fanIn, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with the given input dimensions, where elements are normally distributed according to the method outlined in He et al (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
- Return
a Variable containing a tensor with random values distributed accordingly
-
Variable glorotUniform(const Shape &shape, int fanIn, int fanOut, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with the given input dimensions, where elements are uniformly distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
- Return
a Variable containing a tensor with random values distributed accordingly
-
Variable glorotNormal(const Shape &shape, int fanIn, int fanOut, fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with the given input dimensions, where elements are normally distributed according to the method outlined in Glorot and Bengio (2010): Understanding the difficulty of training deep feedforward neural networks.
- Return
a Variable containing a tensor with random values distributed accordingly
-
Variable truncNormal(const Shape &shape, double stdv = 1., double mean = 0., double minCufOff = -2., double maxCutOff = 2., fl::dtype type = fl::dtype::f32, bool calcGrad = true)¶
Creates a Variable representing a tensor with the given input dimensions, where elements are distributed according to a truncated normal distribution, as described here.
- Return
a Variable containing a tensor with random values distributed accordingly
- Parameters
shape: the shape of the output tensor
stdv: the standard deviation by which to parameterize the distribution
mean: the mean by which to parameterize the distribution
minCufOff: the minimum value of the distribution
maxCutOff: the maximum value of the distribution
type: the datatype with which to create the tensor
calcGrad: flag denoting whether gradient calculation on the resulting Variable should be enabled
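As a brief sketch of these initializers in use (shapes and values are illustrative):
// [outputSize, inputSize] = [256, 128], drawn from U(-0.1, 0.1)
auto w1 = fl::uniform(128, 256, -0.1, 0.1);
// Same shape via the Shape overload, drawn from N(0, 0.02^2)
auto w2 = fl::normal(fl::Shape({256, 128}), 0.02, 0.0);
// He/Kaiming initialization, suited to ReLU networks
auto w3 = fl::kaimingNormal(fl::Shape({256, 128}), 128 /* fanIn */);
Linear lin(w3); // use as the weight of a bias-free Linear layer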
Utils¶
-
Computes the total number of parameters of a fl::Module.
- Return
the number of parameters in the module
- Parameters
[in] module: the module over which to compute the parameter count
-
bool fl::allParamsClose(const Module &a, const Module &b, double absTolerance = 1e-5)¶
Returns true if the parameters of two modules are of the same type and are element-wise equal within a given tolerance limit.
- Parameters
[in] a, b: the input Modules to compare
absTolerance: the absolute tolerance allowed
-
int fl::derivePadding(int inSz, int filterSz, int stride, int pad, int dilation)¶
DistributedUtils¶
-
Registers a module for allreduce synchronization, with a gradient hook on its parameter Variables.
- Parameters
[in] module: a module whose parameter gradients will be synchronized
[in] a: a Reducer instance to which gradients will be immediately added when available
-
Traverses the network and averages its parameters with allreduce.
- Parameters
module: a module whose parameters will be synchronized
-
Traverses the network and synchronizes the gradients of its parameters with allreduce.
- Parameters
module: a module whose parameter gradients will be synchronized
scale: a factor by which gradients are scaled after allreduce