Common

enum class ReduceMode

Reduction mode to be used for CrossEntropy, AdaptiveSoftMax, etc.

Values:

enumerator NONE
enumerator MEAN
enumerator SUM
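
For example, the reduction mode is typically passed when constructing a loss module. A minimal sketch, assuming the CategoricalCrossEntropy loss accepts a ReduceMode as its reduction argument:

// Sketch - assumes fl::CategoricalCrossEntropy takes a ReduceMode argument.
// MEAN averages the per-sample losses, SUM adds them up, and NONE returns
// the unreduced, per-sample losses.
fl::CategoricalCrossEntropy meanLoss(fl::ReduceMode::MEAN);
fl::CategoricalCrossEntropy sumLoss(fl::ReduceMode::SUM);
fl::CategoricalCrossEntropy rawLoss(fl::ReduceMode::NONE);
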
enum class PoolingMode

Pooling method to be used.

Values:

enumerator MAX

Use maximum value inside the pooling window.

enumerator AVG_INCLUDE_PADDING

Use average value (including padding) inside the pooling window.

enumerator AVG_EXCLUDE_PADDING

Use average value (excluding padding) inside the pooling window.
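
For example, the pooling mode is passed when constructing a pooling layer. A minimal sketch, assuming a Pool2D module whose constructor takes window sizes, strides, padding, and a PoolingMode:

// Sketch - the Pool2D constructor argument order shown here is an assumption.
// 2x2 max pooling with stride 2 and no padding.
fl::Pool2D maxPool(2, 2, /* sx = */ 2, /* sy = */ 2,
                   /* px = */ 0, /* py = */ 0, fl::PoolingMode::MAX);
// Average pooling that ignores padded values inside each window.
fl::Pool2D avgPool(2, 2, 2, 2, 1, 1, fl::PoolingMode::AVG_EXCLUDE_PADDING);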

enum class RnnMode

RNN network type.

Values:

enumerator RELU
enumerator TANH
enumerator LSTM
enumerator GRU
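
For example, the RNN mode selects the cell type when constructing a recurrent layer. A minimal sketch, assuming an RNN module whose constructor takes input size, hidden size, number of layers, and an RnnMode:

// Sketch - the RNN constructor signature shown here is an assumption.
// A 2-layer LSTM mapping 256-dimensional inputs to a 512-dimensional state.
fl::RNN lstm(
    /* inputSize = */ 256,
    /* hiddenSize = */ 512,
    /* numLayers = */ 2,
    fl::RnnMode::LSTM);
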
enum class PaddingMode

Values:

enumerator SAME

Use the smallest possible padding such that out_size = ceil(in_size / stride).
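
The sketch below is not part of the flashlight API; it only illustrates the total padding that SAME implies for a single dimension, given a kernel size and stride:

// Illustrative helper (hypothetical, not a flashlight function): the total
// padding needed so that outSize == ceil(inSize / stride).
int samePadding(int inSize, int kernelSize, int stride) {
  int outSize = (inSize + stride - 1) / stride;            // ceil(inSize / stride)
  int pad = (outSize - 1) * stride + kernelSize - inSize;  // may be split across
  return pad > 0 ? pad : 0;                                // both sides
}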

enum class DistributedBackend

Values:

enumerator GLOO

https://github.com/facebookincubator/gloo

enumerator NCCL

https://developer.nvidia.com/nccl

enumerator STUB
enum class DistributedInit

Values:

enumerator MPI
enumerator FILE_SYSTEM
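
A hedged sketch of choosing an init method - the distributedInit helper and its signature are assumptions, not documented above:

// Sketch - assumes a distributedInit(initMethod, worldRank, worldSize, ...)
// helper in the flashlight distributed API; the -1 placeholders assume the
// rank and world size are discovered by the init method (here MPI). GLOO or
// NCCL is then used as the collective-communication backend.
fl::distributedInit(
    fl::DistributedInit::MPI,
    /* worldRank = */ -1,
    /* worldSize = */ -1);
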
enum class OptimLevel

Optimization levels in flashlight.

These determine the behavior of autograd operator computation as well as how inputs and outputs of operators are cast.

Operator precision roughly follows the levels found in NVIDIA Apex:

Values:

enumerator DEFAULT

All operations occur in default (f32 or f64) precision.

enumerator O1

Operations that perform reduction accumulation, including layer/batch normalization, are performed in f32; all other operations are in f16.

To be used in a standard mixed-precision training setup.

enumerator O2

Only batch and layer normalization occur in f32 - all other operations occur in f16.

enumerator O3

All operations that support it use f16.

constexpr std::size_t kDynamicBenchmarkDefaultCount = 10
constexpr double kAmpMinimumScaleFactorValue = 1e-4
class OptimMode

Singleton storing the current optimization level (OptimLevel) for flashlight.
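
A hedged sketch, assuming OptimMode exposes a static get() accessor and a setOptimLevel() setter:

// Sketch - assumed accessors on the OptimMode singleton.
// Enable standard mixed-precision behavior for subsequent operators.
fl::OptimMode::get().setOptimLevel(fl::OptimLevel::O1);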

class DevicePtr

DevicePtr provides an RAII wrapper for accessing the device pointer of a Flashlight Tensor.

After calling device() on a Flashlight tensor to get a device pointer, the underlying memory is not freed until unlock() is called - see fl::Tensor::unlock(). DevicePtr provides a std::unique_lock-style API: it acquires the device pointer on construction and calls unlock() in its destructor. A DevicePtr is movable, but not copyable.

Example Usage :

auto A = Tensor({10, 10});
{
    DevicePtr devPtr(A); // calls `.device<>()` on array.
    void* ptr = devPtr.get();
}
// devPtr is destructed and A.unlock() is automatically called

Public Functions

inline DevicePtr()

Creates a null DevicePtr.

explicit DevicePtr(const Tensor &in)
Parameters:

in – input tensor whose device pointer is retrieved

~DevicePtr()

.unlock() is called on the underlying tensor in the destructor

DevicePtr(const DevicePtr &other) = delete
DevicePtr &operator=(const DevicePtr &other) = delete
DevicePtr(DevicePtr &&d) noexcept
DevicePtr &operator=(DevicePtr &&other) noexcept
inline bool operator==(const DevicePtr &other) const
void *get() const
template<typename T>
inline T *getAs() const
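
A minimal sketch of typed access via getAs(), following the example above:

auto B = Tensor({10, 10}); // f32 tensor, as in the example above
{
    DevicePtr devPtr(B);
    float* data = devPtr.getAs<float>(); // same address as devPtr.get(), typed
}
// devPtr is destructed and B.unlock() is automatically called
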
class ThreadPool

A simple C++11 Thread Pool implementation.

Source - https://github.com/progschj/ThreadPool

Basic usage:

// create thread pool with 4 worker threads
ThreadPool pool(4);

// enqueue and store future
auto result = pool.enqueue([](int answer) { return answer; }, 42);

// get result from future
std::cout << result.get() << std::endl;

Public Functions

inline ThreadPool(size_t threads, const std::function<void(size_t)> &initFn = nullptr)

The constructor launches the given number of worker threads.

Parameters:
  • threads[in] number of threads

  • initFn[in] initialization code (if any) that will be run on each thread
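
A minimal sketch of the constructor with a per-thread initialization function - the fl namespace qualification is an assumption:

// Sketch - assumes ThreadPool lives in the fl namespace.
// initFn runs once on each worker thread before it accepts work.
fl::ThreadPool pool(
    /* threads = */ 4,
    /* initFn = */ [](size_t threadId) {
      std::cout << "worker " << threadId << " ready" << std::endl;
    });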

template<class F, class ...Args>
auto enqueue(F &&f, Args&&... args) -> std::future<typename std::invoke_result<F, Args...>::type>

Adds a new work item to the pool.

Parameters:
  • f[in] function to be executed in threadpool

  • args[in] variadic arguments for the function

inline ~ThreadPool()

The destructor joins all threads.