Common¶
-
enum class ReduceMode¶
Reduction mode to be used for CrossEntropy, AdaptiveSoftMax, etc.
Values:
-
enumerator NONE¶
-
enumerator MEAN¶
-
enumerator SUM¶
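As a rough illustration of what the three modes mean for a loss, the sketch below applies each of them to a vector of per-sample loss values. The enum is redeclared locally so the snippet compiles on its own, and `reduceLoss` is a hypothetical helper, not a Flashlight function.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Local stand-in mirroring the enumerators documented above.
enum class ReduceMode { NONE, MEAN, SUM };

// Hypothetical helper: reduces per-sample losses according to `mode`.
// NONE returns the losses unchanged; MEAN and SUM return a single element.
std::vector<double> reduceLoss(const std::vector<double>& losses,
                               ReduceMode mode) {
  if (mode == ReduceMode::NONE) {
    return losses;
  }
  assert(!losses.empty());
  const double sum = std::accumulate(losses.begin(), losses.end(), 0.0);
  if (mode == ReduceMode::SUM) {
    return {sum};
  }
  return {sum / static_cast<double>(losses.size())};
}
```

With `NONE` the caller keeps one value per sample; `MEAN` and `SUM` collapse them to a scalar-like result.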
-
enum class PoolingMode¶
Pooling method to be used.
Values:
-
enumerator MAX¶
Use maximum value inside the pooling window.
-
enumerator AVG_INCLUDE_PADDING¶
Use average value (including padding) inside the pooling window.
-
enumerator AVG_EXCLUDE_PADDING¶
Use average value (excluding padding) inside the pooling window.
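The difference between the two averaging modes only shows up for windows that overlap the padding. The following sketch, for a single 1-D window, assumes padded cells hold zero; `averageWindow` is a hypothetical helper written for illustration, not part of Flashlight.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the enumerators documented above.
enum class PoolingMode { MAX, AVG_INCLUDE_PADDING, AVG_EXCLUDE_PADDING };

// Hypothetical helper: averages one 1-D window that starts at `start`
// (which may be negative, i.e. inside the left padding) and spans `window`
// cells. Padded cells add zero to the sum in both modes; the modes differ
// only in the divisor.
double averageWindow(const std::vector<double>& in, int start, int window,
                     PoolingMode mode) {
  assert(window > 0);
  assert(mode != PoolingMode::MAX);
  double sum = 0.0;
  int valid = 0;
  for (int i = start; i < start + window; ++i) {
    if (i >= 0 && i < static_cast<int>(in.size())) {
      sum += in[static_cast<std::size_t>(i)];
      ++valid;
    }
  }
  const int divisor =
      (mode == PoolingMode::AVG_INCLUDE_PADDING) ? window : valid;
  return divisor > 0 ? sum / divisor : 0.0;
}
```

For the input `{3, 6, 9}` and a window of 3 starting one cell into the padding, the sum is 9: including padding divides by 3, excluding padding divides by the 2 real cells.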
-
enum class RnnMode¶
RNN network type.
Values:
-
enumerator RELU¶
-
enumerator TANH¶
-
enumerator LSTM¶
-
enumerator GRU¶
-
enum class PaddingMode¶
Values:
-
enumerator SAME¶
Use smallest possible padding such that out_size = ceil(in_size/stride)
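To make the formula concrete, here is a minimal sketch of how the total padding along one dimension could be derived. It assumes the common convention (no dilation) that the last window must end inside the padded input; `samePadding` and `SamePadding` are hypothetical names, and how the padding is split between the two sides is not specified here.

```cpp
#include <algorithm>
#include <cassert>

// Result of the calculation for one spatial dimension.
struct SamePadding {
  int outSize;   // ceil(inSize / stride)
  int totalPad;  // smallest total padding that yields outSize
};

// Hypothetical helper, assuming no dilation.
SamePadding samePadding(int inSize, int kernel, int stride) {
  assert(inSize > 0 && kernel > 0 && stride > 0);
  // Integer form of ceil(inSize / stride).
  const int outSize = (inSize + stride - 1) / stride;
  // The last window starts at (outSize - 1) * stride and needs `kernel` cells.
  const int totalPad =
      std::max((outSize - 1) * stride + kernel - inSize, 0);
  return {outSize, totalPad};
}
```

For example, an input of size 10 with a kernel of 3 and stride 2 gives an output of size 5 and needs one padded cell in total.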
-
enum class OptimLevel¶
Optimization levels in flashlight.
These determine the computation behavior of autograd operator computation as well as how inputs and outputs of operators are cast.
Operator precision roughly follows the optimization levels found in NVIDIA Apex.
Values:
-
enumerator DEFAULT¶
All operations occur in default (f32 or f64) precision.
-
enumerator O1¶
Operations that perform reduction accumulation, including layer/batch normalization, are performed in f32; all other operations are in f16.
To be used in a standard mixed-precision training setup.
-
enumerator O2¶
Only batch and layer normalization occur in f32; all other operations occur in f16.
-
enumerator O3¶
All operations that support it use fp16.
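The level descriptions can be read as a small lookup table from operation category to precision. The sketch below encodes that reading; the `OpKind` categories, the `Precision` enum, and `precisionFor` are assumptions introduced for illustration and are not Flashlight types. It also assumes every operation supports f16 under O3.

```cpp
#include <cassert>

// Local stand-in mirroring the enumerators documented above.
enum class OptimLevel { DEFAULT, O1, O2, O3 };

// Hypothetical operation categories and precisions for this sketch.
enum class OpKind { NORMALIZATION, REDUCTION, OTHER };
enum class Precision { DEFAULT, F32, F16 };

// Returns the precision an operation would run in at the given level,
// following the descriptions of DEFAULT, O1, O2 and O3.
Precision precisionFor(OptimLevel level, OpKind op) {
  switch (level) {
    case OptimLevel::DEFAULT:
      return Precision::DEFAULT;  // f32 or f64, whatever the inputs use
    case OptimLevel::O1:
      // Reductions and normalization stay in f32.
      return op == OpKind::OTHER ? Precision::F16 : Precision::F32;
    case OptimLevel::O2:
      // Only normalization stays in f32.
      return op == OpKind::NORMALIZATION ? Precision::F32 : Precision::F16;
    case OptimLevel::O3:
      return Precision::F16;
  }
  return Precision::DEFAULT;  // not reached for valid enumerators
}

// Small usage check: a reduction is f32 under O1 but f16 under O2.
inline void demoPrecision() {
  assert(precisionFor(OptimLevel::O1, OpKind::REDUCTION) == Precision::F32);
  assert(precisionFor(OptimLevel::O2, OpKind::REDUCTION) == Precision::F16);
}
```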
-
constexpr std::size_t kDynamicBenchmarkDefaultCount = 10¶
-
constexpr double kAmpMinimumScaleFactorValue = 1e-4¶
-
class OptimMode
Singleton storing the current optimization level (OptimLevel) for flashlight.
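For readers unfamiliar with the pattern, a singleton that holds a process-wide optimization level could look roughly like the following. `OptimModeSketch` and its member names are hypothetical and chosen for this example only; they are not the actual `OptimMode` interface.

```cpp
#include <cassert>

// Local stand-in mirroring the OptimLevel enumerators.
enum class OptimLevel { DEFAULT, O1, O2, O3 };

// Hypothetical sketch of a singleton storing the current level.
class OptimModeSketch {
 public:
  // Returns the single shared instance; the function-local static is
  // constructed once, on first use.
  static OptimModeSketch& get() {
    static OptimModeSketch instance;
    return instance;
  }

  OptimLevel level() const { return level_; }
  void setLevel(OptimLevel level) { level_ = level; }

  // A singleton must not be copied.
  OptimModeSketch(const OptimModeSketch&) = delete;
  OptimModeSketch& operator=(const OptimModeSketch&) = delete;

 private:
  OptimModeSketch() = default;
  OptimLevel level_{OptimLevel::DEFAULT};
};

// Usage: every call site observes the level set anywhere else.
inline void demoOptimMode() {
  OptimModeSketch::get().setLevel(OptimLevel::O1);
  assert(OptimModeSketch::get().level() == OptimLevel::O1);
}
```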
-
class DevicePtr¶
DevicePtr provides an RAII wrapper for accessing the device pointer of a Flashlight Tensor array.
After calling device() on a Flashlight tensor to get a device pointer, its underlying memory is not freed until unlock() is called (see fl::Tensor::unlock()). DevicePtr provides a std::unique_lock-style API which calls the unlock() function in its destructor after getting the device pointer. A DevicePtr is movable, but not copyable.
Example Usage:
    auto A = Tensor({10, 10});
    {
      DevicePtr devPtr(A); // calls `.device<>()` on array.
      void* ptr = devPtr.get();
    } // devPtr is destructed and A.unlock() is automatically called
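The lock-in-constructor, unlock-in-destructor behaviour described above can be sketched without Flashlight by using a mock tensor. `MockTensor` and `DevicePtrSketch` are hypothetical stand-ins written for this example; the real class may differ in its details.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a tensor: device() locks, unlock() releases.
struct MockTensor {
  int buffer = 0;
  bool locked = false;
  void* device() { locked = true; return &buffer; }
  void unlock() { locked = false; }
};

// Sketch of a movable, non-copyable RAII guard around the device pointer.
class DevicePtrSketch {
 public:
  explicit DevicePtrSketch(MockTensor& t) : tensor_(&t), ptr_(t.device()) {}
  ~DevicePtrSketch() { release(); }

  DevicePtrSketch(const DevicePtrSketch&) = delete;
  DevicePtrSketch& operator=(const DevicePtrSketch&) = delete;

  // Moving transfers responsibility for the unlock to the new object.
  DevicePtrSketch(DevicePtrSketch&& other) noexcept
      : tensor_(std::exchange(other.tensor_, nullptr)),
        ptr_(std::exchange(other.ptr_, nullptr)) {}
  DevicePtrSketch& operator=(DevicePtrSketch&& other) noexcept {
    if (this != &other) {
      release();
      tensor_ = std::exchange(other.tensor_, nullptr);
      ptr_ = std::exchange(other.ptr_, nullptr);
    }
    return *this;
  }

  void* get() const { return ptr_; }

 private:
  // Unlocks the tensor if this guard still owns it.
  void release() {
    if (tensor_ != nullptr) {
      tensor_->unlock();
      tensor_ = nullptr;
      ptr_ = nullptr;
    }
  }

  MockTensor* tensor_;
  void* ptr_;
};

// Usage: the tensor is locked only while the guard is alive.
inline void demoDevicePtr() {
  MockTensor t;
  {
    DevicePtrSketch guard(t);
    assert(t.locked);
  }
  assert(!t.locked);
}
```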
-
class ThreadPool¶
A simple C++11 Thread Pool implementation.
Source - https://github.com/progschj/ThreadPool
Basic usage:
    // create thread pool with 4 worker threads
    ThreadPool pool(4);
    // enqueue and store future
    auto result = pool.enqueue([](int answer) { return answer; }, 42);
    // get result from future
    std::cout << result.get() << std::endl;
Public Functions
-
inline ThreadPool(size_t threads, const std::function<void(size_t)> &initFn = nullptr)¶
The constructor launches the given number of worker threads.
- Parameters:
threads – [in] number of threads
initFn – [in] initialization code (if any) that will be run on all the threads
-
template<class F, class ...Args>
auto enqueue(F &&f, Args&&... args) -> std::future<typename std::invoke_result<F, Args...>::type>¶
Add a new work item to the pool.
- Parameters:
f – [in] function to be executed in threadpool
args – [in] variadic arguments for the function
-
inline ~ThreadPool()¶
The destructor joins all threads.
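To show how the three members fit together, here is a minimal, self-contained pool with the same shape: a constructor that starts the workers and runs an optional per-thread init function, an `enqueue` that returns a future, and a destructor that joins. `MiniThreadPool` is a hypothetical sketch modelled on the signatures above, not the implementation referenced by the source link; it assumes a C++17 compiler (for `std::invoke_result`) and a platform threading library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

class MiniThreadPool {
 public:
  // Launches `threads` workers; each runs initFn(threadId) once, if given.
  explicit MiniThreadPool(
      std::size_t threads,
      const std::function<void(std::size_t)>& initFn = nullptr) {
    for (std::size_t id = 0; id < threads; ++id) {
      workers_.emplace_back([this, initFn, id] {
        if (initFn) {
          initFn(id);
        }
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
            if (stop_ && tasks_.empty()) {
              return;  // drained and asked to stop
            }
            task = std::move(tasks_.front());
            tasks_.pop();
          }
          task();  // run outside the lock
        }
      });
    }
  }

  // Queues f(args...) and returns a future for its result.
  template <class F, class... Args>
  auto enqueue(F&& f, Args&&... args)
      -> std::future<typename std::invoke_result<F, Args...>::type> {
    using R = typename std::invoke_result<F, Args...>::type;
    auto task = std::make_shared<std::packaged_task<R()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!stop_);  // enqueueing on a stopped pool is a usage error
      tasks_.emplace([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

  // Signals the workers to stop, lets them drain the queue, then joins.
  ~MiniThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    for (std::thread& worker : workers_) {
      worker.join();
    }
  }

 private:
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
};
```

Because the destructor joins, any side effects of `initFn` and of queued tasks are complete once the pool goes out of scope.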