
TensorFlow Determinism

This repository serves three purposes:

  1. Provide up-to-date information (in this file) about non-determinism sources and solutions in TensorFlow and beyond, with a focus on determinism when running on GPUs.

  2. Provide a patch to attain various levels of GPU-specific determinism in stock TensorFlow, via the installation of the tensorflow-determinism pip package.

  3. Be the location where a TensorFlow determinism debug tool will be released as part of the tensorflow-determinism pip package.

For more information, please watch the video of the GTC 2019 talk Determinism in Deep Learning. The description under that video also includes links to the slides from the talk and to a poster presentation on this topic.

Installation

Use pip to install:

pip install tensorflow-determinism

This will install a package that can be imported as tfdeterminism. Installing tensorflow-determinism will not automatically install TensorFlow; this is intentional, so that you can choose which version of TensorFlow to use. You will need to install your chosen version of TensorFlow before you can import and use tfdeterminism.

Deterministic TensorFlow Solutions

There are currently two main ways to access GPU-deterministic functionality in TensorFlow for most deep learning applications. The first way is to use an NVIDIA NGC TensorFlow container. The second way is to use version 1.14, 1.15, or 2.0 of stock TensorFlow with GPU support, plus the application of a patch supplied in this repo.

The longer-term intention and plan is to upstream all solutions into stock TensorFlow.

Determinism is not guaranteed when XLA JIT compilation is enabled.
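
XLA JIT compilation is off by default and is normally enabled explicitly. If you are unsure whether it has been enabled somewhere in your code base, one place to check is the optimizer JIT setting (a minimal sketch, assuming TensorFlow 2.x, where tf.config.optimizer.set_jit is the relevant API):

import tensorflow as tf
# XLA JIT defaults to off; make sure it hasn't been enabled elsewhere
tf.config.optimizer.set_jit(False)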

NVIDIA NGC TensorFlow Containers

NGC TensorFlow containers, starting with version 19.06, implement GPU-deterministic TensorFlow functionality. In Python code running inside the container, this can be enabled as follows:

import tensorflow as tf
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
# Now build your graph and train it

The following table shows which version of TensorFlow each NGC container version is based on:

NGC Container VersionTensorFlow Version
19.061.13
19.07 - 19.101.14
19.11 - 19.121.15 / 2.0

For information about pulling and running the NVIDIA NGC containers, see these instructions.
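
For example, a typical invocation might look as follows (a sketch, assuming Docker 19.03+ with the NVIDIA Container Toolkit; 19.12-tf2-py3 is one of several available image tags):

docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:19.12-tf2-py3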

Stock TensorFlow

Versions 1.14, 1.15, and 2.0 of stock TensorFlow implement a reduced form of GPU determinism, which must be supplemented with a patch provided in this repo. The following Python code assumes a machine on which the pip package tensorflow-gpu==2.0.0 has been installed correctly and on which tensorflow-determinism has also been installed (as shown in the installation section above).

import tensorflow as tf
from tfdeterminism import patch
patch()
# use tf as normal

Stock TensorFlow with GPU support can be installed as follows:

pip install tensorflow-gpu==2.0.0

The TensorFlow project includes detailed instructions for installing TensorFlow with GPU support.

Additional Ingredients in the Determinism Recipe

Seeds

You'll also need to set any and all appropriate random seeds:

import os, random
import numpy as np
import tensorflow as tf
SEED = 123  # any fixed value
os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)  # in TF 2.x: tf.random.set_seed(SEED)

Dataset Sharding

If you're using tf.data.Dataset, you should not shard the dataset. This is achieved either by not calling the shard() method, or by setting its num_shards parameter to 1, as in the sketch below.
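
A minimal sketch (the input data here is hypothetical):

import tensorflow as tf
features = tf.zeros([8, 4])  # hypothetical input data
dataset = tf.data.Dataset.from_tensor_slices(features)
# either omit the following call entirely, or make sharding a no-op:
dataset = dataset.shard(num_shards=1, index=0)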

Gradient Gating

For deterministic functionality, some types of models may require gate_gradients=tf.train.Optimizer.GATE_OP to be passed to the optimizer's minimize (or compute_gradients) method.
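
A minimal sketch (the variable, loss, and choice of optimizer are hypothetical; GATE_OP is the default in the TF1 API, so this matters mainly if your code overrides it):

import tensorflow as tf  # TF1-style API (tf.compat.v1 in TF 2.x)
x = tf.Variable(1.0)
loss = tf.square(x - 2.0)  # hypothetical loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss, gate_gradients=tf.train.Optimizer.GATE_OP)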

Multi-GPU with Horovod

If you're using Horovod for multi-GPU training, you may need to disable Tensor Fusion (assuming that the non-determinism associated with Tensor Fusion has not yet been resolved):

os.environ['HOROVOD_FUSION_THRESHOLD']='0'
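
A sketch of where this fits in a typical Horovod script (set the environment variable before Horovod is initialized):

import os
os.environ['HOROVOD_FUSION_THRESHOLD'] = '0'  # disable Tensor Fusion
import horovod.tensorflow as hvd
hvd.init()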

CPU

If you want to obtain determinism when your ops are running on the CPU, you may need to limit the number of CPU threads used:

session_config = tf.compat.v1.ConfigProto()
session_config.intra_op_parallelism_threads = 1
session_config.inter_op_parallelism_threads = 1
sess = tf.compat.v1.Session(config=session_config)

It should not be necessary to do this when your ops are not running on the CPU (e.g. when they're running on a GPU).
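
In TensorFlow 2.x, where execution is not driven through a session config, the equivalent limits can be set via the tf.config.threading API (a minimal sketch):

import tensorflow as tf
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)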

Detailed Status of Determinism in TensorFlow and Beyond

Confirmed and likely sources of non-determinism, along with any existing solutions, are being tracked here.

GPU-Specific Sources of Non-Determinism

Historic GPU-Specific Sources of Non-Determinism

In the past, tf.math.reduce_sum and tf.math.reduce_mean operated non-deterministically when running on a GPU. This was resolved before TensorFlow version 1.12. These ops now function deterministically by default when running on a GPU.

Confirmed Current GPU-Specific Sources of Non-Determinism (With Solutions)

| Source                                                                | NGC 19.06+ / TF 2.1 | TF 1.14, 1.15, 2.0 |
|-----------------------------------------------------------------------|---------------------|--------------------|
| TF auto-tuning of cuDNN convolution algorithms (see multi-algo note)  | TCD or TDO          | TCD or TDP         |
| cuDNN convolution backprop to weight gradients                        | TCD or TDO          | TCD or TDP         |
| cuDNN convolution backprop to data gradients                          | TCD or TDO          | TCD or TDP         |
| cuDNN max-pooling backprop                                            | TCD or TDO          | TCD or TDP         |
| tf.nn.bias_add backprop (see XLA note)                                | TDO                 | TDP                |
| tf.image.resize_bilinear fwd and bwd                                  | NS1                 | NS1                |

Key to the solutions referenced above:

| Solution | Description |
|----------|-------------|
| TCD      | Set environment variable TF_CUDNN_DETERMINISTIC to '1' or 'true'. Also do not set environment variable TF_USE_CUDNN_AUTOTUNE at all (and particularly do not set it to '0' or 'false'). |
| TDO      | Set environment variable TF_DETERMINISTIC_OPS to '1' or 'true'. Also do not set environment variable TF_USE_CUDNN_AUTOTUNE at all (and particularly do not set it to '0' or 'false'). |
| TDP      | Apply tfdeterminism.patch. Note that solution TDO will be in stock TensorFlow v2.1 (see PR 31465). |
| NS1      | There is currently no solution available for this, but one is under development. |
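
For example, the TCD solution on stock TensorFlow 1.14 or 1.15 can be enabled as follows (a minimal sketch):

import os
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # TCD
# note: leave TF_USE_CUDNN_AUTOTUNE unset
import tensorflow as tf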

Notes:

  • multi-algo: From NGC 19.12 onwards, the cuDNN forward and backward convolution algorithms are selected deterministically from several deterministic algorithms. Prior to this (i.e. NGC 19.11 and earlier, and all currently released versions of stock TensorFlow), there is only one deterministic algorithm selected for each of the forward and two backward paths. In those versions of TensorFlow, some layer configurations are not supported (resulting in an exception). The multi-algorithm support is not currently available in stock TensorFlow, but is being added by PR 34951.

  • XLA: These solutions will not work when XLA JIT compilation is enabled.

Other Possible GPU-Specific Sources of Non-Determinism

Going beyond the above-mentioned sources, in version 1.12 of TensorFlow (and also in the master branch on 2019-03-03, after release 1.13.1), the following files call CUDA atomicAdd either directly or indirectly. This makes them candidates for the injection of non-determinism.

  • crop_and_resize_op_gpu.cu.cc

  • scatter_functor_gpu.cu.h

  • scatter_nd_op_gpu.cu.cc

  • sparse_tensor_dense_matmul_op_gpu.cu.cc

  • resize_nearest_neighbor_op_gpu.cu.cc

  • segment_reduction_ops.h

  • segment_reduction_ops_gpu.cu.cc

  • dilation_ops_gpu.cu.cc

  • maxpooling_op_gpu.cu.cc

  • svd_op_gpu.cu.cc

  • cuda_kernel_helper_test.cu.cc

  • depthwise_conv_op_gpu.h

  • resampler_ops_gpu.cu.cc

  • histogram_op_gpu.cu.cc

  • stateful_random_ops_gpu.cu.cc

Unless you are using TensorFlow ops that depend on these files (i.e. ops with similar names), your model will not be affected by these potential sources of non-determinism.

Beyond atomicAdd, there are ten other CUDA atomic functions whose use could lead to the injection of non-determinism, such as atomicCAS (the most generic: atomic compare-and-swap). Note also that the word 'atomic' was present in 167 files in the TensorFlow repo, some of which may be related to the use of CUDA atomic operations. It is important to remember that CUDA atomic operations can be used without injecting non-determinism; their presence in an op's code therefore does not guarantee that the op injects non-determinism into the computation.
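
For illustration, one way to probe an op for this kind of run-to-run variation is to execute it twice on identical inputs and compare the results. A sketch using tf.math.unsorted_segment_sum, which depends on the segment-reduction files listed above (TF 2.x eager mode assumed; note that matching results on one run do not prove determinism):

import tensorflow as tf

data = tf.random.uniform([1000000], seed=0)
ids = tf.zeros([1000000], dtype=tf.int32)  # reduce everything into one segment
r1 = tf.math.unsorted_segment_sum(data, ids, num_segments=1)
r2 = tf.math.unsorted_segment_sum(data, ids, num_segments=1)
# on a GPU, a non-zero difference indicates non-deterministic accumulation
print((r1 - r2).numpy())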

Sources of Non-Determinism in TensorFlow Unrelated to GPU

  • Issue 29101: Random seed not set in graph context of Dataset#map. This may have been resolved in version 1.14 of TensorFlow. A common work-around is to set op-level seeds inside the mapped function (see the sketch after this list).

  • tf.data.Dataset with more than one shard (aka worker). The work-around is to use only one shard.
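
A sketch of the op-level-seed work-around for the Dataset#map issue mentioned above (the augment function and its noise model are hypothetical):

import tensorflow as tf

def augment(x):
    # use an explicit op-level seed, since the outer graph-level seed may not
    # propagate into the graph context of map()
    noise = tf.random.uniform(tf.shape(x), seed=42)
    return x + noise

dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([8, 4]))
dataset = dataset.map(augment)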

Sources of Non-Determinism Beyond TensorFlow

  • TensorRT timing-based kernel schedule. Each time an inference engine is generated, it could be slightly different, particularly if there is varying load on the machine used to run TensorRT. There is a solution planned for this.

  • Horovod Tensor Fusion. Work-around: disable Tensor Fusion by setting the environment variable HOROVOD_FUSION_THRESHOLD to '0'. This issue may have been resolved by Horovod pull-request 1130 (not yet confirmed).

Relevant Links

This section catalogs relevant links.

TensorFlow Issues

| Number | Title | Updated |
|--------|-------|---------|
| 2652   | Backward pass of broadcasting on GPU is non-deterministic | 2019-10-08 |
| 2732   | Mention that GPU reductions are nondeterministic in docs | 2019-10-08 |
| 13932  | Non-determinism from tf.data.Dataset.map with random ops | |
| 16889  | Problems Getting TensorFlow to behave Deterministically | 2019-10-08 |
| 18096  | Feature Request: Support for configuring deterministic options of cuDNN conv routines | 2019-10-08 |
| 29101  | Random seed not set in graph context of Dataset#map | |

TensorFlow Pull Requests

| Number | Title | Status | Date Merged | Version |
|--------|-------|--------|-------------|---------|
| 10636  | Non-determinism Docs (see note 1) | closed | | |
| 24273  | Enable dataset.map to respect seeds from the outer context | closed | | |
| 24747  | Add cuDNN deterministic env variable (only for convolution) | merged | 2019-01-15 | 1.14 |
| 25269  | Add deterministic cuDNN max-pooling | merged | 2019-01-30 | 1.14 |
| 25796  | Added tests for TF_CUDNN_DETERMINISTIC | merged | 2019-02-22 | 1.14 |
| c27902 (see note 2) | Add a decorator to disable autotuning during test executions | merged | 2019-03-13 | 1.14 |
| 29667  | Add release note about TF_CUDNN_DETERMINISTIC | merged | 2019-08-06 | 1.14 |
| 31389  | Enhance release notes related to TF_CUDNN_DETERMINISTIC | merged | 2019-08-07 | 1.14 |
| 31465  | Add GPU-deterministic tf.nn.bias_add | merged | 2019-10-17 | 2.1 |
| 32979  | Fix typo in release note | closed | | |
| 33483  | Fix small typo in v2.0.0 release note | merged | 2019-10-25 | 2.1 |
| 33803  | Enable tf.nn.bias_add python op tests to work in eager mode | open | | |
| 33900  | Address problems with use_deterministic_cudnn test decorator | closed | | |
| 34887  | Add info about TF_DETERMINISTIC_OPS to v2.1 release notes | merged | 2019-12-09 | 2.1 |
| 34951  | Add multi-algorithm deterministic cuDNN convolutions | open | | |
| 35006  | Fix version 2.1 release note regarding TF_DETERMINISTIC_OPS | merged | 2019-12-20 | 2.1 |

Notes:

  1. Updated on 2019-10-08

  2. This was effectively a stand-alone commit

Miscellaneous

Credits

Here are the names of some of the people who have helped out with this project. If any names are missing, please let us know.

Ben Barsdell, Kevin Brown, Carl Case, Bryan Catanzaro, Sharan Chetlur, Joey Conway, Luke Durant, Marc Edgar, Mostafa Hagog, Tero Karras, Bob Keating, Andrew Kerr, Xiang Bo Kong, Nicolas Koumchatzky, Jorge Albericio Latorre, Simon Layton, Jose Alvarez Lopez, Nathan Luehr, Conrado Silva Miranda, John Montrym, Michael O'Connor, Lauri Peltonen, Rakesh Ranjan, Jussi Rasanen, Duncan Riach (PIC), Mikko Ronkainen, Dilip Sequeria, Matthijs De Smedt, Kevin Vincent, Stephen Warren, Hao Wu, Yifang Xu, Tim Zaman, William Zhang.

