fp16-demo-tf

Example code for mixed-precision training in TensorFlow and PyTorch.

General rule of thumb:

Make the relevant dimensions multiples of 8 to get the full performance of TensorCores on Volta GPUs (see the sketch after the list below).

  • Convolutions: number of input channels, number of output channels, and batch size as multiples of 8
  • GEMM: M, N, and K dimensions as multiples of 8
  • Fully connected layers: input features, output features, and batch size as multiples of 8
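As a concrete illustration, here is a minimal PyTorch sketch with dimensions chosen as multiples of 8 (the layer sizes are arbitrary examples, not taken from this repo):

```python
import torch.nn as nn

# All dimensions below are multiples of 8, so the underlying GEMMs and
# convolutions can map onto TensorCores when run in FP16 on Volta GPUs.
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
fc = nn.Linear(in_features=1024, out_features=256)
batch_size = 32
```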

The examples:

Checking if TensorCores are utilized

  • Run the program under nvprof and inspect the log output: if there are kernel calls with "884" in their names, then TensorCores are being used. Example:
nvprof python mnist_softmax_deep_conv_fp16_advanced.py

Notes about loss-scaling

The "default" loss-scaling value of 128 works for all the examples here. However, in a case it doesn't work, it's advised to choose a large value and gradually decrease it until sucessful. apex is a easy-to-use mixed-precision training utilities for PyTorch, and it's loss-scaler does that.
