This repository has been archived by the owner on Aug 5, 2022. It is now read-only.

Recommendations to achieve best performance

Pawel Noga edited this page Oct 13, 2016 · 29 revisions

To achieve the best performance with IntelCaffe on Intel CPUs, apply the following recommendations:

Hardware / BIOS configuration:

  • Make sure that your hardware configuration includes a fast SSD (M.2) drive. If you see "waiting for data" in the logs during training or scoring, install a faster SSD or reduce the batch size.
  • With the Intel Xeon Phi™ product family - enter BIOS (MCDRAM section) and set the MCDRAM mode to cache.
  • Enable Hyper-Threading (HT) on your platform - this setting can be found in BIOS (CPU section).
  • Optimize hardware settings in BIOS: set the CPU to its maximum frequency, set the fan speed to 100%, and check the cooling system.
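
The "waiting for data" check from the first recommendation above can be automated. The following is a minimal sketch, assuming the training or scoring log is available as text lines; the helper name `count_data_stalls` and the sample lines are hypothetical and are not real IntelCaffe output.

```python
def count_data_stalls(log_lines, marker="waiting for data"):
    """Count the log lines that contain the marker, ignoring case."""
    marker = marker.lower()
    return sum(1 for line in log_lines if marker in line.lower())


# Made-up sample lines for illustration only (not real IntelCaffe output).
sample_log = [
    "iteration 100 finished",
    "Waiting for data",
    "iteration 200 finished",
    "waiting for data",
]

stalls = count_data_stalls(sample_log)
if stalls:
    print(f"{stalls} data stalls found: consider a faster SSD or a smaller batch size")
```

To check a real run, pass an open log file instead of the sample list, for example `count_data_stalls(open("train.log"))`, where `train.log` is a placeholder path.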

Software / OS configuration:

  • With the Intel Xeon Phi™ product family - it is recommended to use CentOS Linux 7.2 or newer.
  • It is recommended to use the newest XPPSL software for the Intel Xeon Phi™ product family: https://mic-bld.pdx.intel.com/release/external/XPPSL/
  • Make sure that no unnecessary processes are running during training and scoring. IntelCaffe uses all available resources, so other activity (monitoring tools, Java processes, network traffic, etc.) might impact performance.

Caffe / Hyper-Parameters configuration:

  • Replace the prototxt file that defines the network topology with its Intel MKL-optimized version. Caffe includes versions of popular prototxt files optimized for Intel MKL2017. Those files have specific engines set for each layer.
  • Use the LMDB data layer (using the ‘Images’ layer as the data source will result in suboptimal performance). Our recommendation is to use a 95% compression ratio for LMDB, or, to achieve maximum theoretical performance, not to use any data layer.
  • Change the batch size in the prototxt files. On some configurations a higher batch size leads to better results.
  • The current implementation uses OpenMP threads. By default, the number of OpenMP threads is set to the number of CPU cores, and each thread is bound to a single core to achieve the best performance. You can override this configuration through OpenMP environment variables such as KMP_AFFINITY, OMP_NUM_THREADS or GOMP_CPU_AFFINITY. For the Intel Xeon Phi™ product family we recommend OMP_NUM_THREADS = number_of_cores - 2.
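
The OpenMP recommendation above can be applied by building the environment for the Caffe process before launching it. The following is a minimal sketch, not part of IntelCaffe: the helper `openmp_env`, the 68-core count, and the KMP_AFFINITY value are illustrative assumptions, so substitute the physical core count and the affinity setting that match your platform.

```python
import os


def openmp_env(physical_cores, reserved_cores=0, kmp_affinity=None, base_env=None):
    """Return a copy of the environment with the OpenMP settings applied.

    physical_cores: number of physical CPU cores (not hardware threads).
    reserved_cores: cores left free; the text above suggests 2 on Xeon Phi.
    kmp_affinity:   optional KMP_AFFINITY string for the Intel OpenMP runtime.
    base_env:       environment to start from; defaults to os.environ.
    """
    threads = physical_cores - reserved_cores
    if reserved_cores < 0 or threads < 1:
        raise ValueError("need at least one OpenMP thread")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    if kmp_affinity is not None:
        env["KMP_AFFINITY"] = kmp_affinity
    return env


# Assumption: a 68-core Xeon Phi part; the affinity string is only an example.
phi_env = openmp_env(
    68,
    reserved_cores=2,
    kmp_affinity="granularity=fine,compact,1,0",
    base_env={},
)
print(phi_env["OMP_NUM_THREADS"])  # 66
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the training or scoring command. The physical core count is an explicit parameter because `os.cpu_count()` reports logical CPUs, which overstates the number of cores when Hyper-Threading is enabled.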