
# Computation and scaling

Draft notes on computation and scaling on multicore processors, with related issues and tooling.

- Pandas - fast, powerful, flexible and easy to use open source data analysis and manipulation tool
- NumPy - fundamental package for scientific computing with Python
- SciPy - ecosystem of open-source software for mathematics, science, and engineering
- PyMC - probabilistic programming library for Python
- Swifter - efficiently applies any function to a pandas dataframe or series in the fastest available manner
- Dask - scalable analytics in Python; open source library for parallel computing

## Biology / Genomics

- PyRanges - GenomicRanges and genomic Rle-objects for Python
- Epic2 - an ultraperformant reimplementation of SICER, focused on speed, low memory overhead, and ease of use

## On avoiding oversubscription

- Thread-pool Controls - Python helpers to limit the number of threads used in native libraries that manage their own internal threadpool (BLAS and OpenMP implementations)
- Multiple OpenMP runtimes - issues caused by loading multiple OpenMP runtimes, and proposed approaches
- Sharedmem (documentation) - easier parallel programming on shared-memory computers
- Static Multi-Processing - the SMP module sets a static affinity mask for each process in a process pool to limit the total number of threads running in the application

Most tools for efficient computation already ship with optimizations for multicore systems. When tuning for performance, we have to take these existing optimizations into consideration to avoid counterproductive effects. In the case of oversubscription, for example, when running many Python processes at once we may want to limit the number of cores available to the optimization layer (a specific library, e.g. BLAS).

```python
from threadpoolctl import threadpool_info, threadpool_limits
from pprint import pprint
import numpy as np

# Show which native threadpools (BLAS, OpenMP) are currently loaded.
pprint(threadpool_info())

with threadpool_limits(limits=1, user_api='blas'):
    # In this block, calls to the BLAS implementation (such as OpenBLAS or MKL)
    # are limited to a single thread, so they can safely be combined
    # with thread-level parallelism.
    a = np.random.randn(1000, 1000)
    a_squared = a @ a
```

Limiting threads for common optimization libraries by setting environment variables before the first import:

```python
import os

# Set these before numpy (or any other consumer) is imported;
# otherwise the native libraries will have already sized their threadpools.
os.environ["OMP_NUM_THREADS"] = "1"               # OpenMP
os.environ["OPENBLAS_NUM_THREADS"] = "1"          # OpenBLAS
os.environ["MKL_THREADING_LAYER"] = "sequential"  # MKL
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"        # Accelerate (macOS)
os.environ["NUMEXPR_NUM_THREADS"] = "1"           # numexpr
os.environ["MXNET_CPU_WORKER_NTHREADS"] = "1"     # MXNet

import numpy as np

np.show_config()  # prints the detected BLAS/LAPACK configuration

try:
    import mkl
    mkl.set_num_threads(1)
except ImportError:
    pass
```
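Note that these variables only take effect in processes where they are set before the native library loads; child processes inherit the parent's environment, which is one reason to set them early in the parent. A stdlib-only sketch of that inheritance:

```python
import os
import subprocess
import sys

# Set the limits in the parent before launching workers; children inherit them.
env = dict(os.environ, OMP_NUM_THREADS="1", OPENBLAS_NUM_THREADS="1")

# The child process sees the limit from its very first import onward.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    env=env, capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # 1
```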

## Resources