Draft notes on computation and scaling on multicore processors, and on related issues and tooling.
- Pandas - fast, powerful, flexible and easy to use open source data analysis and manipulation tool
- Modern Pandas - Part 1 - Intro to modern Pandas
- NumPy - fundamental package for scientific computing with Python
- SciPy - ecosystem of open-source software for mathematics, science, and engineering
- PyMC - probabilistic programming library for Python
- Swifter - efficiently applies any function to a pandas dataframe or series in the fastest available manner
- Dask - Scalable analytics in Python - open source library for parallel computing
- PyRanges - GenomicRanges and genomic Rle-objects for Python.
- Epic2 - an ultraperformant reimplementation of SICER, focused on speed, low memory overhead, and ease of use
- Thread-pool Controls - Python helpers to limit the number of threads used in native libraries that handle their own internal threadpool (BLAS and OpenMP implementations)
- Multiple OpenMP runtimes - Issues having multiple OpenMP runtimes and proposed approaches
- Sharedmem (documentation) - Easier parallel programming on shared memory computers
- Static Multi-Processing - the SMP module sets a static affinity mask for each process in a process pool to limit the total number of threads running in the application
Most tools for effective computation already ship with many optimizations for multicore systems. When tuning for performance we have to take these existing optimizations into consideration to avoid the opposite effect. In the case of oversubscription, for example when running a multitude of Python processes, we may consider limiting the number of threads available to the optimization layer (a specific library, e.g. a BLAS implementation).
For example, with Thread-pool Controls (threadpoolctl):

```python
from pprint import pprint

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

pprint(threadpool_info())

with threadpool_limits(limits=1, user_api='blas'):
    # In this block, calls to the BLAS implementation (e.g. OpenBLAS or MKL)
    # are limited to a single thread. They can thus be used jointly
    # with thread-parallelism.
    a = np.random.randn(1000, 1000)
    a_squared = a @ a
```
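The same idea extends to process pools: to avoid oversubscription, each worker can cap its own BLAS threads so that the total thread count roughly matches the number of physical cores. A minimal sketch, assuming threadpoolctl and NumPy are installed; the worker function and pool size are illustrative:

```python
from multiprocessing import Pool

import numpy as np
from threadpoolctl import threadpool_limits


def worker(seed):
    # Limit BLAS to one thread inside this process; parallelism comes
    # from the process pool instead of from the BLAS thread pool.
    with threadpool_limits(limits=1, user_api='blas'):
        rng = np.random.default_rng(seed)
        a = rng.standard_normal((200, 200))
        return float(np.trace(a @ a.T))


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(worker, range(4))
    print(len(results))
```

With 4 workers each limited to 1 BLAS thread, the application runs at most 4 compute threads instead of 4 × (BLAS default).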
Limiting threads for common optimization libraries:
```python
# Set these before importing numpy: the native libraries read them at import time.
import os

os.environ["OMP_NUM_THREADS"] = "1"               # OpenMP
os.environ["OPENBLAS_NUM_THREADS"] = "1"          # OpenBLAS
os.environ["MKL_THREADING_LAYER"] = "sequential"  # MKL (instead of MKL_NUM_THREADS=1)
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"        # Accelerate (vecLib)
os.environ["NUMEXPR_NUM_THREADS"] = "1"           # numexpr
os.environ["MXNET_CPU_WORKER_NTHREADS"] = "1"     # MXNet

import numpy as np

np.show_config()  # prints the build configuration; returns None
```
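To check that the variables took effect, threadpoolctl can report the size of each native thread pool. A small sketch, assuming threadpoolctl is installed and a fresh interpreter (the variables must be set before NumPy is first imported):

```python
import os

# Must happen before numpy is first imported in this interpreter.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np
from threadpoolctl import threadpool_info

# Each entry describes one native thread pool (BLAS, OpenMP, ...).
for pool in threadpool_info():
    print(pool["user_api"], pool["num_threads"])
```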
If MKL is installed, its thread count can also be set directly through the mkl module:

```python
try:
    import mkl
    mkl.set_num_threads(1)
except ImportError:
    pass
```
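Beyond thread counts, a process can also be pinned to specific cores with a static affinity mask, which is what the Static Multi-Processing module listed above automates for process pools. A Linux-only sketch using the standard library (the choice of core is arbitrary):

```python
import os

# os.sched_setaffinity is only available on Linux.
if hasattr(os, "sched_setaffinity"):
    available = os.sched_getaffinity(0)    # cores the current process may use
    one_core = min(available)
    os.sched_setaffinity(0, {one_core})    # pin this process to a single core
    print(os.sched_getaffinity(0))         # now a one-element set
```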