scikit-learn_bench benchmarks various implementations of machine learning algorithms across data analytics frameworks. Scikit-learn_bench can be extended to add new frameworks and algorithms. It currently supports the scikit-learn, DAAL4PY, cuML, and XGBoost frameworks for commonly used machine learning algorithms.
See benchmark results here.
- Prerequisites
- How to create conda environment for benchmarking
- Running Python benchmarks with runner script
- Supported algorithms
- Algorithms parameters
- Legacy automatic building and running
Prerequisites:
- `python` and `scikit-learn` to run the python versions
- `pandas` when using its DataFrame as the input data format
- `icc`, `ifort`, `mkl`, `daal` to compile and run the native benchmarks
- the machine learning frameworks that you want to test; see below for information on how to set up the environment
Create a suitable conda environment for each framework to test. Each item in the list below links to instructions to create an appropriate conda environment for the framework.
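As a minimal sketch, an environment for the scikit-learn benchmarks could be created like this (the environment name, channel, and versions are assumptions, not part of the official per-framework instructions):

```bash
# Illustrative only -- follow the per-framework instructions for the exact packages
conda create -n skl_bench -c conda-forge python=3.9 scikit-learn pandas
conda activate skl_bench
```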
Run `python runner.py --config configs/config_example.json [--output-format json --verbose]` to launch benchmarks.
Runner options:
- `config`: the path to the configuration file
- `dummy-run`: run the configuration parser and dataset generation without running the benchmarks
- `verbose`: print additional information while the benchmarks run
- `output-format`: `json` or `csv`; the output format for the benchmark results
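For example, two invocations combining these options (the flag spellings follow the command shown above):

```bash
# Only parse the config and generate datasets; do not run any benchmarks
python runner.py --config configs/config_example.json --dummy-run

# Run the benchmarks with extra logging and CSV-formatted results
python runner.py --config configs/config_example.json --output-format csv --verbose
```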
Benchmarks currently support the following frameworks:
- scikit-learn
- daal4py
- cuml
- xgboost
The benchmark configuration lets you select which frameworks to run, which datasets to measure on, and how the algorithm parameters are set. You can configure benchmarks by editing a config file; check the config.json schema for more details.
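For illustration only, a config might look roughly like the sketch below; the exact field names and accepted values are defined by the config.json schema and configs/config_example.json, so treat every key here as an assumption:

```json
{
    "common": {
        "lib": ["sklearn", "daal4py"],
        "data-format": ["pandas"],
        "data-order": ["F"],
        "dtype": ["float64"]
    },
    "cases": [
        {
            "algorithm": "kmeans",
            "dataset": [
                {
                    "source": "synthetic",
                    "type": "blobs",
                    "n_clusters": 10,
                    "n_features": 50,
                    "training": { "n_samples": 100000 }
                }
            ],
            "n-clusters": [10]
        }
    ]
}
```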
algorithm | benchmark name | sklearn | daal4py | cuml | xgboost |
---|---|---|---|---|---|
DBSCAN | dbscan | ✅ | ✅ | ✅ | ❌ |
RandomForestClassifier | df_clfs | ✅ | ✅ | ✅ | ❌ |
RandomForestRegressor | df_regr | ✅ | ✅ | ✅ | ❌ |
pairwise_distances | distances | ✅ | ✅ | ❌ | ❌ |
KMeans | kmeans | ✅ | ✅ | ✅ | ❌ |
KNeighborsClassifier | knn_clsf | ✅ | ❌ | ✅ | ❌ |
LinearRegression | linear | ✅ | ✅ | ✅ | ❌ |
LogisticRegression | log_reg | ✅ | ✅ | ✅ | ❌ |
PCA | pca | ✅ | ✅ | ✅ | ❌ |
Ridge | ridge | ✅ | ✅ | ✅ | ❌ |
SVM | svm | ✅ | ✅ | ✅ | ❌ |
train_test_split | train_test_split | ✅ | ❌ | ✅ | ❌ |
GradientBoostingClassifier | gbt | ❌ | ❌ | ❌ | ✅ |
GradientBoostingRegressor | gbt | ❌ | ❌ | ❌ | ✅ |
You can launch benchmarks for each algorithm separately. To do this, go to the directory with the benchmark:

`cd <framework>`

Then run the following command:

`python <benchmark_file> --dataset-name <path to the dataset> <other algorithm parameters>`
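For example, a hypothetical run of the scikit-learn KMeans benchmark might look like this (the benchmark file name, dataset path, and extra parameter are placeholders; check the parameter list referenced below for the flags each benchmark actually accepts):

```bash
# Illustrative invocation: file name, dataset, and parameter are assumptions
cd sklearn
python kmeans.py --dataset-name data/blobs_1M.npy --n-clusters 10
```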
You can find the list of supported parameters for each algorithm here.
- Run `make`. This will generate data, compile benchmarks, and run them.
  - To run only scikit-learn benchmarks, use `make sklearn`.
  - To run only native benchmarks, use `make native`.
  - To run only daal4py benchmarks, use `make daal4py`.
  - To run a specific implementation of a specific benchmark, directly request the corresponding file: `make output/<impl>/<bench>.out`.
  - If you have activated a conda environment, the build will use daal from the conda environment, if available.
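For the last case, a concrete request might look like this (the implementation and benchmark names are placeholders chosen for illustration):

```bash
# Build and run only one benchmark output target
make output/native/kmeans.out
```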