Add benchmark README

catchthemonster · Jun 23, 2020 · 555e402
1 parent 54993ec
commit 555e402

Showing 3 changed files with 54 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -93,6 +93,17 @@ You can not yet retrieve automatically all similar sequences with the noise redu
 
 CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; an example is available [here][3].
 
+## Benchmark
+
+The benchmark was run on the FIFA dataset, which can be found on the [`spmf` website][4].
+
+With multithreading, `CPT` performed around 5000 predictions per second.
+
+Without multithreading, `CPT` performed around 1650 predictions per second.
+
+Details on the benchmark can be found [here](benchmark).
+
 [1]: https://cpt.readthedocs.io/en/latest/
 [2]: https://github.com/bluesheeptoken/CPT#tuning
 [3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
+[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
diff --git a/benchmark/README.md b/benchmark/README.md
@@ -0,0 +1,19 @@
+# Benchmark
+
+## Data
+The benchmark was run on the [`FIFA`](https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php) dataset.
+
+The model was trained on 20,450 sequences with an average length of 34 and an alphabet of 2990 elements.
+
+## Setup
+The benchmark was run on a PC with 8 GB of RAM and an `Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz` CPU (8 logical cores).
+
+## How to run the code
+You can get the data with `curl`: `curl http://www.philippe-fournier-viger.com/spmf/datasets/FIFA.txt --output FIFA.dat`.
+
+With `FIFA.dat` in the `data` folder, you can run the benchmark from the root folder: `python benchmark/benchmark.py`.
+
+## Results
+Using multithreading, `CPT` made 4869 predictions per second, which is an average of 0.2 ms per prediction.
+
+However, most use cases do not take advantage of multithreading. Without multithreading, `CPT` made 1662 predictions per second, which is an average of 0.6 ms per prediction.
diff --git a/benchmark/benchmark.py b/benchmark/benchmark.py
@@ -0,0 +1,24 @@
+import os
+import sys
+import time
+# Add cpt to python path
+sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+
+from cpt import Cpt  # pylint: disable=wrong-import-position
+
+with open("data/FIFA.dat") as file:
+    data = list(map(lambda l: [int(x) for x in l.rstrip().split() if int(x) >= 0], file.readlines()))
+
+cpt = Cpt()
+
+cpt.fit(data)
+
+prediction_data = list(map(lambda x: x[-10:], data))
+
+cpt.MBR = 10
+cpt.noise_ratio = 0.2
+
+time1 = time.time()
+cpt.predict(prediction_data, True)
+time2 = time.time()
+print(f"time elapsed {(time2-time1)*1000} ms")
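
The script prints only the total elapsed time, while the README reports predictions per second and milliseconds per prediction. A minimal sketch of that conversion follows. The helper names `throughput_stats` and `time_call` are hypothetical and not part of this commit or of the `cpt` package, and `time.perf_counter` is used here as an assumption about which clock is preferable for short intervals.

```python
import time


def throughput_stats(n_predictions, elapsed_seconds):
    """Return (predictions per second, milliseconds per prediction)."""
    if n_predictions <= 0 or elapsed_seconds <= 0:
        raise ValueError("both arguments must be positive")
    per_second = n_predictions / elapsed_seconds
    ms_per_prediction = 1000.0 * elapsed_seconds / n_predictions
    return per_second, ms_per_prediction


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds).

    Hypothetical helper: perf_counter is a monotonic, high-resolution
    clock, so the measured interval is not affected by system clock changes.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

In `benchmark/benchmark.py` this could be used as `_, elapsed = time_call(cpt.predict, prediction_data, True)` followed by `throughput_stats(len(prediction_data), elapsed)`, assuming one prediction is counted per input sequence.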