Add benchmark README
bluesheeptoken authored and github-louis-fruleux committed Jun 23, 2020
1 parent 54993ec commit 555e402
Showing 3 changed files with 54 additions and 0 deletions.
11 changes: 11 additions & 0 deletions README.md
@@ -93,6 +93,17 @@ You can not yet retrieve automatically all similar sequences with the noise redu

CPT has 3 meta parameters that need to be tuned. You can check how to tune them in the [documentation][1]. To tune them, you can use the `model_selection` module from `sklearn`; you can find an example of how to do this [here][3].

## Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the [`spmf` website][4].

With multithreading, `CPT` performed around 5000 predictions per second.

Without multithreading, `CPT` performed around 1650 predictions per second.

Details on the benchmark can be found [here](benchmark).

[1]: https://cpt.readthedocs.io/en/latest/
[2]: https://github.com/bluesheeptoken/CPT#tuning
[3]: https://cpt.readthedocs.io/en/latest/example.html#sklearn-example
[4]: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
19 changes: 19 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,19 @@
# Benchmark

## Data
The benchmark was run on the [`FIFA`](https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php) dataset.

The model was trained on 20,450 sequences with an average length of 34 and an alphabet of 2,990 elements.
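The dataset figures above can be recomputed from the downloaded file. A minimal sketch, assuming the same parsing as `benchmark/benchmark.py` (whitespace-separated integers, with negative SPMF separators dropped); `parse_sequences` and `dataset_stats` are hypothetical helpers, not part of `cpt`, and whether the average length above was measured after this filtering is an assumption.

```python
from typing import Iterable, List, Tuple


def parse_sequences(lines: Iterable[str]) -> List[List[int]]:
    """Parse SPMF-style lines, keeping only non-negative item ids.

    SPMF sequence files use -1 to separate itemsets and -2 to end a
    sequence; like benchmark.py, this sketch simply drops them.
    """
    return [[int(x) for x in line.split() if int(x) >= 0] for line in lines]


def dataset_stats(sequences: List[List[int]]) -> Tuple[int, float, int]:
    """Return (number of sequences, average length, alphabet size)."""
    count = len(sequences)
    average_length = sum(len(s) for s in sequences) / count if count else 0.0
    # The alphabet is the set of distinct item ids across all sequences.
    alphabet = {item for s in sequences for item in s}
    return count, average_length, len(alphabet)
```

Calling `dataset_stats` on the result of `parse_sequences` over the opened `data/FIFA.dat` file returns the corresponding triple for the downloaded data.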

## Setup
The benchmark was run on a PC with 8 GB of RAM and an `Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz` processor with 8 logical cores.

## How to run the code
You can get the data with `curl`: `curl http://www.philippe-fournier-viger.com/spmf/datasets/FIFA.txt --output FIFA.dat`.

Once `FIFA.dat` is in the `data` folder, run the benchmark from the root folder: `python benchmark/benchmark.py`.

## Results
With multithreading, `CPT` made 4869 predictions per second, an average of about 0.2 ms per prediction.

However, most use cases do not take advantage of multithreading. Without multithreading, `CPT` made 1662 predictions per second, an average of about 0.6 ms per prediction.
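The per-prediction averages follow directly from the throughput: average latency in milliseconds is 1000 divided by the number of predictions per second. A small check of the figures above (`ms_per_prediction` is an illustrative name):

```python
def ms_per_prediction(predictions_per_second: float) -> float:
    """Convert a throughput figure into average latency in milliseconds."""
    return 1000.0 / predictions_per_second


multithreaded = 4869  # predictions per second, with multithreading
single_threaded = 1662  # predictions per second, without multithreading

print(f"multithreaded: {ms_per_prediction(multithreaded):.2f} ms per prediction")
print(f"single-threaded: {ms_per_prediction(single_threaded):.2f} ms per prediction")
print(f"speedup: {multithreaded / single_threaded:.2f}x")
```

This prints 0.21 ms and 0.60 ms, consistent with the rounded values above, and a speedup of 2.93x for the multithreaded run.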
24 changes: 24 additions & 0 deletions benchmark/benchmark.py
@@ -0,0 +1,24 @@
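```python
import os
import sys
import time

# Add the repository root to the Python path so that `cpt` can be imported
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from cpt import Cpt  # pylint: disable=wrong-import-position

# Each line is one sequence of integers; negative values
# (SPMF separators such as -1 and -2) are dropped.
with open("data/FIFA.dat") as file:
    data = [[int(x) for x in line.split() if int(x) >= 0] for line in file]

cpt = Cpt()
cpt.fit(data)

# Predict from the last 10 elements of each training sequence
prediction_data = [sequence[-10:] for sequence in data]

cpt.MBR = 10
cpt.noise_ratio = 0.2

start = time.perf_counter()
cpt.predict(prediction_data, True)  # second argument: multithreading flag
elapsed = time.perf_counter() - start
print(f"time elapsed {elapsed * 1000} ms")
print(f"{len(prediction_data) / elapsed:.0f} predictions per second")
```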
