diff --git a/Contribute.md b/Contribute.md
new file mode 100644
index 0000000..4ee5a0f
--- /dev/null
+++ b/Contribute.md
@@ -0,0 +1,46 @@
+# Guidelines
+
+To compile, cython needs to be installed.
+
+## Tests
+### Run tests
+Pytest is used for tests
+
+`make test`
+
+### Generate coverage
+To generate coverage, you should use the coverage python module
+
+For the python code you can use `pytest --cov=cpt tests`
+
+## Linter
+pycodestyle and pylint are used for linter
+
+`make lint`
+
+## Sources
+
+## Data
+### Download files
+To download data, you will need to install lfs git extension
+
+## Profiling
+### Add metadata to metadata.json
+You should run `python generate_metadata.py ` from the data directory
+
+For instance, `python generate_metadata.py FIFA.dat partial_fifa`
+
+### Run profiling
+To run the profiling, you need to run the command `python profiling/profiling.py `
+
+For instance, `python profiling/profiling.py train data/FIFA.dat profiling/sample_profiling.profile`
+
+The mode should be either train or predict
+
+The train profiles should be made with the full datasets, the predict profiles should be made with the partial datasets. The `predict` method is taking more time than the `train` method, so a smaller dataset is enough to profile `predict`
+
+### Read stats
+To read stats you need to use the [pstats](https://docs.python.org/3/library/profile.html) module in python. `python -m pstats `
+
+## Before pushing
+Make sure you ran `make test` and `make lint` before pushing
diff --git a/README.md b/README.md
index b1bd77e..a779989 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,13 @@
 # CPT
 
+CPT is a cython open-source implementation of the Compact Prediction Tree algorithm using multithreading.
+
+This is an implementation of the following research papers
+
+http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
+
+http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf
+
 ## Simple example
 You can test the model with the following code
 
@@ -49,49 +57,3 @@ unpickled_model = pickle.loads(dumped)
 print(model == unpickled_model)
 ```
 
-
-## Tests
-### Run tests
-Pytest is used for tests
-
-`make test`
-
-### Generate coverage
-To generate coverage, you should use the coverage python module
-
-For the python code you can use `pytest --cov=cpt tests`
-
-## Linter
-pycodestyle and pylint are used for linter
-
-`make lint`
-
-## Sources
-http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf
-
-http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf
-
-## Data
-### Download files
-To download data, you will need to install lfs git extension
-
-## Profiling
-### Add metadata to metadata.json
-You should run `python generate_metadata.py ` from the data directory
-
-For instance, `python generate_metadata.py FIFA.dat partial_fifa`
-
-### Run profiling
-To run the profiling, you need to run the command `python profiling/profiling.py `
-
-For instance, `python profiling/profiling.py train data/FIFA.dat profiling/sample_profiling.profile`
-
-The mode should be either train or predict
-
-The train profiles should be made with the full datasets, the predict profiles should be made with the partial datasets. The `predict` method is taking more time than the `train` method, so a smaller dataset is enough to profile `predict`
-
-### Read stats
-To read stats you need to use the [pstats](https://docs.python.org/3/library/profile.html) module in python. `python -m pstats `
-
-## Before pushing
-Make sure you ran `make test` and `make lint` before pushing