Updated README and added general guidelines to contribute
bluesheeptoken committed Mar 12, 2019
1 parent 971adf0 commit 748714d
Showing 2 changed files with 54 additions and 46 deletions.
46 changes: 46 additions & 0 deletions Contribute.md
@@ -0,0 +1,46 @@
# Guidelines

Cython must be installed to compile the project.

## Tests
### Run tests
Tests are run with pytest:

`make test`

### Generate coverage
To generate coverage, use the `coverage` Python module.

For the Python code, you can use `pytest --cov=cpt tests` (the `--cov` option is provided by the pytest-cov plugin).

## Linter
pycodestyle and pylint are used for linting:

`make lint`

## Sources

## Data
### Download files
To download the data, you need to install the Git LFS extension.

## Profiling
### Add metadata to metadata.json
You should run `python generate_metadata.py <data_path> <datasetname>` from the data directory

For instance, `python generate_metadata.py FIFA.dat partial_fifa`

### Run profiling
To run the profiling, you need to run the command `python profiling/profiling.py <mode> <data_path> <profile_path>`

For instance, `python profiling/profiling.py train data/FIFA.dat profiling/sample_profiling.profile`

The mode must be either `train` or `predict`.

Train profiles should be made with the full datasets, and predict profiles with the partial datasets. The `predict` method takes more time than the `train` method, so a smaller dataset is enough to profile `predict`.

### Read stats
To read the stats, use Python's [pstats](https://docs.python.org/3/library/profile.html) module: `python -m pstats <profile_path>`
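
The same stats can also be read from a script instead of the interactive `pstats` prompt. The following is a minimal sketch, not the project's own code: it assumes the profile file is in the format written by `cProfile` (the format `pstats` reads), and `busy_work`, `write_profile` and `top_functions` are hypothetical helper names introduced only for illustration.

```python
import cProfile
import pstats


def busy_work(n):
    # Stand-in workload (assumption); in the project this would be
    # a train or predict call on a dataset.
    return sum(i * i for i in range(n))


def write_profile(profile_path, n=10_000):
    # Profile the stand-in workload and save the raw stats to profile_path.
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(n)
    profiler.disable()
    profiler.dump_stats(profile_path)


def top_functions(profile_path, limit=10):
    # Load the saved profile, sort by cumulative time and print
    # the `limit` most expensive entries.
    stats = pstats.Stats(profile_path)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stats
```

Calling `write_profile("sample.profile")` followed by `top_functions("sample.profile")` prints a table similar to what the `sort cumulative` and `stats 10` commands show in the interactive `pstats` browser.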

## Before pushing
Make sure you have run `make test` and `make lint` before pushing.
54 changes: 8 additions & 46 deletions README.md
@@ -1,5 +1,13 @@
# CPT

CPT is an open-source Cython implementation of the Compact Prediction Tree algorithm using multithreading.

This is an implementation of the following research papers:

http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf

http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Simple example

You can test the model with the following code
@@ -49,49 +57,3 @@ unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
```

## Tests
### Run tests
Tests are run with pytest:

`make test`

### Generate coverage
To generate coverage, use the `coverage` Python module.

For the Python code, you can use `pytest --cov=cpt tests` (the `--cov` option is provided by the pytest-cov plugin).

## Linter
pycodestyle and pylint are used for linting:

`make lint`

## Sources
http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf

http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

## Data
### Download files
To download the data, you need to install the Git LFS extension.

## Profiling
### Add metadata to metadata.json
You should run `python generate_metadata.py <data_path> <datasetname>` from the data directory

For instance, `python generate_metadata.py FIFA.dat partial_fifa`

### Run profiling
To run the profiling, you need to run the command `python profiling/profiling.py <mode> <data_path> <profile_path>`

For instance, `python profiling/profiling.py train data/FIFA.dat profiling/sample_profiling.profile`

The mode must be either `train` or `predict`.

Train profiles should be made with the full datasets, and predict profiles with the partial datasets. The `predict` method takes more time than the `train` method, so a smaller dataset is enough to profile `predict`.

### Read stats
To read the stats, use Python's [pstats](https://docs.python.org/3/library/profile.html) module: `python -m pstats <profile_path>`

## Before pushing
Make sure you have run `make test` and `make lint` before pushing.
