Predicting ICO Success

This repository contains our work for the internal Kaggle challenge of the module "Data Science for Business" at the IBIS institute at LMU Munich.

Environment

To install all dependencies you need pipenv. After installing pipenv, run:

pipenv shell

pipenv install

After that, all dependencies are installed and you are ready to go. For further instructions, see https://github.com/pypa/pipenv.

Data

make data: Runs the basic preprocessing steps on the data.

Features

make features:

Generates all features declared in the config meta file data/features/feature_set_meta/feature_set_meta.json.

Example feature set:

{
  "feature_set_1": [
    {
      "meta": {
        "upsampling": "0.0"
      }
    },
    {
      "column": "transaction_count",
      "type": "numerical",
      "na_strategy": "median"
    },
    {
      "column": "timestamp",
      "type": "binary",
      "na_strategy": "False"
    },
    {
      "column": "divided_total_supplie_transcation_count",
      "type": "divide", 
      "na_strategy": "set:1",
      "columns": ["market_data_total_supply", "transaction_count"]
    }
  ]
}

column: Which column of the data to use.
type: The feature type, one of [numerical, categorical, binary, divide, duration, multiply, add, difference, average, coefficient].
na_strategy: How to handle missing values (NAs), one of [mean, median, set:x, False].
encoder: The encoder for categorical features, one of [label, one_hot].
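To make the config keys concrete, here is a minimal sketch of how such a feature set could be interpreted. It is not the repository's implementation: the helper names fill_na and build_features, the column name supply_per_transaction, and the toy data are hypothetical. It also assumes that a divide feature takes its numerator and denominator from the columns list, and that na_strategy is applied after the division. Only the numerical and divide types are covered.

```python
import json
import statistics

# Hypothetical miniature feature set in the format shown above.
FEATURE_SET_JSON = """
{
  "feature_set_1": [
    {"meta": {"upsampling": "0.0"}},
    {"column": "transaction_count", "type": "numerical", "na_strategy": "median"},
    {"column": "supply_per_transaction", "type": "divide", "na_strategy": "set:1",
     "columns": ["market_data_total_supply", "transaction_count"]}
  ]
}
"""


def fill_na(values, na_strategy):
    """Replace None entries according to na_strategy (mean, median, set:x, False)."""
    if na_strategy == "False":
        return list(values)  # leave missing values untouched
    present = [v for v in values if v is not None]
    if na_strategy == "mean":
        fill = statistics.mean(present)
    elif na_strategy == "median":
        fill = statistics.median(present)
    elif na_strategy.startswith("set:"):
        fill = float(na_strategy.split(":", 1)[1])
    else:
        raise ValueError(f"unknown na_strategy: {na_strategy}")
    return [fill if v is None else v for v in values]


def build_features(data, feature_set):
    """Build one output column per feature definition, skipping the meta entry."""
    features = {}
    for definition in feature_set:
        if "meta" in definition:
            continue
        kind = definition["type"]
        if kind == "numerical":
            raw = data[definition["column"]]
        elif kind == "divide":
            # Assumption: first entry is the numerator, second the denominator.
            numerator, denominator = (data[name] for name in definition["columns"])
            raw = [
                None if n is None or d is None or d == 0 else n / d
                for n, d in zip(numerator, denominator)
            ]
        else:
            raise NotImplementedError(f"type not covered by this sketch: {kind}")
        features[definition["column"]] = fill_na(raw, definition["na_strategy"])
    return features


# Toy data with missing values (None) in both columns.
data = {
    "transaction_count": [10, None, 30],
    "market_data_total_supply": [100.0, 50.0, None],
}
feature_set = json.loads(FEATURE_SET_JSON)["feature_set_1"]
features = build_features(data, feature_set)
print(features)
```

In this toy run the missing transaction_count is filled with the median of the present values, and every division that cannot be computed falls back to the value given by set:1.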

Models

The following models are implemented:

Random Forest
LightGBM
CatBoost
Naive Bayes
Logistic Regression

make train_models feature_set=feature_set: Trains all models on the given feature_set. After training, the results are saved to data/results/results.json.

make model_chain: Trains all models on all feature sets declared in feature_set_meta.json.

Stacking

A stacking approach is implemented as well. make stacking feature_set=feature_set validates all models with 5-fold cross-validation, trains them on the whole dataset, and then creates a submission file.

Cross-Validation

The models are cross-validated on 5 folds and then trained on the whole dataset. Submission files are created for both cases.
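As an illustration of the 5-fold scheme, the following sketch splits sample indices into five folds so that every sample is used for validation exactly once. The function name k_fold_indices, the seed, and the sample count are hypothetical, and the split is a plain shuffled one. Whether the repository stratifies its folds by class is not stated here, so treat that as an open assumption.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices) for each of n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride spreads the samples evenly across the folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=23, n_folds=5))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(validation)} validation")
```

Each model would be fitted on the train indices of a fold and scored on its validation indices. The final fit on the whole dataset simply uses all indices.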

Further Make commands

Random Feature Sets

make features_random_meta:

Creates random feature sets without custom features

shap_plots

make shap_plots: Trains all models on all feature sets declared in feature_set_meta.json and creates SHAP plots for every model and feature set in reports/figures/local.

End Results

In the end we took first place with an MCC of 0.42 on the private leaderboard; our score on the public leaderboard was 0.48 MCC.
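MCC is the Matthews correlation coefficient. For readers unfamiliar with the metric, here is a small self-contained sketch of the binary case, computed from the confusion matrix. The function name mcc_binary and the toy labels are hypothetical and unrelated to the challenge data; in practice a library routine such as sklearn.metrics.matthews_corrcoef would normally be used instead.

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Toy labels: 2 true positives, 4 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = mcc_binary(y_true, y_pred)
print(round(score, 4))
```

The score ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it a reasonable choice when the classes are imbalanced.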

AlessandroVol23/predict_ico_success_lmu
