KDD Cup 98 Challenge

Task

For a direct mailing campaign organised by a non-profit organisation, build statistical models that:

1. Identify the recipients that will engage with the campaign.
2. Maximise the campaign’s revenue.

My solutions to these tasks are in the scripts donors.py and profits.py, respectively. The technical report is in report.html. All the code and the report are available in a GitHub repository.

Dataset

This dataset was used in the KDD Cup 98 Challenge. It was collected by a non-profit organisation that helps US veterans and raises money through direct mailing campaigns.

See the documentation and the data dictionary for more information.

Mailing every record in the test set yields a profit of $10,560. Each mailing costs $0.68.
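As a quick check on what this baseline implies, the sketch below works out the total mailing cost for the 96367 test cases (see Size) and the gross donations that would be consistent with a $10,560 profit. The function and variable names are illustrative and are not taken from the repository.

```python
COST_PER_MAIL = 0.68        # cost of one mailing, as stated above
N_TEST_CASES = 96367        # test-set size, from the Size section
BASELINE_PROFIT = 10560.0   # profit when the whole test set is mailed


def mailing_cost(n_mailed, cost_per_mail=COST_PER_MAIL):
    """Total cost of sending n_mailed letters."""
    return n_mailed * cost_per_mail


total_cost = mailing_cost(N_TEST_CASES)
# Profit = donations - cost, so the implied gross donations are:
implied_donations = BASELINE_PROFIT + total_cost

print(f"Mailing cost:      ${total_cost:,.2f}")
print(f"Implied donations: ${implied_donations:,.2f}")
```

Any selective mailing policy has to beat this mail-everyone baseline to be worthwhile.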

Size

191779 records: 95412 training cases and 96367 test cases
481 attributes
236.2 MB: 117.2 MB training data and 119 MB test data

My Solutions

They are structured around the following steps:

Data Importation
Exploratory Analysis
Data Munging
Feature Selection
Model Selection
Training
Testing
Model Evaluation and Comparison

My solution to task 1 follows this procedure.

My solution to task 2 has two stages. First I predict who is a donor. Then, using only the donor samples, I train a model that predicts how much each person donates. Finally, I mail everyone whose predicted donation exceeds the $0.68 mailing cost.
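The mailing rule for task 2 can be sketched as follows. This is a minimal illustration, not the code in profits.py: the helper names and the toy numbers are invented, and it assumes a single predicted donation amount per person is already available (how the two model outputs are combined into that amount is not described here).

```python
COST_PER_MAIL = 0.68  # cost of one mailing, from the Dataset section


def select_recipients(predicted_donations, cost=COST_PER_MAIL):
    """Return indices of people whose predicted donation exceeds the cost."""
    return [i for i, amount in enumerate(predicted_donations) if amount > cost]


def campaign_profit(actual_donations, mailed, cost=COST_PER_MAIL):
    """Net profit: donations from mailed people minus their mailing costs."""
    return sum(actual_donations[i] - cost for i in mailed)


# Toy data, invented for illustration only.
predicted = [0.10, 5.20, 0.68, 12.00, 0.00]
actual = [0.00, 10.00, 0.00, 0.00, 0.00]

mailed = select_recipients(predicted)            # strictly above the cost
profit = campaign_profit(actual, mailed)
profit_mail_all = campaign_profit(actual, range(len(actual)))

print(mailed, round(profit, 2), round(profit_mail_all, 2))
```

In this toy case the selective policy mails two people instead of five, so it pays less in mailing costs for the same single donation.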

I used only the provided training file and split it into my own training and test sets. Together, my training and test sets therefore contain 95412 cases.
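One way to make such a split is to shuffle the row indices and hold out a fraction of them. The sketch below is an assumption-laden illustration: the 20% test fraction, the seed, and the function name are all hypothetical, since the split actually used is not stated here.

```python
import random


def split_indices(n_cases, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(n_cases * test_fraction)
    return indices[n_test:], indices[:n_test]


# 95412 is the number of cases in the provided training file.
train_idx, test_idx = split_indices(95412)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split identical between runs, so that results from donors.py and profits.py stay comparable.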

System Architecture

.
├── README.md
├── config.yml
├── data
│   ├── cup98LRN.csv
│   └── cup98lrn.zip
├── donors.py
├── lib
│   ├── __init__.py
│   ├── analyser.py
│   ├── importer.py
│   ├── preprocessor.py
│   └── utils.py
├── profits.py
├── report.html
└── report.md

The main files are donors.py and profits.py. The project’s configuration is in config.yml, and all the auxiliary classes and their methods are in lib.

Author

Antonio Rebordao 2015
