For a direct mailing campaign organised by a non-profit organisation, build statistical models that:
- Identify the recipients that will engage with the campaign.
- Maximise the campaign’s revenue.
My solutions to these tasks are in the scripts donors.py and profits.py, respectively. The technical report is at report.html. All the code and the report are available in a GitHub repository.
This dataset was used in the KDD Cup 98 Challenge. It was collected by a non-profit organisation that helps US Veterans. They raise money via direct mailing campaigns.
See the documentation and the data dictionary for more information.
The profit from mailing the entire test set is $10,560. Each mailing costs $0.68.
- 191779 records: 95412 training cases and 96367 test cases
- 481 attributes
- 236.2 MB: 117.2 MB training data and 119 MB test data
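The baseline figures quoted above can be checked with a little arithmetic. The sketch below assumes that the $10,560 profit is net of mailing costs and that all 96367 test cases are mailed. The function name `campaign_profit` and the derived donation total are illustrative and do not come from the original scripts.

```python
# Figures quoted in the text above.
MAIL_COST = 0.68          # cost of one mailing, in dollars
N_TEST = 96367            # number of test cases
BASELINE_PROFIT = 10560   # profit from mailing the whole test set


def campaign_profit(total_donations, n_mailed, unit_cost=MAIL_COST):
    """Profit = donations received minus the cost of every mailing sent."""
    return total_donations - n_mailed * unit_cost


# Cost of mailing everyone in the test set.
total_cost = N_TEST * MAIL_COST

# Donations implied by the quoted baseline (derived here, not stated in the source).
implied_donations = BASELINE_PROFIT + total_cost

print(f"mailing cost:      ${total_cost:,.2f}")
print(f"implied donations: ${implied_donations:,.2f}")
```

Under these assumptions the mailing cost is about $65,529.56, so any selective strategy has to beat the $10,560 baseline by skipping recipients whose expected donation does not cover the $0.68 cost.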
My solutions are structured around the following steps:
- Data Importation
- Exploratory Analysis
- Data Munging
- Feature Selection
- Model Selection
- Training
- Testing
- Model Evaluation and Comparison
In my solution to task 1 I follow this procedure.
In my solution to task 2, I first predict who is a donor. Then, using only those samples, I train a second model that predicts how much each person donated. Finally, I mail everyone whose predicted donation is higher than the $0.68 mailing cost.
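The mailing rule described above can be sketched in a few lines. This is a minimal illustration, not the code in profits.py: the function names are hypothetical, the input values are made up, and treating predicted non-donors as having a predicted donation of zero is an assumption about how the two stages combine.

```python
MAIL_COST = 0.68  # cost of one mailing, from the text


def two_stage_prediction(donor_flags, predicted_amounts):
    """Combine both stages: predicted non-donors get a predicted amount of 0.

    Assumption: this is one plausible way to chain the two models.
    """
    return [amt if flag else 0.0 for flag, amt in zip(donor_flags, predicted_amounts)]


def select_recipients(predictions, unit_cost=MAIL_COST):
    """Return the indices whose predicted donation exceeds the mailing cost."""
    return [i for i, amount in enumerate(predictions) if amount > unit_cost]


def realised_profit(selected, actual_amounts, unit_cost=MAIL_COST):
    """Sum the actual donations of mailed people, minus one mailing cost each."""
    return sum(actual_amounts[i] - unit_cost for i in selected)


# Hypothetical values for five people, for illustration only.
donor_flags = [False, True, True, True, True]    # stage 1 output
amounts = [3.00, 5.20, 0.68, 12.00, 0.90]        # stage 2 output
actual = [0.00, 10.00, 0.00, 0.00, 5.00]         # what each person really gave

predictions = two_stage_prediction(donor_flags, amounts)
mailed = select_recipients(predictions)
profit = realised_profit(mailed, actual)
print(mailed, round(profit, 2))
```

Note that the comparison is strict: a predicted donation of exactly $0.68 would only break even, so that person is not mailed.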
I used only the provided training file and built both my training and test sets from it. Together, my training and test sets therefore contain 95412 cases.
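A hold-out split of this kind can be sketched as follows. The 80/20 ratio, the seed and the function name `split_indices` are assumptions for illustration; the source does not state which proportions or method the scripts use.

```python
import random

N_CASES = 95412  # rows in the provided training file, from the text


def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle the row indices and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(N_CASES)
print(len(train_idx), len(test_idx))
```

Every row lands in exactly one of the two parts, so the sizes always add up to the 95412 cases of the original file.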
.
├── README.md
├── config.yml
├── data
│   ├── cup98LRN.csv
│   └── cup98lrn.zip
├── donors.py
├── lib
│   ├── __init__.py
│   ├── analyser.py
│   ├── importer.py
│   ├── preprocessor.py
│   └── utils.py
├── profits.py
├── report.html
└── report.md
The main files are donors.py and profits.py. The project’s configuration is at config.yml and all the auxiliary classes and their methods are in lib.
Antonio Rebordao 2015