DeepAttribution is an AWS SageMaker ML pipeline that lets marketing data scientists and ML engineers compute multi-touch attribution results using a state-of-the-art technique, with minimal effort.
- A big dataset (> 1 GB) at the impression level (the impressions dataset), stored as a Parquet file with the following schema (a small sample-generation sketch follows this list):
- uid: the user/client unique identifier
- timestamp: the Unix timestamp of the impression
- campaign: the campaign name
- conversion: whether or not a conversion happened after the impression
| uid | timestamp | campaign | conversion |
| --- | --------- | -------- | ---------- |
| int | int       | str      | bool       |
- An S3 bucket (the deep-attribution bucket) with the following folder hierarchy (a boto3 setup sketch follows this list):
.
├── raw # contains impressions dataset
├── feature_store # empty before pipeline execution
├── feature_store_preprocessed # empty before pipeline execution
├── model # empty before pipeline execution
└── attention_report # empty before pipeline execution
- A SageMaker notebook instance with its Git repository set to this repository (deep-attribution).
- An AWS role with SageMaker execution permissions and read/write permissions on the S3 bucket mentioned above.
- Basic knowledge of LSTMs and familiarity with the multi-touch attribution model presented in *Deep Neural Net with Attention for Multi-channel Multi-touch Attribution*.
- An understanding of what a journey is, of the model, and of the pipeline.
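
As a quick illustration of the expected impressions schema, here is a minimal, hypothetical sketch (not part of the repository) that builds a tiny sample dataset with pandas and writes it to Parquet; all values are invented for illustration only.

```python
# Minimal, illustrative sample of the impressions dataset schema.
# Values are made up; a real dataset is > 1 GB and lives under raw/.
import pandas as pd

impressions = pd.DataFrame(
    {
        "uid": [1, 1, 1, 2, 2],                            # user/client id (int)
        "timestamp": [1612137600, 1612224000, 1612310400,
                      1612137600, 1612396800],             # Unix timestamps (int)
        "campaign": ["display", "search", "social",
                     "display", "search"],                 # campaign names (str)
        "conversion": [False, False, True, False, False],  # conversion flag (bool)
    }
)

# Writing Parquet requires pyarrow (or fastparquet) to be installed.
impressions.to_parquet("impressions.parquet", index=False)
```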
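
The folder hierarchy can also be prepared by hand in the S3 console; the snippet below is a hypothetical boto3 sketch of the same setup. The bucket name and region are assumptions to adapt to your account.

```python
# Hypothetical setup of the deep-attribution bucket layout with boto3.
# Adapt the bucket name (must be globally unique) and the region.
import boto3

BUCKET = "deep-attribution"
REGION = "eu-west-1"
PREFIXES = ["raw/", "feature_store/", "feature_store_preprocessed/",
            "model/", "attention_report/"]

s3 = boto3.client("s3", region_name=REGION)

# Outside us-east-1, S3 requires an explicit LocationConstraint.
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# S3 has no real folders: empty objects whose keys end with "/" make
# the prefixes visible in the console before the pipeline fills them.
for prefix in PREFIXES:
    s3.put_object(Bucket=BUCKET, Key=prefix)
```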
- Define the maximum journey length. Please refer to this doc to do so; a rough exploratory check is sketched after these steps.
- Update the config file (config.yaml) with the desired instance type and count, the bucket name, and the maximum journey length (an illustrative config-loading snippet follows these steps).
- In the deep-attribution notebook instance, open the pipeline execution notebook (deep_attribution/pipeline_exec.ipynb).
- Run all the cells
- Retrieve the attribution results from the deep-attribution bucket (attention_report/campaign_attention.parquet); a sketch for reading the report follows these steps.
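
For the maximum journey length, the linked doc is the reference; as a rough, hypothetical sanity check you can also look at the distribution of impressions per user in the raw dataset.

```python
# Rough exploratory check (not the official method from the doc):
# inspect how many impressions each user has to pick a sensible cutoff.
import pandas as pd

impressions = pd.read_parquet("impressions.parquet")
touches_per_user = impressions.groupby("uid").size()

print(touches_per_user.describe())
print("95th percentile:", touches_per_user.quantile(0.95))
```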
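
The exact keys in config.yaml are defined by the repository; the snippet below only illustrates loading it with PyYAML so you can check your edits, without assuming specific key names.

```python
# Illustrative only: load config.yaml with PyYAML and inspect its contents.
# The actual key names (instance type/count, bucket name, maximum
# journey length) are defined by the repository's config file.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config)
```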
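
Once the pipeline has finished, the report can be read directly from S3 with pandas; this hypothetical sketch assumes the s3fs package is installed and uses the bucket name from the prerequisites.

```python
# Read the attention report from S3 with pandas (requires s3fs);
# alternatively, download the file first with the AWS CLI or boto3.
import pandas as pd

report = pd.read_parquet(
    "s3://deep-attribution/attention_report/campaign_attention.parquet"
)
print(report.head())
```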
If you want to contact me, you can reach me at leopoldavezac@gmail.com.
This project uses the following license: MIT.