pytorch-lifestream, or ptls, is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history or card transactions.
It supports various methods of self-supervised training, adapted for event sequences:
- Contrastive Learning for Event Sequences (CoLES)
- Contrastive Predictive Coding (CPC)
- Replaced Token Detection (RTD) from ELECTRA
- Next Sequence Prediction (NSP) from BERT
- Sequences Order Prediction (SOP) from ALBERT
- Masked Language Model (MLM) from RoBERTa
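To make the contrastive setup behind a method like CoLES more concrete, here is a minimal sketch in plain Python. It does not use the ptls API. It assumes that positive examples are random contiguous slices taken from the same entity's event sequence, and that slices from different entities act as negatives. The helper `sample_slices`, the toy `sequences` dictionary and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def sample_slices(events, n_slices, min_len, max_len, rng):
    """Return n_slices random contiguous sub-sequences of `events`."""
    # Clamp both bounds so short sequences do not raise an error.
    max_len = min(max_len, len(events))
    min_len = min(min_len, max_len)
    slices = []
    for _ in range(n_slices):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(events) - length)
        slices.append(events[start:start + length])
    return slices


# Hypothetical toy data: one sequence of event codes per user.
sequences = {
    "user_a": [3, 3, 7, 1, 4, 4, 2, 9, 3, 5],
    "user_b": [8, 8, 6, 6, 0, 8, 6, 1, 8, 6],
}

rng = random.Random(0)
batch = []  # list of (user_id, sub_sequence)
for user_id, events in sequences.items():
    for sub_seq in sample_slices(events, n_slices=3, min_len=3, max_len=6, rng=rng):
        batch.append((user_id, sub_seq))

# Index pairs inside the batch: same user gives a positive pair,
# different users give a negative pair.
all_pairs = [(i, j) for i in range(len(batch)) for j in range(i + 1, len(batch))]
positive_pairs = [(i, j) for i, j in all_pairs if batch[i][0] == batch[j][0]]
negative_pairs = [(i, j) for i, j in all_pairs if batch[i][0] != batch[j][0]]

print(len(batch), len(positive_pairs), len(negative_pairs))
```

In an actual training run, each sub-sequence would be passed through a sequence encoder, and a contrastive loss would pull embeddings of positive pairs together and push negative pairs apart.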
It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.
The following variants of the contrastive losses are supported:
- Contrastive loss (paper)
- Triplet loss (paper)
- Binomial deviance loss (paper)
- Histogram loss (paper)
- Margin loss (paper)
- VICReg loss (paper)
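As an illustration of what the first two losses in this list compute, the sketch below writes out a common pairwise form of the contrastive loss and a common form of the triplet loss in plain Python. It is not the ptls implementation: the function names, the default `margin` value and the toy embeddings are assumptions, and constant factors and batch reduction are omitted.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_entity, margin=1.0):
    """Pairwise contrastive loss (common form, constant factor omitted).

    Positive pairs are penalised by their squared distance; negative
    pairs are penalised only when they are closer than `margin`.
    """
    d = euclidean(u, v)
    if same_entity:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: the positive should be closer to the anchor than
    the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-d embeddings, for illustration only.
anchor = [0.0, 0.0]
positive = [0.3, 0.4]   # distance 0.5 from the anchor
negative = [3.0, 4.0]   # distance 5.0 from the anchor

print(contrastive_loss(anchor, positive, same_entity=True))
print(contrastive_loss(anchor, negative, same_entity=False))
print(triplet_loss(anchor, positive, negative))
```

Both losses reach zero once negatives are far enough from the anchor, which is why the choice of margin and of negative sampling strategy matters in practice.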
pip install pytorch-lifestream
# Ubuntu 20.04
sudo apt install python3.8 python3-venv
pip3 install pipenv
pipenv sync --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest
Demo notebooks are available here. Some of them:
- Supervised model training notebook
- Self-supervised training and embeddings for downstream task notebook
- Self-supervised embeddings in CatBoost notebook
- Self-supervised training and fine-tuning notebook
- Self-supervised TrxEncoder-only training with Masked Language Model task and fine-tuning notebook
- Pandas data preprocessing options notebook
- PySpark and Parquet for data preprocessing notebook
- Fast inference on large dataset notebook
- Supervised multilabel classification notebook
- CoLES multimodal notebook
We also have tutorials here
Library description index
pytorch-lifestream usage experiments on several public event datasets are available in a separate repo
- Data Fusion Contest 2022 report (in Russian)
- Data Fusion Contest 2022 report, Sber AI Lab team (in Russian)
- VK.com Graph ML Hackathon report (in Russian)
- VK.com Graph ML Hackathon report, AlfaBank team (in Russian)
- American Express - Default Prediction Kaggle contest report (in Russian)
- Data Fusion Contest 2024, Sber AI Lab team
- Data Fusion Contest 2024, Ivan Alexandrov
- American Express - Default Prediction
- COTIC - pytorch-lifestream is used in experiments for the Continuous-time convolutions model of event sequences
- Make your changes via Fork and Pull request.
- Write unit tests for new code in ptls_tests.
- Check unit tests via pytest: Example.
We have a paper you can cite:
@inproceedings{
Babaev_2022, series={SIGMOD/PODS ’22},
title={CoLES: Contrastive Learning for Event Sequences with Self-Supervision},
url={http://dx.doi.org/10.1145/3514221.3526129},
DOI={10.1145/3514221.3526129},
booktitle={Proceedings of the 2022 International Conference on Management of Data},
publisher={ACM},
author={Babaev, Dmitrii and Ovsov, Nikita and Kireev, Ivan and Ivanova, Maria and Gusev, Gleb and Nazarov, Ivan and Tuzhilin, Alexander},
year={2022},
month=jun, collection={SIGMOD/PODS ’22}
}