GitHub - dllllb/pytorch-lifestream: A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

pytorch-lifestream (ptls) is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history or card transactions.
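To make "discrete event sequences" concrete, here is a minimal sketch in plain Python of how raw events can be grouped into one time-ordered sequence per entity. The field names (`client_id`, `event_time`, `merchant_code`, `amount`) and the `to_sequences` helper are hypothetical and chosen for illustration; this is not the library's own preprocessing code.

```python
from collections import defaultdict

# Hypothetical raw events: (client_id, timestamp, merchant_code, amount).
raw_events = [
    ("c1", 3, "grocery", 12.5),
    ("c2", 1, "fuel", 40.0),
    ("c1", 1, "cafe", 4.2),
    ("c1", 2, "grocery", 30.0),
    ("c2", 2, "cafe", 3.8),
]


def to_sequences(events):
    """Group events by entity and sort each group by timestamp."""
    grouped = defaultdict(list)
    for client_id, ts, code, amount in events:
        grouped[client_id].append((ts, code, amount))

    sequences = {}
    for client_id, rows in grouped.items():
        rows.sort(key=lambda r: r[0])  # chronological order within one entity
        # One list per feature, all aligned by position in the sequence.
        sequences[client_id] = {
            "event_time": [r[0] for r in rows],
            "merchant_code": [r[1] for r in rows],
            "amount": [r[2] for r in rows],
        }
    return sequences


sequences = to_sequences(raw_events)
print(sequences["c1"]["merchant_code"])
```

Each entity ends up with a set of aligned feature lists, which is the kind of input a sequence encoder turns into a single embedding.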

It supports various methods of self-supervised training, adapted for event sequences:

Contrastive Learning for Event Sequences (CoLES)
Contrastive Predictive Coding (CPC)
Replaced Token Detection (RTD) from ELECTRA
Next Sequence Prediction (NSP) from BERT
Sequences Order Prediction (SOP) from ALBERT
Masked Language Model (MLM) from RoBERTa
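As an illustration of how a contrastive method such as CoLES can obtain training pairs without labels, the sketch below samples several random contiguous slices from one entity's event sequence. Slices from the same sequence can then be treated as positive pairs and slices from different sequences as negatives. The function name `random_slices` and its parameters are assumptions made for this example, not the library's API.

```python
import random


def random_slices(sequence, n_slices, min_len, max_len, rng=None):
    """Sample `n_slices` contiguous sub-sequences of random length and position."""
    rng = rng or random.Random()
    max_len = min(max_len, len(sequence))  # a slice cannot exceed the sequence
    if min_len > max_len:
        raise ValueError("sequence is shorter than min_len")

    slices = []
    for _ in range(n_slices):
        length = rng.randint(min_len, max_len)            # inclusive bounds
        start = rng.randint(0, len(sequence) - length)    # inclusive bounds
        slices.append(sequence[start:start + length])
    return slices


# Toy sequence of 20 events, represented by their indices.
events = list(range(20))
views = random_slices(events, n_slices=3, min_len=5, max_len=10, rng=random.Random(0))
for view in views:
    print(view)
```

Because the slices are contiguous, each view preserves the local order of events, which matters for sequence encoders.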

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Contrastive loss (paper)
Triplet loss (paper)
Binomial deviance loss (paper)
Histogram loss (paper)
Margin loss (paper)
VICReg loss (paper)
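For intuition, here is a minimal sketch of one common form of the pairwise contrastive loss, written in plain Python rather than PyTorch. It pulls embeddings of the same entity together and pushes embeddings of different entities at least `margin` apart. The function names and the default margin are assumptions for this example; the library's own implementation may differ (for instance in scaling or in how pairs are selected).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_entity, margin=1.0):
    """Pairwise contrastive loss for a single pair of embeddings.

    Positive pair: squared distance (penalizes being far apart).
    Negative pair: squared hinge on (margin - distance), zero once the
    pair is at least `margin` apart.
    """
    d = euclidean(emb_a, emb_b)
    if same_entity:
        return d ** 2
    return max(0.0, margin - d) ** 2


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_entity=True))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_entity=False))
print(contrastive_loss([0.0, 0.0], [0.0, 0.5], same_entity=False))
```

The other losses in the list differ mainly in how they weight or select pairs and triplets, not in this basic pull-together and push-apart idea.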

Install from PyPI

pip install pytorch-lifestream
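After installing, one way to confirm that the package is visible to your interpreter is to query its installed version with the standard library. This is a small sketch; the helper name `installed_version` is made up for the example, and it only reads package metadata without importing the library.

```python
from importlib import metadata


def installed_version(dist_name="pytorch-lifestream"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "pytorch-lifestream is not installed")
```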

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

Demo notebooks are available here. Some of them:

Supervised model training notebook
Self-supervised training and embeddings for downstream task notebook
Self-supervised embeddings in CatBoost notebook
Self-supervised training and fine-tuning notebook
Self-supervised TrxEncoder only training with Masked Language Model task and fine-tuning notebook
Pandas data preprocessing options notebook
PySpark and Parquet for data preprocessing notebook
Fast inference on large dataset notebook
Supervised multilabel classification notebook
CoLES multimodal notebook

Tutorials are also available here.

Docs

Documentation

Library description index

Experiments on public datasets

Experiments using pytorch-lifestream on several public event datasets are available in a separate repo.

PyTorch-LifeStream in ML Competitions

Data Fusion Contest 2022 report (in Russian)
Data Fusion Contest 2022 report, Sber AI Lab team (in Russian)
VK.com Graph ML Hackathon report (in Russian)
VK.com Graph ML Hackathon report, AlfaBank team (in Russian)
American Express - Default Prediction Kaggle contest report (in Russian)
Data Fusion Contest 2024, Sber AI Lab team
Data Fusion Contest 2024, Ivan Alexandrov
American Express - Default Prediction
COTIC - pytorch-lifestream is used in experiments for a continuous-time convolutions model of event sequences

How to contribute

Make your changes via fork and pull request.
Write unit tests for new code in ptls_tests.
Check the unit tests via pytest: Example.
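As a rough idea of what such a unit test can look like, here is a minimal pytest-style sketch. The helper `pad_sequence` is hypothetical and stands in for whatever new code you contribute; it is not part of ptls. pytest collects functions whose names start with `test_`, and plain `assert` statements are enough.

```python
# Hypothetical helper, standing in for new code you would add to the library.
def pad_sequence(seq, length, pad_value=0):
    """Right-pad or truncate `seq` so the result has exactly `length` items."""
    return (list(seq) + [pad_value] * length)[:length]


# pytest discovers these functions by their `test_` prefix.
def test_pad_sequence_pads_short_input():
    assert pad_sequence([1, 2], 4) == [1, 2, 0, 0]


def test_pad_sequence_truncates_long_input():
    assert pad_sequence([1, 2, 3], 2) == [1, 2]


def test_pad_sequence_uses_custom_pad_value():
    assert pad_sequence([], 2, pad_value=-1) == [-1, -1]
```

Saved in a file whose name starts with `test_` under ptls_tests, tests like these would be picked up when you run `pytest`.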

Citation

We have a paper you can cite:

@inproceedings{
   Babaev_2022, series={SIGMOD/PODS ’22},
   title={CoLES: Contrastive Learning for Event Sequences with Self-Supervision},
   url={http://dx.doi.org/10.1145/3514221.3526129},
   DOI={10.1145/3514221.3526129},
   booktitle={Proceedings of the 2022 International Conference on Management of Data},
   publisher={ACM},
   author={Babaev, Dmitrii and Ovsov, Nikita and Kireev, Ivan and Ivanova, Maria and Gusev, Gleb and Nazarov, Ivan and Tuzhilin, Alexander},
   year={2022},
   month=jun, collection={SIGMOD/PODS ’22}
}

