Here is a brief overview of the library with links to detailed descriptions.

Library modules:
- `ptls.preprocessing` - transforms data to a `ptls`-compatible format with `pandas` or `pyspark`: categorical encoding, datetime transformation, numerical feature preprocessing.
- `ptls.data_load` - everything you need to prepare your data for training and validation.
  - `ptls.data_load.datasets` - PyTorch `Dataset` API implementation for data access.
  - `ptls.data_load.iterable_processing` - generator-style filters for data transformation.
  - `ptls.data_load.augmentations` - functions for data augmentation.
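The generator-style filters mentioned above can be thought of as composable generators over a stream of per-client feature dictionaries. A minimal pure-Python sketch of the idea (the filter names, `min_len` parameter, and field names here are illustrative, not the library's API):

```python
def filter_short_sequences(records, min_len=3):
    """Drop records whose event sequence is shorter than min_len."""
    for rec in records:
        if len(rec["event_time"]) >= min_len:
            yield rec

def drop_fields(records, fields):
    """Remove unneeded feature fields from each record."""
    for rec in records:
        yield {k: v for k, v in rec.items() if k not in fields}

# Filters compose by chaining generators, so the stream of records
# is never fully materialized in memory.
raw = [
    {"event_time": [1, 2, 3], "amount": [10, 20, 30], "raw_id": 7},
    {"event_time": [1], "amount": [5], "raw_id": 8},
]
processed = list(drop_fields(filter_short_sequences(raw), {"raw_id"}))
```

Because each stage is a generator, adding or removing a transformation is just adding or removing one link in the chain.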
- `ptls.frames` - tools for training encoders with popular frameworks like CoLES, SimCLR, CPC, VICReg, ...
  - `ptls.frames.coles` - contrastive learning on sub-sequences.
  - `ptls.frames.cpc` - contrastive learning for future event state prediction.
  - `ptls.frames.bert` - methods inspired by NLP and transformer models.
  - `ptls.frames.supervised` - modules for supervised training.
  - `ptls.frames.inference` - inference module.
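The core idea behind contrastive learning on sub-sequences is that several sub-sequences sampled from the same client's history form positive pairs, while sub-sequences from different clients serve as negatives. A schematic sketch of random contiguous sub-sequence sampling (a simplified illustration, not the library's splitter API):

```python
import random

def sample_subsequences(seq, n_splits=2, min_len=2, rng=None):
    """Return n_splits random contiguous sub-sequences of seq.

    Sub-sequences of the same sequence act as positive pairs in
    contrastive learning; sub-sequences of other sequences act as
    negatives.
    """
    rng = rng or random.Random(0)  # deterministic default for the sketch
    out = []
    for _ in range(n_splits):
        length = rng.randint(min_len, len(seq))
        start = rng.randint(0, len(seq) - length)
        out.append(seq[start:start + length])
    return out

views = sample_subsequences(list(range(10)), n_splits=2)
```

Each call produces independent random "views" of one client's history; the encoder is then trained to map views of the same client close together in embedding space.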
- `ptls.nn` - layers for model creation:
  - `ptls.nn.trx_encoder` - layers that produce the representation of a single transaction.
  - `ptls.nn.seq_encoder` - layers for sequence processing, like `RNN` or `Transformer`.
  - `ptls.nn.pb` - `PaddedBatch`-compatible layers, similar to `torch.nn` modules, but working with `ptls` data.
  - `ptls.nn.seq_step` - layers that change the sequence along the time axis.
  - `ptls.nn.head` - composite layers for final embedding transformation.
  - `ptls.nn.binarization`, `ptls.nn.normalization` - other groups of layers.
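A padded batch pairs rectangular, zero-padded sequence data with the original lengths, so downstream layers can mask out the padding. A minimal pure-Python sketch of this idea (illustrative only; the real `PaddedBatch` holds tensors):

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences to a rectangular batch.

    Returns (padded, lengths): padded is a list of equal-length rows,
    lengths records each original size so downstream layers can
    distinguish real events from padding.
    """
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    lengths = [len(s) for s in sequences]
    return padded, lengths

padded, lengths = pad_batch([[1, 2, 3], [4], [5, 6]])
# padded  -> [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
# lengths -> [3, 1, 2]
```

Keeping the lengths next to the padded data is what lets sequence layers process a whole batch at once without the padding leaking into the result.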
A typical workflow:

- Prepare your data.
  - Use `Pyspark` in local or cluster mode for big datasets and `Pandas` for small ones.
  - Split data into the required parts (train, valid, test, ...).
  - Use `ptls.preprocessing` for simple data preparation.
  - Transform features to a compatible format using `Pyspark` or `Pandas` functions. You can also use `ptls.data_load.preprocessing` for common data transformation patterns.
  - Split sequences to the `ptls-data` format with `ptls.data_load.split_tools`. Save prepared data into `Parquet` format or keep it in memory (`Pickle` also works).
  - Use one of the available `ptls.data_load.datasets` to define input for the models.
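The "split data into required parts" step is usually done at the client level, so that all events of one client end up in exactly one part. A plain-Python sketch of such a split (function name and fractions are illustrative):

```python
import random

def split_ids(ids, valid_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle client ids and split them into train/valid/test parts.

    Splitting by client id (rather than by event) prevents leakage of
    one client's history across parts.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_test = int(n * test_frac)
    n_valid = int(n * valid_frac)
    test = ids[:n_test]
    valid = ids[n_test:n_test + n_valid]
    train = ids[n_test + n_valid:]
    return train, valid, test

train, valid, test = split_ids(range(100))
```

The same id lists can then be used to filter the event table in `Pandas` or `Pyspark`.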
- Choose a framework for encoder training.
  - There are both supervised and unsupervised frameworks in `ptls.frames`.
  - Keep in mind that each framework requires its own batch format. Tools for batch collate can be found in the selected framework package.
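A collate function generally turns a list of per-client feature dictionaries into one batch dictionary of rectangular data. A simplified sketch of that transposition (illustrative shape only, not any framework's actual collate):

```python
def collate_feature_dicts(batch, pad_value=0):
    """Collate a list of per-client feature dicts into one batch dict.

    Every feature is a variable-length list; the collated batch stores
    zero-padded rows per feature plus the original lengths, which is
    the general shape sequence frameworks expect.
    """
    keys = list(batch[0].keys())
    max_len = max(len(rec[keys[0]]) for rec in batch)
    out = {}
    for k in keys:
        out[k] = [list(rec[k]) + [pad_value] * (max_len - len(rec[k]))
                  for rec in batch]
    out["lengths"] = [len(rec[keys[0]]) for rec in batch]
    return out

batch = collate_feature_dicts([
    {"event_time": [1, 2, 3], "amount": [10, 20, 30]},
    {"event_time": [4], "amount": [5]},
])
```

Each framework adds its own structure on top of this (e.g. positive-pair labels for contrastive training), which is why the collate tools live inside the framework packages.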
- Build an encoder.
  - All parts are available in `ptls.nn`.
  - You can also use pretrained layers.
- Train your encoder with the selected framework and `pytorch_lightning`.
  - Provide data with one of the DataLoaders compatible with the selected framework.
  - Monitor the progress on TensorBoard.
  - Optionally tune hyperparameters.
- Save the trained encoder for future use.
  - You can use it as a standalone solution (e.g. to get class label probabilities).
  - Or it can be a pretrained part of another neural network.
- Use the encoder in your project.
  - Run predict for your data and get logits, probas, scores or embeddings.
  - Use `ptls.data_load` and `ptls.data_load.datasets` tools to keep your data transformation and collect batches for inference.
It is possible to create a specific component for every library module. Here are the links to the detailed descriptions: