Skip to content
/ LogPar Public

LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values (KDD '20)

License

Notifications You must be signed in to change notification settings

jakeykj/LogPar

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values (KDD '20)

This repo contains the PyTorch implementation of the paper LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values in KDD 2020. [paper] [poster presentation]

Overview: LogPar assumes that the binary observations are generated from an underlying real-valued irregular tensor via a quantization process followed by random sampling. LogPar factorizes the underlying irregular tensors, instead of the binary observations, using the PARAFAC2 model.

Citation

If you find the paper or the implementation helpful, please cite the following paper:

@inproceedings{yin2020logpar,
    author = {Yin, Kejing and Afshar, Ardavan and Ho, Joyce C. and Cheung, William K. and Zhang, Chao and Sun, Jimeng},
    title = {LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values},
    year = {2020},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    doi = {10.1145/3394486.3403213},
    booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
    pages = {1625–1635},
    location = {Virtual Event, CA, USA},
    series = {KDD '20}
}

Requirements

The codes have been tested with the following packages:

  • Python 3.7
  • PyTorch 1.3

Quick Demo

To run the model with a quick demo data, simply clone the repo and decompress the data archive by executing the following commands:

git clone git@github.com:jakeykj/LogPar.git
cd LogPar
tar -xzvf demo_data.tar.gz
python main.py

A folder ./results/ will be automatically created and the results will be saved there.

Use python main.py --help to obtain more information about setting the parameters of the model.

Data Format and Organization

The data are organized in a Python List object and is saved using the Python built-in pickle module. Each element in the list corresponds to a patient (one slice of the binary irregular tensor), and is a dictionary object, with the following elements:

  • pid: The unique identifier of the patient.
  • train: The subset for training containing only positive observations. It is a N-by-3 matrix-like "list of list" object. each "row" represents a positive entry in the slice of the irregular tensor corresponding to this patient. The first column is the visit id starting from zero, the second column is the feature id starting from zero, and the third column is the value at that entry. Note that during training phase, all entries except the observed positive ones are regarded as zero, so it is not neccessary to specify the zero entries in the train subset.
  • validation: The subset for validation. It follows the same structure as the train subset, with the only difference that some negative entries are required for evaluation.
  • test: The subset for test. It has the same sturcture as the validation subset.
  • times: A Python List object containing the time stamps of the hospital visits of the patient.
  • deltas: A Python List object containing the time gaps between hospital visits (in days).
  • label: The label of the predictive task for the patient. If no label information is available, set it to None.

If you use other datasets, you can organize the input data in the same format described above, and pass the <DATA_PATH> as a parameter to the training script:

python main.py --data_path <DATA_PATH>

Contact

If you have any enquires, please contact Mr. Kejing Yin by email: cskjyin [AT] comp [DOT] hkbu.edu.hk.


👉 Check out my home page for more research work by us.

About

LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values (KDD '20)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages