Change Log

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

This release is only compatible with PyTorch 1.10+.

New Models

Updated Models

  • Update R-GCN configuration by @mberr in pykeen#610
  • Update ConvKB to ERModel by @cthoyt in pykeen#425
  • Update ComplEx to ERModel by @mberr in pykeen#639
  • Rename TranslationalInteraction to NormBasedInteraction by @mberr in pykeen#651
  • Fix generic slicing dimension by @mberr in pykeen#683
  • Rename UnstructuredModel to UM and StructuredEmbedding to SE by @cthoyt in pykeen#721
  • Allow to pass unresolved loss to ERModel's __init__ by @mberr in pykeen#717

Representations and Initialization

  • Add low-rank embeddings by @mberr in pykeen#680
  • Add NodePiece representation by @mberr in pykeen#621
  • Add label-based initialization using a transformer (e.g., BERT) by @mberr in pykeen#638 and pykeen#652
  • Add label-based representation (e.g., to update language model using KGEM) by @mberr in pykeen#652
  • Remove literal representations (use label-based initialization instead) by @mberr in pykeen#679

Training

  • Fix displaying previous epoch's loss by @mberr in pykeen#627
  • Fix kwargs transmission on MultiTrainingCallback by @Rodrigo-A-Pereira in pykeen#645
  • Extend Callbacks by @mberr in pykeen#609
  • Add gradient clipping by @mberr in pykeen#607
  • Fix negative score shape for sLCWA by @mberr in pykeen#624
  • Fix epoch loss for loss reduction != "mean" by @mberr in pykeen#623
  • Add sLCWA support for Cross Entropy Loss by @mberr in pykeen#704

Inference

  • Add uncertainty estimate functions via MC dropout by @mberr in pykeen#688
  • Fix predict top k by @mberr in pykeen#690
  • Fix indexing in predict_* methods when using inverse relations by @mberr in pykeen#699
  • Move tensors to device for predict_* methods by @mberr in pykeen#658

Trackers

Evaluation

  • Store rank count by @mberr in pykeen#672
  • Extend evaluate() for easier relation filtering by @mberr in pykeen#391
  • Rename sklearn evaluator and refactor evaluator code by @cthoyt in pykeen#708
  • Add additional classification metrics via rexmex by @cthoyt in pykeen#668

Triples and Datasets

  • Add helper dataset with internal batching for Schlichtkrull sampling by @mberr in pykeen#616
  • Refactor splitting code and improve documentation by @mberr in pykeen#709
  • Switch np.loadtxt to pandas.read_csv by @mberr in pykeen#695
  • Add binary I/O to triples factories by @cthoyt in pykeen#665

Torch Usage

  • Use torch.finfo to determine suitable epsilon values by @mberr in pykeen#626
  • Use torch.isin instead of own implementation by @mberr in pykeen#635
  • Switch to using torch.inference_mode instead of torch.no_grad by @sbonner0 in pykeen#604

Miscellaneous

  • Add YAML experiment format by @mberr in pykeen#612
  • Add comparison with reproduction results during replication, if available by @mberr in pykeen#642
  • Adapt hello_world notebook to API changes by @dobraczka in pykeen#649
  • Add testing configuration for Jupyter notebooks by @mberr in pykeen#650
  • Add empty default loss_kwargs by @mali-git in pykeen#656
  • Optional extra config for reproduce by @mberr in pykeen#692
  • Store pipeline configuration in pipeline result by @mberr in pykeen#685
  • Fix upgrade to sequence by @mberr in pykeen#697
  • Fix pruner use in hpo_pipeline by @mberr in pykeen#724

Housekeeping

  • Automatically lint with black by @cthoyt in pykeen#605
  • Documentation and style guide cleanup by @cthoyt in pykeen#606

This release is only compatible with PyTorch 1.9+. Because of some changes, it's now non-trivial to support earlier versions as well, so moving forward PyKEEN will continue to support the latest version of PyTorch and try its best to keep backwards compatibility.

New Models

New Datasets

New Losses

Added

  • Tutorial in using checkpoints when bringing your own data (pykeen#498)
  • Learning rate scheduling (pykeen#492)
  • Checkpoints include entity/relation maps (pykeen#498)
  • QuatE reproducibility configurations (pykeen#486)

Changed

Fixed

  • FileNotFoundError on Windows/Anaconda (pykeen#503, thanks @Hao-666)
  • Fixed docstring for ComplEx interaction (pykeen#504)
  • Make DistMult the default interaction function for R-GCN (pykeen#548)
  • Fix gradient error in CompGCN buffering (pykeen#573)
  • Fix splitting of numeric triples factories (pykeen#594, thanks @Rodrigo-A-Pereira)
  • Fix determinism in splitting of triples factory (pykeen#500)
  • Fix documentation and improve HPO suggestion (pykeen#524, thanks @kdutia)

1.5.0 - 2021-06-13

New Metrics

  • Adjusted Arithmetic Mean Rank Index (pykeen#378)
  • Add harmonic, geometric, and median rankings (pykeen#381)
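
As a rough illustration of the rank-based metrics listed above, the following sketch aggregates a list of ranks using only the Python standard library. It is not PyKEEN's implementation: the function names are hypothetical, and the adjusted index assumes the definition 1 - (MR - 1) / (E[MR] - 1), where the expected mean rank under random scoring is taken to be the average of (n + 1) / 2 over the number of candidates n of each ranking task.

```python
import statistics
from typing import Dict, Sequence


def summarize_ranks(ranks: Sequence[int]) -> Dict[str, float]:
    """Aggregate 1-based ranks with several averages (hypothetical helper)."""
    return {
        "arithmetic_mean_rank": statistics.mean(ranks),
        "harmonic_mean_rank": statistics.harmonic_mean(ranks),
        "geometric_mean_rank": statistics.geometric_mean(ranks),
        "median_rank": statistics.median(ranks),
    }


def adjusted_mean_rank_index(
    ranks: Sequence[int], num_candidates: Sequence[int]
) -> float:
    """Assumed definition: 1 - (MR - 1) / (E[MR] - 1).

    num_candidates[i] is the number of candidates ranked in task i; a uniformly
    random ranking of n candidates has expected rank (n + 1) / 2.
    """
    mean_rank = statistics.mean(ranks)
    expected_mean_rank = statistics.mean((n + 1) / 2 for n in num_candidates)
    return 1.0 - (mean_rank - 1.0) / (expected_mean_rank - 1.0)
```

Under this definition, a value of 1 corresponds to always ranking the true entity first, and a value of 0 corresponds to the performance expected from random scoring.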

New Trackers

New Models

New Negative Samplers

Datasets

Added

Updated

  • R-GCN implementation now uses new-style models and is super idiomatic (pykeen#110)
  • Enable passing of interaction function by string in base model class (pykeen#384, pykeen#387)
  • Bump scipy requirement to 1.5.0+
  • Updated interfaces of models and negative samplers to enforce kwargs (pykeen#445)
  • Reorganize filtering, negative sampling, and remove triples factory from most objects (pykeen#400, pykeen#405, pykeen#406, pykeen#409, pykeen#420)
  • Update automatic memory optimization (pykeen#404)
  • Flexibly define positive triples for filtering (pykeen#398)
  • Completely reimplemented negative sampling interface in training loops (pykeen#427)
  • Completely reimplemented loss function in training loops (pykeen#448)
  • Forward-compatibility of embeddings in old-style models and updated docs on how to use embeddings (pykeen#474)

Fixed

  • Regularizer passing in the pipeline and HPO (pykeen#345)
  • Saving results when using multimodal models (pykeen#349)
  • Add missing diagonal constraint on MuRE Model (pykeen#353)
  • Fix early stopper handling (pykeen#419)
  • Fixed saving results from pipeline (pykeen#428, thanks @kantholtz)
  • Fix OOM issues with early stopper and AMO (pykeen#433)
  • Fix ER-MLP functional form (pykeen#444)

1.4.0 - 2021-03-04

New Datasets

New Models

New Algorithms

If you're interested in any of these, please get in touch with us regarding an upcoming publication.

Added

  • New-style models (pykeen#260) for direct usage of interaction modules
  • Ability to train pipeline() using an Interaction module rather than a Model (pykeen#326, pykeen#330).

Changes

  • Lookup of assets is now mediated by the class_resolver package (pykeen#321, pykeen#327)
  • The docdata package is now used to parse structured information out of the model and dataset documentation in order to make a more informative README with links to citations (pykeen#303).

1.3.0 - 2021-02-15

We skipped version 1.2.0 because we made an accidental release before this version was ready. We're only human, and are looking into improving our release workflow to live in CI/CD so something like this doesn't happen again. However, as an end user, this won't have an effect on you.

New Datasets

New Trackers

Fixed

  • Fixed ComplEx's implementation (pykeen#313)
  • Fixed OGB's reuse of entity identifiers (pykeen#318, thanks @tgebhart)

Added

  • pykeen version command for more easily reporting your environment in issues (pykeen#251)
  • Functional forms of all interaction models (e.g., TransE, RotatE) (pykeen#238, pykeen.nn.functional documentation). These can be generally reused, even outside of the typical PyKEEN workflows.
  • Modular forms of all interaction models (pykeen#242, pykeen.nn.modules documentation). These wrap the functional forms of interaction models and store hyper-parameters such as the p value for the L_p norm in TransE.
  • The initializer, normalizer, and constrainer for the entity and relation embeddings are now exposed through the __init__() function of each KGEM class and can be configured. A future update will enable HPO on these as well (pykeen#282).
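
The difference between the functional and modular forms described above can be sketched in plain Python. This is a simplified stand-in rather than the code in pykeen.nn.functional or pykeen.nn.modules: the names transe_score and TransEModule are hypothetical, vectors are plain lists of floats, and the score is assumed to be the negative L_p norm of h + r - t.

```python
from typing import Sequence


def transe_score(
    h: Sequence[float], r: Sequence[float], t: Sequence[float], p: int = 2
) -> float:
    """Functional form (hypothetical): stateless, every argument is passed in."""
    norm = sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1 / p)
    return -norm


class TransEModule:
    """Modular form (hypothetical): stores the hyper-parameter p and delegates."""

    def __init__(self, p: int = 2) -> None:
        self.p = p

    def __call__(
        self, h: Sequence[float], r: Sequence[float], t: Sequence[float]
    ) -> float:
        # The module only wraps the functional form with its stored p value.
        return transe_score(h, r, t, p=self.p)


l1_interaction = TransEModule(p=1)
score = l1_interaction([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
```

Keeping the scoring logic in a stateless function makes it reusable outside of a model, while the wrapper class gives the hyper-parameters a place to live.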

Refactoring and Future Preparation

This release contains a few big refactors. Most won't affect end-users, but if you're writing your own PyKEEN models, these are important. Many of them are motivated by the goal of introducing a new interface that makes it much easier for researchers (who shouldn't have to understand the inner workings of PyKEEN) to make new models.

  • The regularizer has been refactored (pykeen#266, pykeen#274). It no longer accepts a torch.device when instantiated.
  • The pykeen.nn.Embedding class has been improved in several ways:
      • Embedding Specification class makes it easier to write new classes (pykeen#277)
      • Refactor to make shape of embedding explicit (pykeen#287)
      • Specification of complex datatype (pykeen#292)
  • Refactoring of the loss model class to provide a meaningful class hierarchy (pykeen#256, pykeen#262)
  • Refactoring of the base model class to provide a consistent interface (pykeen#246, pykeen#248, pykeen#253, pykeen#257). This allowed for simplification of the loss computation based on the new hierarchy and also a new implementation of the regularizer class.
  • More automated testing of typing with MyPy (pykeen#255) and automated checking of documentation with doctests (pykeen#291)

Triples Loading

We've made some improvements to the pykeen.triples.TriplesFactory to facilitate loading even larger datasets (pykeen#216). However, this required an interface change. This will affect any code that loads custom triples. If you're loading triples from a path, you should now use:

from pykeen.triples import TriplesFactory

path = ...  # placeholder for the path to your triples file

# Old (doesn't work anymore)
# tf = TriplesFactory(path=path)

# New
tf = TriplesFactory.from_path(path)

Predictions

While refactoring the base model class, we excised the prediction functionality to a new module pykeen.models.predict (docs: https://pykeen.readthedocs.io/en/latest/reference/predict.html#functions). We also renamed some of the prediction functions inside the base model to make them more consistent, but we now recommend you use the functions from pykeen.models.predict instead.

  • Model.predict_heads() -> Model.get_head_prediction_df()
  • Model.predict_relations() -> Model.get_relation_prediction_df()
  • Model.predict_tails() -> Model.get_tail_prediction_df()
  • Model.score_all_triples() -> Model.get_all_prediction_df()

Fixed

  • Do not create inverse triples for validation and testing factory (pykeen#270)
  • Treat nonzero applied to large tensor error as OOM for batch size search (pykeen#279)
  • Fix bug in loading ConceptNet (pykeen#290). If your experiments relied on this dataset, you should rerun them.

1.1.0 - 2021-01-20

New Datasets

New Trackers

Added

  • Add MLflow set tags function (pykeen#139; thanks @sunny1401)
  • Add score_t/h function for ComplEx (pykeen#150)
  • Add proper testing for literal datasets and literal models (pykeen#199)
  • Checkpoint functionality (pykeen#123)
  • Random triple generation (pykeen#201)
  • Make negative sampler corruption scheme configurable (pykeen#209)
  • Add predict with inverse triples pipeline (pykeen#208)
  • Add generalized p-norm to regularizer (pykeen#225)
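
To make the idea of a configurable corruption scheme more concrete, here is a minimal sketch using only the standard library. It is an illustration under stated assumptions and not PyKEEN's negative sampler: the function name corrupt, its parameters, and the "h" / "r" / "t" labels for head, relation, and tail are all chosen for this example.

```python
import random
from typing import Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def corrupt(
    triple: Triple,
    num_entities: int,
    num_relations: int,
    corruption_scheme: Sequence[str] = ("h", "t"),
    rng: Optional[random.Random] = None,
) -> Triple:
    """Replace one position of an ID-based triple, chosen from the scheme.

    Note: this sketch does not filter, so the replacement may coincide with
    the original value or produce another known positive triple.
    """
    rng = rng or random.Random()
    h, r, t = triple
    target = rng.choice(list(corruption_scheme))
    if target == "h":
        h = rng.randrange(num_entities)
    elif target == "r":
        r = rng.randrange(num_relations)
    elif target == "t":
        t = rng.randrange(num_entities)
    else:
        raise ValueError(f"unknown corruption target: {target!r}")
    return h, r, t


# Corrupt only relations, e.g. when training for relation prediction.
negative = corrupt(
    (1, 2, 3), num_entities=10, num_relations=5,
    corruption_scheme=("r",), rng=random.Random(0),
)
```

With the default scheme in this sketch, only heads and tails are replaced; passing a different scheme changes which positions can be corrupted without touching the sampling code.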

Changed

  • New harness for resetting parameters (pykeen#131)
  • Modularize embeddings (pykeen#132)
  • Update first steps documentation (pykeen#152; thanks @TobiasUhmann)
  • Switched testing to GitHub Actions (pykeen#165 and pykeen#194)
  • No longer support Python 3.6
  • Move automatic memory optimization (AMO) option out of model and into training loop (pykeen#176)
  • Improve hyper-parameter defaults and HPO defaults (pykeen#181 and pykeen#179)
  • Switch internal usage to ID-based triples (pykeen#193 and pykeen#220)
  • Optimize triples splitting algorithm (pykeen#187)
  • Generalize metadata storage in triples factory (pykeen#211)
  • Add drop_last option to data loader in training loop (pykeen#217)

Fixed

1.0.5 - 2020-10-21

Added

  • Added testing on Windows with AppVeyor and documentation for installation on Windows (pykeen#95)
  • Add ability to specify custom datasets in HPO and ablation studies (pykeen#54)
  • Add functions for plotting entities and relations (as well as an accompanying tutorial) (pykeen#99)

Changed

  • Replaced BCE loss with BCEWithLogits loss (pykeen#109)
  • Store default HPO ranges in loss classes (pykeen#111)
  • Use entrypoints for datasets (pykeen#115) to allow registering of custom datasets
  • Improved WANDB results tracker (pykeen#117, thanks @kantholtz)
  • Reorganized ablation study generation and execution (pykeen#54)
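
One common reason to prefer a loss that operates on logits over one that operates on probabilities is numerical stability. The sketch below illustrates that general point in plain Python and is not taken from the PyKEEN or PyTorch source: both function names are hypothetical, and the stable variant uses the standard identity max(x, 0) - x * y + log(1 + exp(-|x|)) for a logit x and a target y.

```python
import math


def naive_bce(logit: float, target: float) -> float:
    """Sigmoid followed by binary cross entropy; breaks down for large |logit|."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross entropy computed directly from the logit in a stable way."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Both agree for moderate logits.
moderate_gap = abs(naive_bce(0.5, 1.0) - bce_with_logits(0.5, 1.0))

# For a confidently wrong prediction, the sigmoid saturates to exactly 1.0 in
# floating point, so the naive version ends up evaluating log(0) and fails,
# while the stable version still returns a finite loss.
stable_loss = bce_with_logits(40.0, 0.0)
```

Because the exponential is only ever evaluated at a non-positive argument, the stable form neither overflows nor takes the logarithm of zero.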

Fixed

  • Fixed bug in the initialization of ConvE (pykeen#100)
  • Fixed cross-platform issue with random integer generation (pykeen#98)
  • Fixed documentation build on ReadTheDocs (pykeen#104)

1.0.4 - 2020-08-25

Added

Changed

  • Use number of epochs as step instead of number of checks (pykeen#72)

Fixed

1.0.3 - 2020-08-13

Added

Changed

Fixed

1.0.2 - 2020-07-10

Added

  • Add default values for margin and adversarial temperature in NSSA loss (pykeen#29)
  • Added FTP uploader (pykeen#35)
  • Add AWS S3 uploader (pykeen#39)

Changed

  • Improved MLflow support (pykeen#40)
  • Lots of improvements to documentation!

Fixed

  • Fix triples factory splitting bug (pykeen#21)
  • Fix problem with tensors' device during prediction (pykeen#41)
  • Fix RotatE relation embeddings re-initialization (pykeen#26)

1.0.1 - 2020-07-02

Added

Changed