All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
This release is only compatible with PyTorch 1.10+.
- Add BoxE by @ralphabb in pykeen#618
- Add TripleRE by @mberr in pykeen#712
- Add AutoSF by @mberr in pykeen#713
- Add Transformer by @mberr in pykeen#714
- Add Canonical Tensor Decomposition by @mberr in pykeen#663
- Add (novel) Fixed Model by @cthoyt in pykeen#691
- Add NodePiece model by @mberr in pykeen#621
- Update R-GCN configuration by @mberr in pykeen#610
- Update ConvKB to ERModel by @cthoyt in pykeen#425
- Update ComplEx to ERModel by @mberr in pykeen#639
- Rename TranslationalInteraction to NormBasedInteraction by @mberr in pykeen#651
- Fix generic slicing dimension by @mberr in pykeen#683
- Rename UnstructuredModel to UM and StructuredEmbedding to SE by @cthoyt in pykeen#721
- Allow to pass unresolved loss to ERModel's __init__ by @mberr in pykeen#717
- Add low-rank embeddings by @mberr in pykeen#680
- Add NodePiece representation by @mberr in pykeen#621
- Add label-based initialization using a transformer (e.g., BERT) by @mberr in pykeen#638 and pykeen#652
- Add label-based representation (e.g., to update language model using KGEM) by @mberr in pykeen#652
- Remove literal representations (use label-based initialization instead) by @mberr in pykeen#679
- Fix displaying previous epoch's loss by @mberr in pykeen#627
- Fix kwargs transmission on MultiTrainingCallback by @Rodrigo-A-Pereira in pykeen#645
- Extend Callbacks by @mberr in pykeen#609
- Add gradient clipping by @mberr in pykeen#607
- Fix negative score shape for sLCWA by @mberr in pykeen#624
- Fix epoch loss for loss reduction != "mean" by @mberr in pykeen#623
- Add sLCWA support for Cross Entropy Loss by @mberr in pykeen#704
- Add uncertainty estimate functions via MC dropout by @mberr in pykeen#688
- Fix predict top k by @mberr in pykeen#690
- Fix indexing in predict_* methods when using inverse relations by @mberr in pykeen#699
- Move tensors to device for predict_* methods by @mberr in pykeen#658
- Fix wandb logging by @mberr in pykeen#647
- Add multi-result tracker by @mberr in pykeen#682
- Add Python result tracker by @mberr in pykeen#681
- Update file trackers by @cthoyt in pykeen#629
- Store rank count by @mberr in pykeen#672
- Extend evaluate() for easier relation filtering by @mberr in pykeen#391
- Rename sklearn evaluator and refactor evaluator code by @cthoyt in pykeen#708
- Add additional classification metrics via rexmex by @cthoyt in pykeen#668
- Add helper dataset with internal batching for Schlichtkrull sampling by @mberr in pykeen#616
- Refactor splitting code and improve documentation by @mberr in pykeen#709
- Switch np.loadtxt to pandas.read_csv by @mberr in pykeen#695
- Add binary I/O to triples factories by @cthoyt in pykeen#665
- Use torch.finfo to determine suitable epsilon values by @mberr in pykeen#626
- Use torch.isin instead of own implementation by @mberr in pykeen#635
- Switch to using torch.inference_mode instead of torch.no_grad by @sbonner0 in pykeen#604
- Add YAML experiment format by @mberr in pykeen#612
- Add comparison with reproduction results during replication, if available by @mberr in pykeen#642
- Adapt hello_world notebook to API changes by @dobraczka in pykeen#649
- Add testing configuration for Jupyter notebooks by @mberr in pykeen#650
- Add empty default loss_kwargs by @mali-git in pykeen#656
- Optional extra config for reproduce by @mberr in pykeen#692
- Store pipeline configuration in pipeline result by @mberr in pykeen#685
- Fix upgrade to sequence by @mberr in pykeen#697
- Fix pruner use in hpo_pipeline by @mberr in pykeen#724
- Automatically lint with black by @cthoyt in pykeen#605
- Documentation and style guide cleanup by @cthoyt in pykeen#606
This release is only compatible with PyTorch 1.9+. Because of some changes, it's now pretty non-trivial to support both, so moving forwards PyKEEN will continue to support the latest version of PyTorch and try its best to keep backwards compatibility.
- DistMA (pykeen#507)
- TorusE (pykeen#510)
- Frequency Baselines (pykeen#514)
- Gated Distmult Literal (pykeen#591, thanks @Rodrigo-A-Pereira)
- WD50K (pykeen#511)
- Wikidata5M (pykeen#528)
- BioKG (pykeen#585, thanks @sbonner0)
- Double Margin Loss (pykeen#539)
- Focal Loss (pykeen#542)
- Pointwise Hinge Loss (pykeen#540)
- Soft Pointwise Hinge Loss (pykeen#540)
- Pairwise Logistic Loss (pykeen#540)
- Tutorial in using checkpoints when bringing your own data (pykeen#498)
- Learning rate scheduling (pykeen#492)
- Checkpoints include entity/relation maps (pykeen#498)
- QuatE reproducibility configurations (pykeen#486)
- Reimplement SE (pykeen#521) and NTN (pykeen#522) with new-style models
- Generalize pairwise loss and pointwise loss hierarchies (pykeen#540)
- Update to use PyTorch 1.9 functionality (pykeen#489)
- Generalize generator strategies in LCWA (pykeen#602)
- FileNotFoundError on Windows/Anaconda (pykeen#503, thanks @Hao-666)
- Fixed docstring for ComplEx interaction (pykeen#504)
- Make DistMult the default interaction function for R-GCN (pykeen#548)
- Fix gradient error in CompGCN buffering (pykeen#573)
- Fix splitting of numeric triples factories (pykeen#594, thanks @Rodrigo-A-Pereira)
- Fix determinism in splitting of triples factory (pykeen#500)
- Fix documentation and improve HPO suggestion (pykeen#524, thanks @kdutia)
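For readers unfamiliar with the pointwise hinge losses listed in this release (pykeen#540), the following is a minimal sketch of the standard textbook forms in plain Python. The function names, the +1/-1 label encoding, and the default margin are assumptions made for illustration; PyKEEN's own loss classes operate on tensors and may differ in label encoding and reduction.

```python
import math


def pointwise_hinge(score: float, label: int, margin: float = 1.0) -> float:
    """Hinge penalty for one triple; label is +1 (true) or -1 (corrupted)."""
    return max(0.0, margin - label * score)


def soft_pointwise_hinge(score: float, label: int, margin: float = 1.0) -> float:
    """Smooth variant: softplus replaces the hard max(0, x)."""
    x = margin - label * score
    # Numerically stable softplus, i.e. log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# A well-scored positive triple incurs no hard hinge penalty,
# while the soft variant still yields a small positive value.
print(pointwise_hinge(2.0, +1))
print(soft_pointwise_hinge(2.0, +1))
```

The soft variant stays differentiable everywhere, which is the usual reason for preferring it over the hard hinge.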
1.5.0 - 2021-06-13
- Adjusted Arithmetic Mean Rank Index (pykeen#378)
- Add harmonic, geometric, and median rankings (pykeen#381)
- Console Tracker (pykeen#440)
- Tensorboard Tracker (pykeen#416; thanks @sbonner0)
- QuatE (pykeen#367)
- CompGCN (pykeen#382)
- CrossE (pykeen#467)
- Reimplementation of LiteralE with arbitrary combination (g) function (pykeen#245)
- Pseudo-typed Negative Sampler (pykeen#412)
- Removed invalid datasets (OpenBioLink filtered sets; pykeen#439)
- Added WK3l-15K (pykeen#403)
- Added WK3l-120K (pykeen#403)
- Added CN3l (pykeen#403)
- Documentation on using PyKEEN in Google Colab and Kaggle (pykeen#379, thanks @jerryIsHere)
- Pass custom training loops to pipeline (pykeen#334)
- Compatibility layer for the fft module (pykeen#288)
- Official Python 3.9 support, now that PyTorch has it (pykeen#223)
- Utilities for dataset analysis (pykeen#16, pykeen#392)
- Filtering of negative sampling now uses a bloom filter by default (pykeen#401)
- Optional embedding dropout (pykeen#422)
- Added more HPO suggestion methods and docs (pykeen#446)
- Training callbacks (pykeen#429)
- Class resolver for datasets (pykeen#473)
- R-GCN implementation now uses new-style models and is super idiomatic (pykeen#110)
- Enable passing of interaction function by string in base model class (pykeen#384, pykeen#387)
- Bump scipy requirement to 1.5.0+
- Updated interfaces of models and negative samplers to enforce kwargs (pykeen#445)
- Reorganize filtering, negative sampling, and remove triples factory from most objects (pykeen#400, pykeen#405, pykeen#406, pykeen#409, pykeen#420)
- Update automatic memory optimization (pykeen#404)
- Flexibly define positive triples for filtering (pykeen#398)
- Completely reimplemented negative sampling interface in training loops (pykeen#427)
- Completely reimplemented loss function in training loops (pykeen#448)
- Forward-compatibility of embeddings in old-style models and updated docs on how to use embeddings (pykeen#474)
- Regularizer passing in the pipeline and HPO (pykeen#345)
- Saving results when using multimodal models (pykeen#349)
- Add missing diagonal constraint on MuRE Model (pykeen#353)
- Fix early stopper handling (pykeen#419)
- Fixed saving results from pipeline (pykeen#428, thanks @kantholtz)
- Fix OOM issues with early stopper and AMO (pykeen#433)
- Fix ER-MLP functional form (pykeen#444)
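The harmonic, geometric, and median rank aggregations added in this release (pykeen#381) can be illustrated with Python's standard library. This is only a sketch: the function name and dictionary keys are hypothetical and do not reflect the metric names PyKEEN reports.

```python
import statistics


def summarize_ranks(ranks):
    """Aggregate a list of positive ranks in several ways."""
    return {
        "arithmetic_mean_rank": statistics.fmean(ranks),
        "harmonic_mean_rank": statistics.harmonic_mean(ranks),
        "geometric_mean_rank": statistics.geometric_mean(ranks),
        "median_rank": statistics.median(ranks),
    }


# One poorly ranked triple (rank 100) dominates the arithmetic mean,
# while the other aggregations are less sensitive to it.
summary = summarize_ranks([1, 2, 4, 100])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The example shows why the additional aggregations are useful: they are more robust to a few very large ranks than the arithmetic mean.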
1.4.0 - 2021-03-04
- Countries (pykeen#314)
- DB100K (pykeen#316)
- MuRE (pykeen#311)
- PairRE (pykeen#309)
- Monotonic affine transformer (pykeen#324)
If you're interested in any of these, please get in touch with us regarding an upcoming publication.
- Dataset Similarity (pykeen#294)
- Dataset Deterioration (pykeen#295)
- Dataset Remix (pykeen#296)
- New-style models (pykeen#260) for direct usage of interaction modules
- Ability to train pipeline() using an Interaction module rather than a Model (pykeen#326, pykeen#330).
- Lookup of assets is now mediated by the class_resolver package (pykeen#321, pykeen#327)
- The docdata package is now used to parse structured information out of the model and dataset documentation in order to make a more informative README with links to citations (pykeen#303).
1.3.0 - 2021-02-15
We skipped version 1.2.0 because we made an accidental release before this version was ready. We're only human, and are looking into improving our release workflow to live in CI/CD so something like this doesn't happen again. However, as an end user, this won't have an effect on you.
- CSKG (pykeen#249)
- DBpedia50 (pykeen#278)
- General file-based Tracker (pykeen#254)
- CSV Tracker (pykeen#254)
- JSON Tracker (pykeen#254)
- Fixed ComplEx's implementation (pykeen#313)
- Fixed OGB's reuse of entity identifiers (pykeen#318, thanks @tgebhart)
- pykeen version command for more easily reporting your environment in issues (pykeen#251)
- Functional forms of all interaction models (e.g., TransE, RotatE) (pykeen#238, pykeen.nn.functional documentation). These can be generally reused, even outside of the typical PyKEEN workflows.
- Modular forms of all interaction models (pykeen#242, pykeen.nn.modules documentation). These wrap the functional forms of interaction models and store hyper-parameters such as the p value for the L_p norm in TransE.
- The initializer, normalizer, and constrainer for the entity and relation embeddings are now exposed through the __init__() function of each KGEM class and can be configured. A future update will enable HPO on these as well (pykeen#282).
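The relationship between functional and modular forms can be sketched in plain Python using TransE, whose score is the negative L_p norm of h + r - t. The names transe_score and TransEModule are hypothetical and chosen for illustration; the actual implementations in pykeen.nn.functional and pykeen.nn.modules work on batched tensors and have different signatures.

```python
from dataclasses import dataclass
from typing import Sequence


def transe_score(
    h: Sequence[float], r: Sequence[float], t: Sequence[float], p: int = 2
) -> float:
    """Functional form: stateless, every hyper-parameter is an argument."""
    total = sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t))
    return -(total ** (1.0 / p))


@dataclass
class TransEModule:
    """Modular form: stores p and delegates to the functional form."""

    p: int = 2

    def __call__(self, h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
        return transe_score(h, r, t, p=self.p)


# The module is configured once and then reused without repeating p.
l1 = TransEModule(p=1)
score = l1([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
print(score)
```

Keeping the computation in a stateless function makes it reusable outside of a model, while the wrapper carries the configuration.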
This release contains a few big refactors. Most won't affect end-users, but if you're writing your own PyKEEN models, these are important. Many of them are motivated by the goal of introducing a new interface that makes it much easier for researchers (who shouldn't have to understand the inner workings of PyKEEN) to make new models.
- The regularizer has been refactored (pykeen#266, pykeen#274). It no longer accepts a torch.device when instantiated.
- The pykeen.nn.Embedding class has been improved in several ways:
  - Embedding Specification class makes it easier to write new classes (pykeen#277)
  - Refactor to make shape of embedding explicit (pykeen#287)
  - Specification of complex datatype (pykeen#292)
- Refactoring of the loss model class to provide a meaningful class hierarchy (pykeen#256, pykeen#262)
- Refactoring of the base model class to provide a consistent interface (pykeen#246, pykeen#248, pykeen#253, pykeen#257). This allowed for simplification of the loss computation based on the new hierarchy and also new implementation of regularizer class.
- More automated testing of typing with MyPy (pykeen#255) and automated checking of documentation with doctests (pykeen#291)
We've made some improvements to the pykeen.triples.TriplesFactory to facilitate loading even larger datasets (pykeen#216). However, this required an interface change. This will affect any code that loads custom triples. If you're loading triples from a path, you should now use:
from pykeen.triples import TriplesFactory

path = ...
# Old (doesn't work anymore)
# tf = TriplesFactory(path=path)
# New
tf = TriplesFactory.from_path(path)
While refactoring the base model class, we excised the prediction functionality to a new module pykeen.models.predict (docs: https://pykeen.readthedocs.io/en/latest/reference/predict.html#functions). We also renamed some of the prediction functions inside the base model to make them more consistent, but we now recommend you use the functions from pykeen.models.predict instead.
- Model.predict_heads() -> Model.get_head_prediction_df()
- Model.predict_relations() -> Model.get_relation_prediction_df()
- Model.predict_tails() -> Model.get_tail_prediction_df()
- Model.score_all_triples() -> Model.get_all_prediction_df()
- Do not create inverse triples for validation and testing factory (pykeen#270)
- Treat nonzero applied to large tensor error as OOM for batch size search (pykeen#279)
- Fix bug in loading ConceptNet (pykeen#290). If your experiments relied on this dataset, you should rerun them.
1.1.0 - 2021-01-20
- CoDEx (pykeen#154)
- DRKG (pykeen#156)
- OGB (pykeen#159)
- ConceptNet (pykeen#160)
- Clinical Knowledge Graph (pykeen#209)
- Neptune.ai (pykeen#183)
- Add MLFlow set tags function (pykeen#139; thanks @sunny1401)
- Add score_t/h function for ComplEx (pykeen#150)
- Add proper testing for literal datasets and literal models (pykeen#199)
- Checkpoint functionality (pykeen#123)
- Random triple generation (pykeen#201)
- Make negative sampler corruption scheme configurable (pykeen#209)
- Add predict with inverse triples pipeline (pykeen#208)
- Add generalized p-norm to regularizer (pykeen#225)
- New harness for resetting parameters (pykeen#131)
- Modularize embeddings (pykeen#132)
- Update first steps documentation (pykeen#152; thanks @TobiasUhmann)
- Switched testing to GitHub Actions (pykeen#165 and pykeen#194)
- No longer support Python 3.6
- Move automatic memory optimization (AMO) option out of model and into training loop (pykeen#176)
- Improve hyper-parameter defaults and HPO defaults (pykeen#181 and pykeen#179)
- Switch internal usage to ID-based triples (pykeen#193 and pykeen#220)
- Optimize triples splitting algorithm (pykeen#187)
- Generalize metadata storage in triples factory (pykeen#211)
- Add drop_last option to data loader in training loop (pykeen#217)
- Whitelist support in HPO pipeline (pykeen#124)
- Improve evaluator instantiation (pykeen#125; thanks @kantholtz)
- CPU fallback on AMO (pykeen#232)
- Fix HPO save issues (pykeen#235)
- Fix GPU issue in plotting (pykeen#207)
1.0.5 - 2020-10-21
- Added testing on Windows with AppVeyor and documentation for installation on Windows (pykeen#95)
- Add ability to specify custom datasets in HPO and ablation studies (pykeen#54)
- Add functions for plotting entities and relations (as well as an accompanying tutorial) (pykeen#99)
- Replaced BCE loss with BCEWithLogits loss (pykeen#109)
- Store default HPO ranges in loss classes (pykeen#111)
- Use entrypoints for datasets (pykeen#115) to allow registering of custom datasets
- Improved WANDB results tracker (pykeen#117, thanks @kantholtz)
- Reorganized ablation study generation and execution (pykeen#54)
- Fixed bug in the initialization of ConvE (pykeen#100)
- Fixed cross-platform issue with random integer generation (pykeen#98)
- Fixed documentation build on ReadTheDocs (pykeen#104)
1.0.4 - 2020-08-25
- Use number of epochs as step instead of number of checks (pykeen#72)
- Fix bug in early stopping (pykeen#77)
1.0.3 - 2020-08-13
- Side-specific evaluation (pykeen#44)
- Grid Sampler (pykeen#52)
- Weights & Biases Tracker (pykeen#68), thanks @migalkin!
1.0.2 - 2020-07-10
- Add default values for margin and adversarial temperature in NSSA loss (pykeen#29)
- Added FTP uploader (pykeen#35)
- Add AWS S3 uploader (pykeen#39)
- Improved MLflow support (pykeen#40)
- Lots of improvements to documentation!
- Fix triples factory splitting bug (pykeen#21)
- Fix problem with tensors' device during prediction (pykeen#41)
- Fix RotatE relation embeddings re-initialization (pykeen#26)
1.0.1 - 2020-07-02
- Update documentation (pykeen#10)