Version 0.1.0 Release Candidate 1
This is the first release of Xanthus, a neural recommendation model package implemented in Python on top of TensorFlow, utilising the high-level Keras API. Xanthus came into existence as an exercise in implementing and replicating the results of a relatively recent ML paper, and in trying out some of the new features of TensorFlow 2.0 (and the changes to the Keras API over the last couple of years!).
Release notes
Here's what's in the box:
Models
Three neural recommender models implemented with the Keras `Model` API:

- `GeneralizedMatrixFactorization` (GMF) - This model generalizes 'classic' matrix factorization (MF) as a neural model. By using the pointwise negative sampling approach outlined in the literature, this model can outperform some 'classic' MF approaches on some datasets.
- `MultiLayerPerceptron` (MLP) - A model with two input embedding blocks feeding into a 'classic' Multi-Layer Perceptron (MLP) block. As demonstrated in the literature, this architecture benefits from its depth over 'shallower' models such as the GMF model in some cases.
- `NeuralMatrixFactorization` (NMF) - This model combines the GMF and MLP models into a single model -- theoretically with the benefits of both!
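To make the GMF idea concrete, here's a minimal NumPy sketch of the scoring step (an illustrative approximation, not Xanthus's actual implementation; all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, dim = 5, 10, 8

# Hypothetical 'learned' embeddings, randomly initialised for illustration.
user_embeddings = rng.normal(size=(n_users, dim))
item_embeddings = rng.normal(size=(n_items, dim))
output_weights = rng.normal(size=(dim,))

def gmf_score(user_id: int, item_id: int) -> float:
    """GMF generalises the MF dot product: an element-wise product of the
    user and item embeddings feeds a learned linear output layer. With
    all-ones output weights this reduces to classic MF."""
    interaction = user_embeddings[user_id] * item_embeddings[item_id]
    logit = float(interaction @ output_weights)
    # Sigmoid output, as used for pointwise training on implicit feedback.
    return 1.0 / (1.0 + np.exp(-logit))
```

With `output_weights` set to all ones, `gmf_score` collapses to the classic MF dot product (up to the sigmoid), which is the sense in which GMF 'generalizes' MF.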
Bonus features
- Metadata support - The models above (plus supporting utilities) are implemented to make it easy to quickly introduce metadata into your recommendation models. This means Xanthus natively supports 'hybrid' recommendations (interaction data + user/item metadata). This is mentioned in He et al.'s work, but not implemented or assessed there. Here's an example if you're interested.
- TensorBoard support - By using the Keras `Model` API, Xanthus natively supports TensorBoard for model training and monitoring -- plus custom callbacks too. Why not Slack yourself after each training epoch? What could possibly go wrong?
Data Utilities
Getting your data encoded neatly and quickly, generating useful training and evaluation datasets, and getting that data into a format your models can use can be a fiddly and time-consuming process. To alleviate some of these issues and help you get stuck into tuning your models, Xanthus provides the following utilities:
- `xanthus.datasets.Dataset` - A utility class for quickly and (relatively) efficiently building recommendation-friendly datasets, with a bunch of bundled utilities for manipulating these datasets too.
- `xanthus.datasets.DatasetEncoder` - Another utility class for encoding and decoding datasets, and for preserving consistency across split datasets (i.e. train/test datasets).
- `xanthus.evaluate.split` - An implementation of the 'Recommender Split' provided as part of Azure ML Studio. This gives you the option of sampling hold-out interactions, selecting subsets of interactions, and ensuring consistency between the resulting train and test sets.
- `xanthus.evaluate.create_rankings` - An implementation of the common ranking evaluation protocol for recommendation models, where `n` 'positive' items (items a user has interacted with, but which weren't present in the training set) are appended to `m` 'negative' items (items a user hasn't interacted with). The model can then be queried to generate a ranking over these items, with the hope that the 'positive' items will appear higher in the query results.
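Schematically, the protocol assembles, per user, one held-out 'positive' plus a sample of 'negatives' for the model to rank. A hedged NumPy sketch (the real `create_rankings` signature and behaviour may well differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 100

# Toy interaction histories; the last item per user is held out as the test 'positive'.
user_history = {0: [3, 17, 42], 1: [8, 42, 91]}

def build_ranking_candidates(user_id: int, n_negatives: int = 10):
    """Return (held-out positive, sampled negatives) for one user."""
    seen = set(user_history[user_id])
    positive = user_history[user_id][-1]  # the held-out interaction
    unseen = [i for i in range(n_items) if i not in seen]
    negatives = rng.choice(unseen, size=n_negatives, replace=False).tolist()
    return positive, negatives

positive, negatives = build_ranking_candidates(0)
# The model then scores [positive] + negatives;
# a good model ranks the positive near the top.
```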
Bonus features
But wait, there's more! Xanthus implements some common recommendation model metric functions, including:

- `xanthus.evaluate.metrics.ndcg` - An implementation of the Normalized Discounted Cumulative Gain (NDCG) metric. Yes, that is a reference to Wikipedia.
- `xanthus.evaluate.metrics.hit_ratio` - An implementation of the common 'hit ratio' metric used in many recommendation model evaluation activities (see also `xanthus.evaluate.metrics.precision_at_k`).
- `xanthus.evaluate.metrics.truncated_ndcg` - A special-case NDCG implementation with some performance optimizations for cases where the target set consists of a single 'positive' item, as opposed to the more general case addressed above.
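In the single-positive case, both metrics reduce to simple functions of the positive item's rank. A small sketch of the underlying maths (not Xanthus's own code; function names here are hypothetical):

```python
import math

def truncated_ndcg(ranked_items: list, positive) -> float:
    """NDCG with exactly one relevant item: 1 / log2(rank + 1), where rank is
    the 1-based position of the positive item (0.0 if it isn't ranked)."""
    if positive not in ranked_items:
        return 0.0
    rank = ranked_items.index(positive) + 1
    return 1.0 / math.log2(rank + 1)

def hit_ratio_at_k(ranked_items: list, positive, k: int) -> float:
    """1.0 if the positive item appears in the top-k results, else 0.0."""
    return 1.0 if positive in ranked_items[:k] else 0.0

ranking = [5, 9, 2, 7, 1]
truncated_ndcg(ranking, 5)     # 1.0: the positive is ranked first
hit_ratio_at_k(ranking, 2, 3)  # 1.0: item 2 appears in the top 3
```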
Additionally, to make using these functions easier, you can use:
- `xanthus.evaluate.metrics.score` - A utility function for executing a `map` operation over a set of recommendations, applying a provided metric function, and then returning the scores as a NumPy array. This function supports parallel processing too!
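In spirit, such a helper maps a metric over per-user rankings and stacks the results (a hypothetical serial sketch; the real `score` signature, and its parallel mode, may differ):

```python
import numpy as np

def score(metric, rankings, positives):
    """Apply metric(ranked_items, positive) per user; return scores as an array.
    (A parallel version could dispatch the map via multiprocessing instead.)"""
    return np.asarray([metric(r, p) for r, p in zip(rankings, positives)])

# E.g. mean hit-ratio@3 over two users: user 0's positive (2) is in the top 3,
# user 1's positive (9) is not.
hits = score(lambda r, p: float(p in r[:3]), [[1, 2, 3], [4, 5, 6]], [2, 9])
hits.mean()  # 0.5
```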
Finally, if you're interested in 'coverage' metrics, there's:
- `xanthus.evaluate.metrics.coverage_at_k` - Coverage metrics can be handy for understanding how diverse your model's recommendations are -- exploring product catalogues is often a major motivation for recommenders in the first place, so 'pure' accuracy and ranking metrics (as above) might not give you the full picture.
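A coverage metric of this flavour can be sketched as the fraction of the catalogue that ever appears in a top-k list (an illustrative sketch, not necessarily the package's implementation):

```python
def coverage_at_k(recommendations, catalogue_size: int, k: int) -> float:
    """Fraction of the catalogue appearing in any user's top-k recommendations.
    Higher values suggest more diverse recommendations across the catalogue."""
    recommended = set()
    for ranked_items in recommendations:
        recommended.update(ranked_items[:k])
    return len(recommended) / catalogue_size

recs = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
coverage_at_k(recs, catalogue_size=10, k=2)  # 0.4: items {0, 1, 2, 3} covered
```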
Notes
Xanthus has been implemented with the aim of helping new users get a decent neural recommender model working as quickly as possible. With that in mind, it could be a good starting point for folks trying to get started with neural recommendation models.
That said, while neural models sound exciting and might attract attention, you might find that 'classic' recommendation models fit your use case better: 'lightweight' matrix factorization approaches are often simpler, faster and easier to use, so you might do well to look at those first. If you're interested, you should check out:
- `implicit` - Implicit Matrix Factorization
- `LightFM` - Hybrid Matrix Factorization
Disclaimer
The neural architectures implemented in this package are (currently) based directly on He et al.'s work on Neural Collaborative Filtering. The team has their own repository with the code they used in their paper. It's a good paper -- I encourage you to check it out!