godatadriven/feature_catalog

Feature Catalog

This example code base can be used as a template or as inspiration for creating your own Feature Catalog. It contains example code for defining your features and exposing them to users via a simple API. We assume that your data is large enough to justify the use of Spark.

Below is a simple example (see also example_usage.ipynb).

all_avatars = spark.read.parquet("data/wow.parquet").select("avatarId").distinct()

features = compute_features(
    spark=spark,
    scope=all_avatars,
    feature_groups=[
        Zone(
            features_of_interest=["darkshore_count", "total_count"],
            aggregation_level="avatarId",
        ),
    ],
)

Below you will find more information about what a Feature Catalog is, why you would create one, and how it differs from a Feature Store.

What is a Feature Catalog?

A Feature Catalog is a place where you define and document your features such that they can be easily created via a simple API.

Note that this does not (yet) include storing the features in a Feature Store. A Feature Catalog already gives you many benefits without the complexity of a Feature Store or a full Feature Platform. See also the Architecture section for more on this difference.
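To make the idea of "define once, create via a simple API" concrete, here is a minimal pure-Python sketch. It is not the code from this repository: the real feature groups operate on Spark DataFrames, and the names Feature, CATALOG and compute_features_sketch, as well as the "zone" column and its values, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; the repository's real classes
# work on Spark DataFrames and have a different interface.
Row = Dict[str, str]


@dataclass(frozen=True)
class Feature:
    name: str
    description: str                     # documentation lives next to the definition
    compute: Callable[[List[Row]], int]  # maps one entity's rows to a value


# The "catalog": every feature is defined and documented exactly once.
CATALOG: Dict[str, Feature] = {
    f.name: f
    for f in [
        Feature("total_count", "Number of rows for the avatar.", len),
        Feature(
            "darkshore_count",
            "Number of rows where the avatar is in Darkshore.",
            # Assumes a "zone" column; the real data set may name it differently.
            lambda rows: sum(1 for r in rows if r["zone"] == "Darkshore"),
        ),
    ]
}


def compute_features_sketch(rows, scope, features_of_interest, key="avatarId"):
    """Return {entity id: {feature name: value}} for the entities in scope."""
    result = {}
    for entity in scope:
        entity_rows = [r for r in rows if r[key] == entity]
        result[entity] = {
            name: CATALOG[name].compute(entity_rows)
            for name in features_of_interest
        }
    return result


# Tiny made-up data set standing in for data/wow.parquet.
rows = [
    {"avatarId": "a1", "zone": "Darkshore"},
    {"avatarId": "a1", "zone": "Durotar"},
    {"avatarId": "a2", "zone": "Darkshore"},
]
features = compute_features_sketch(
    rows, scope=["a1", "a2"], features_of_interest=["darkshore_count", "total_count"]
)
```

Because callers only name the features they want, the definition and its description stay in one place, which is the property the catalog is meant to provide.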

Why create a Feature Catalog?

By creating a Feature Catalog you:

  • define your features once (single source of truth)
  • allow re-use of features across teams and projects (better collaboration)

Increased collaboration in turn brings the following benefits:

  • increase in speed of development (easy to re-use code)
  • increase in reliability / quality of code (more contributors)

Architecture

The full architecture can be found in the docs folder, but here is the main overview:

Including a Feature Store

When including a Feature Store it could look somewhat like this:

How to use

Install package

Install the package by running poetry install.

Create feature group dependency graph

To create a graph of the dependencies between all feature groups, you also need to install Graphviz: https://graphviz.org/download/
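As a rough illustration of what such a dependency graph contains, the sketch below turns a mapping of feature groups to their dependencies into Graphviz DOT text using only the standard library. It does not use this package's own graph code; the DEPENDENCIES mapping, the group names other than Zone, and the to_dot helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical dependencies: each feature group maps to the groups it depends on.
# Only "Zone" appears in this README; the other names are made up.
DEPENDENCIES: Dict[str, List[str]] = {
    "Zone": ["BaseData"],
    "ZoneLevel": ["Zone", "BaseData"],
    "BaseData": [],
}


def to_dot(dependencies: Dict[str, List[str]]) -> str:
    """Render the dependency mapping as Graphviz DOT text."""
    lines = ["digraph feature_groups {"]
    for group in sorted(dependencies):
        # Declare the node so groups without dependencies still show up.
        lines.append(f'    "{group}";')
        for dep in sorted(dependencies[group]):
            # The edge points from a group to the group it depends on.
            lines.append(f'    "{group}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(DEPENDENCIES)
print(dot_source)
```

If the output is saved to a file such as feature_groups.dot, the Graphviz command line tool can render it, for example with dot -Tpng feature_groups.dot -o feature_groups.png.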

Download example data

curl https://storage.googleapis.com/shareddatasets/wow.parquet -o data/wow.parquet

Run example notebook

After downloading the example data and installing the package you can run the example notebook example_usage.ipynb.
