Feature Catalog

This example code base can serve as a template or as inspiration for building your own Feature Catalog. It contains example code that defines features and exposes them to users through a simple API. We assume that your data is large enough to justify the use of Spark.

Below is a simple example (see also example_usage.ipynb).

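# Scope: the distinct avatars for which features should be computed
all_avatars = spark.read.parquet("data/wow.parquet").select("avatarId").distinct()

# Request two features from the Zone feature group, aggregated per avatar
features = compute_features(
    spark=spark,
    scope=all_avatars,
    feature_groups=[
        Zone(
            features_of_interest=["darkshore_count", "total_count"],
            aggregation_level="avatarId",
        ),
    ],
)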

The sections below explain what a Feature Catalog is, why you would create one, and how it differs from a Feature Store.

What is a Feature Catalog?

A Feature Catalog is a place where you define and document your features so that they can be created easily via a simple API.

Note that this does not (yet) include storing the features in a Feature Store. A Feature Catalog already gives you many benefits without the complexity of a Feature Store or a full Feature Platform. See also the Architecture section for more on this difference.
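To make the idea of "define and document once, create via a simple API" concrete, here is a minimal sketch in plain Python. It is not the implementation in this repository: lists of dicts stand in for Spark DataFrames, and `Feature`, `ZONE_FEATURES`, `compute_zone_features`, the `zone` column and the meaning given to each feature are all assumptions made for illustration. Only the feature names and `avatarId` come from the example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]  # plain dicts stand in for Spark rows in this sketch


@dataclass(frozen=True)
class Feature:
    """One catalog entry: a name, its documentation, and how to compute it."""

    name: str
    description: str
    compute: Callable[[List[Row]], object]  # maps one entity's rows to a value


# Hypothetical catalog for a zone feature group. The descriptions and the
# "zone" column are assumptions; only the feature names come from the README.
ZONE_FEATURES: Dict[str, Feature] = {
    f.name: f
    for f in [
        Feature(
            "darkshore_count",
            "Number of rows in which the avatar is in the Darkshore zone.",
            lambda rows: sum(1 for r in rows if r["zone"] == "Darkshore"),
        ),
        Feature("total_count", "Total number of rows for the avatar.", len),
    ]
}


def compute_zone_features(
    data: List[Row],
    scope: List[object],
    features_of_interest: List[str],
    aggregation_level: str,
) -> List[Row]:
    """Return one row per entity in scope with only the requested features."""
    unknown = set(features_of_interest) - set(ZONE_FEATURES)
    if unknown:
        raise KeyError(f"Not in catalog: {sorted(unknown)}")
    result = []
    for entity in scope:
        # Collect the rows that belong to this entity
        rows = [r for r in data if r[aggregation_level] == entity]
        out: Row = {aggregation_level: entity}
        for name in features_of_interest:
            out[name] = ZONE_FEATURES[name].compute(rows)
        result.append(out)
    return result


data = [
    {"avatarId": 1, "zone": "Darkshore"},
    {"avatarId": 1, "zone": "Durotar"},
    {"avatarId": 2, "zone": "Darkshore"},
    {"avatarId": 1, "zone": "Darkshore"},
]
features = compute_zone_features(
    data,
    scope=[1, 2],
    features_of_interest=["darkshore_count", "total_count"],
    aggregation_level="avatarId",
)
print(features)
```

Because every feature is defined in exactly one place together with its description, users only need to name the features they want. The catalog decides how they are computed.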

Why create a Feature Catalog?

By creating a Feature Catalog you:

define your features once (single source of truth)
allow re-use of features across teams and projects (better collaboration)

By increasing collaboration you will get the following benefits:

increase in speed of development (easy to re-use code)
increase in reliability / quality of code (more contributors)

Architecture

The full architecture can be found in the docs folder, but here is the main overview:

Including a Feature Store

When including a Feature Store it could look somewhat like this:

How to use

Install package

Install the package and its dependencies by running poetry install.

Create feature group dependency graph

To create a graph of the dependencies between all feature groups you also need to install graphviz: https://graphviz.org/download/
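As a rough illustration of what such a dependency graph contains, the sketch below turns a mapping of feature groups to their dependencies into Graphviz DOT text. It does not use this package's own graph code. The `DEPENDENCIES` mapping, the `to_dot` helper and the group names other than Zone are hypothetical.

```python
from typing import Dict, List

# Hypothetical feature groups mapped to the groups they depend on.
# Only "Zone" appears in the README; the other names are made up.
DEPENDENCIES: Dict[str, List[str]] = {
    "Zone": [],
    "ZoneLevel": ["Zone"],
    "Summary": ["Zone", "ZoneLevel"],
}


def to_dot(dependencies: Dict[str, List[str]]) -> str:
    """Render the dependency mapping as a Graphviz DOT digraph."""
    lines = ["digraph feature_groups {"]
    for group in sorted(dependencies):
        # Declare the node so that groups without dependencies still appear
        lines.append(f'    "{group}";')
        for dep in sorted(dependencies[group]):
            # The edge points from the dependency to the group that needs it
            lines.append(f'    "{dep}" -> "{group}";')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(DEPENDENCIES)
print(dot_source)
```

If the output is saved to a file such as graph.dot, the Graphviz command line tool can render it, for example with dot -Tpng graph.dot -o graph.png.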

Download example data

curl https://storage.googleapis.com/shareddatasets/wow.parquet -o data/wow.parquet

Run example notebook

After downloading the example data and installing the package you can run the example notebook example_usage.ipynb.


godatadriven/feature_catalog
