TF Tabular is a project aimed at simplifying the process of handling tabular data in TensorFlow. It provides utilities for building models on top of numeric, categorical, multihot, and sequential data types.
- Create input layers based on lists of columns
- Support custom embeddings: Useful for including external embeddings, for example ones obtained from an LLM.
- Support sequence layers: Useful for time series or when building recommenders on top of user interaction data.
- Support multi-hot categorical columns
- No model building or training: Build whatever you want on top
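For readers unfamiliar with the term, here is a minimal sketch of what a multi-hot column holds: each row carries a set of categories rather than a single one. This only illustrates the data type. It is not how TF Tabular processes such columns internally, and the `multi_hot` helper and the genre vocabulary are hypothetical names made up for the illustration.

```python
def multi_hot(values, vocab):
    """Encode a list of categories as a 0/1 vector over `vocab`."""
    index = {token: i for i, token in enumerate(vocab)}
    vector = [0] * len(vocab)
    for value in values:
        if value in index:  # categories missing from the vocab are skipped
            vector[index[value]] = 1
    return vector


# Hypothetical vocabulary and row, made up for this illustration.
genres_vocab = ["Action", "Comedy", "Drama", "Sci-Fi"]
encoded = multi_hot(["Comedy", "Sci-Fi"], genres_vocab)
```

A sequential column differs in that order and repetition matter, as in a user's list of watched movies.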
To get started with TF Tabular, you will need to install it using pip:
pip install tabular-tf
Here is a basic example of how to use TF Tabular:
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

from tf_tabular.builder import InputBuilder

# Define columns to use and specify additional parameters:
categoricals = ['Pclass', 'Embarked']
numericals = ['Age', 'Fare']
# ....
# Build model:
input_builder = InputBuilder()
input_builder.add_inputs_list(categoricals=categoricals,
                              numericals=numericals,
                              normalization_params=norm_params,
                              vocabs=vocabs,
                              embedding_dims=embedding_dims)
inputs, output = input_builder.build_input_layers()
# Add a binary classification head on top of the combined inputs:
output = Dense(1, activation='sigmoid')(output)
model = Model(inputs=inputs, outputs=output)
Which will produce a model like this:
The examples folder contains more complete examples:
- Titanic: A simple binary classification example using the Titanic dataset.
- MovieLens: A two-tower retrieval model using the MovieLens dataset.
- MovieLens Sequential: Another two-tower retrieval model built on the MovieLens dataset, preprocessed so that the model's input is the list of movies the user has interacted with.
Contributions to TF Tabular are welcome. Check out the contributing guidelines for more details.
To set up a local development environment, you will need to first clone the repo and then install the required dependencies:
- Install Poetry by following the instructions on the official Poetry website.
- Run
poetry install
- Run
poetry run pre-commit install
to install the git pre-commit hooks
This is a list of possible features to be added in the future depending on need and interest expressed by the community.
- Parse the dataset to separate numeric, categorical, multi-hot, and sequential columns
- Implement other types of normalization
- Support computing vocab and normalization params instead of receiving them as parameters
- Improve documentation and provide more usage examples
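Until vocabularies and normalization parameters can be computed by the library, the caller has to supply them, as in the basic example above. The following is a minimal sketch of one way to derive them in plain Python. The helper names and the `{"mean": ..., "var": ...}` layout are assumptions made for illustration. The exact structure that `add_inputs_list` expects for `vocabs` and `normalization_params` is not specified in this README, so check the examples folder before relying on this shape.

```python
from statistics import fmean, pvariance


def compute_vocabs(data, categoricals):
    """Return {column: sorted list of distinct non-missing values}."""
    return {
        col: sorted({v for v in data[col] if v is not None})
        for col in categoricals
    }


def compute_norm_params(data, numericals):
    """Return {column: {"mean": ..., "var": ...}}, ignoring missing values.

    The dictionary layout is an assumption, not TF Tabular's documented format.
    """
    params = {}
    for col in numericals:
        values = [float(v) for v in data[col] if v is not None]
        params[col] = {"mean": fmean(values), "var": pvariance(values)}
    return params


# Toy data using the column names from the basic example; values are made up.
data = {
    "Pclass": [3, 1, 3, 2],
    "Embarked": ["S", "C", None, "S"],
    "Age": [22.0, 38.0, 26.0, None],
    "Fare": [7.25, 71.28, 7.92, 13.0],
}

vocabs = compute_vocabs(data, ["Pclass", "Embarked"])
norm_params = compute_norm_params(data, ["Age", "Fare"])
```

The helpers accept any mapping from column name to a sequence of values. Missing values are represented as `None` here; data loaded with pandas typically uses `NaN` instead and would need a different missing-value check.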
TF Tabular is licensed under the MIT License. See the LICENSE file for more details.