This document walks through submitting a simple FeatHub job to a standalone Flink cluster in session mode. The job consumes data from the Flink datagen connector, computes some features, and prints the results.
- Unix-like operating system (e.g. Linux, Mac OS X)
- Python 3.7/3.8/3.9
- Java 8
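As a quick sanity check of the prerequisites above, the following is a minimal sketch that tests whether the current interpreter is one of the listed Python versions and whether a `java` executable is on the `PATH`. The helper names `python_supported` and `java_on_path` are hypothetical and not part of FeatHub, and the script does not verify that the Java found is version 8.

```python
import shutil
import sys

# Minor versions taken from the prerequisites list above.
SUPPORTED_PYTHON = {(3, 7), (3, 8), (3, 9)}


def python_supported(version=None):
    """Return True if (major, minor) is one of the listed Python versions."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) in SUPPORTED_PYTHON


def java_on_path():
    """Return True if a `java` executable is found on PATH (version not checked)."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("java found on PATH:", java_on_path())
```

If either line prints `False`, install the missing prerequisite before continuing. To confirm the Java version itself, run `java -version` manually.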
Download a stable release of Flink 1.16.1, then extract the archive:
$ curl -LO https://archive.apache.org/dist/flink/flink-1.16.1/flink-1.16.1-bin-scala_2.12.tgz
$ tar -xzf flink-1.16.1-bin-scala_2.12.tgz
For more detailed steps, refer to the Flink local installation instructions.
# If you are using the Flink processor, run the following command
$ python -m pip install --upgrade "feathub-nightly[flink]"
You can deploy a standalone Flink cluster in your local environment with the following command.
$ ./flink-1.16.1/bin/start-cluster.sh
You should be able to navigate to the web UI at localhost:8081 to view the Flink dashboard and see that the cluster is up and running.
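The same check can be done without a browser. The sketch below queries the `/overview` endpoint of Flink's REST API, assuming it is served at the same `localhost:8081` address as the web UI (the default for a local standalone cluster). The function names `summarize_overview` and `cluster_overview` are hypothetical helpers for illustration, not part of Flink or FeatHub.

```python
import json
import urllib.request


def summarize_overview(payload):
    """Extract a few fields from the JSON body of a Flink `/overview` response."""
    data = json.loads(payload)
    return {
        "taskmanagers": data.get("taskmanagers"),
        "slots_total": data.get("slots-total"),
        "jobs_running": data.get("jobs-running"),
    }


def cluster_overview(base_url="http://localhost:8081", timeout=5.0):
    """Fetch and summarize the cluster overview.

    Raises urllib.error.URLError if the cluster is not reachable.
    The default base_url is an assumption matching the web UI address above.
    """
    with urllib.request.urlopen(f"{base_url}/overview", timeout=timeout) as resp:
        return summarize_overview(resp.read().decode("utf-8"))
```

Calling `cluster_overview()` after `start-cluster.sh` has finished should return a dictionary with at least one TaskManager. A `URLError` suggests the cluster has not started or is listening on a different port.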
Execute the following command to run the nyc_taxi_flink_session.py demo.
$ python python/feathub/examples/nyc_taxi_flink_session.py