This is a starter repo for Apache Flink docker.
git clone https://github.com/GezimSejdiu/flink-starter.git
cd flink-starter
mvn clean package
The flink-starter flow consists of the following steps:
- Setup HDFS
- Setup Flink cluster
- Put input file on HDFS
- Compute aggregations
- Get output from HDFS
To run the flink-starter application as a BDE pipeline, execute the following commands:
git clone https://github.com/GezimSejdiu/flink-starter.git
cd flink-starter
git checkout init-daemon
cd csswrapper/ && make hosts && cd ..
docker network create hadoop
docker-compose up -d
Note:To make it run, you may need to modify your /etc/hosts file. There is a Makefile, which will do it automatically for you (you should clean up your /etc/hosts after demo).
To run the application, execute the following steps:
- Setup a Flink cluster as described on http://github.com/big-data-europe/docker-flink.
- Build the Docker image:
docker build --rm=true -t bde/flink-starter .
- Run the Docker container:
docker run --name flink-starter-app -e ENABLE_INIT_DAEMON=false --link flink-master:flink-master -d bde/flink-starter
Flink/HDFS Workbench Docker Compose file contains HDFS Docker (one namenode and two datanodes), Flink Docker (one master and one worker) and HUE Docker as an HDFS File browser to upload files into HDFS easily. Then, this workbench will play a role as for flink-starter application to perform computations. Let's get started and deploy our pipeline with Docker Compose. Run the pipeline:
docker network create hadoop
docker-compose up -d
First, let’s throw some data into our HDFS now by using Hue FileBrowser runing in our network. To perform these actions navigate to http://your.docker.host:8088/home. Use “hue” username with any password to login into the FileBrowser (“hue” user is set up as a proxy user for HDFS, see hadoop.env for the configuration parameters). Click on “File Browser” in upper right corner of the screen and use GUI to create /user/root/input and /user/root/output folders and upload the data file into /input folder. Go to http://your.docker.host:50070 and check if the file exists under the path ‘/user/root/input/yourfile’.
After we have all the configuration needed for our example, let’s rebuild flink-starter.
docker build --rm=true -t bde/flink-starter .
And then just run this image:
docker run --name flink-starter-app --net hadoop --link flink-master:flink-master \
-e ENABLE_INIT_DAEMON=false \
-e FLINK_MASTER_PORT_6123_TCP_ADDR=flink-master \
-e FLINK_MASTER_PORT_6123_TCP_PORT=6123 \
-d bde/flink-starter