The script is publicly available and can be imported from https://s3.amazonaws.com/aws-bigdata-blog/artifacts/airflow.livy.emr/airflow.yaml
This requires access to an Amazon EC2 key pair in the AWS Region where you are launching your CloudFormation stack. Make sure to create a key pair in that Region first. Follow: create-your-key-pair
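If you prefer the command line, a key pair can also be created with the AWS CLI; the key name airflow_key_pair below is only an example.

# create a key pair and save the private key locally
aws ec2 create-key-pair \
    --key-name airflow_key_pair \
    --query 'KeyMaterial' \
    --output text > airflow_key_pair.pem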
Steps to import:
- Go to the AWS Console -> search for the CloudFormation service and open it.
- Click Create stack -> select "Template is ready".
- In the Amazon S3 URL field, paste the URL mentioned above.
- This will load the template from airflow.yaml.
- Click Next -> specify DBPassword, KeyName (the already existing key pair), and S3BucketName (the bucket must not already exist; the stack will create a new one).
- Click Next -> Next to run the stack.
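Alternatively, the same stack can be created with the AWS CLI. This is only a sketch: the stack name (airflow-stack) and the parameter values below are placeholders to replace with your own, and depending on the resources the template creates you may also need to pass --capabilities CAPABILITY_IAM.

aws cloudformation create-stack \
    --stack-name airflow-stack \
    --template-url https://s3.amazonaws.com/aws-bigdata-blog/artifacts/airflow.livy.emr/airflow.yaml \
    --parameters ParameterKey=KeyName,ParameterValue=airflow_key_pair \
                 ParameterKey=DBPassword,ParameterValue=YourDbPassword1 \
                 ParameterKey=S3BucketName,ParameterValue=my-unique-airflow-bucket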
After the stack run completes successfully, go to EC2 and you will see a newly launched instance. Connect to the instance over SSH; you can use PuTTY or connect from the command line.
Connect using SSH from the command line:
chmod 400 airflow_key_pair.pem
ssh -i "airflow_key_pair.pem" ec2-user@ec2-your-public-ip.your-region.compute.amazonaws.com
After you are logged in, run the following commands:
# switch to the root user
sudo su
export AIRFLOW_HOME=~/airflow
# navigate to the airflow directory created by the CloudFormation template (see the user-data section of airflow.yaml)
cd ~/airflow
source ~/.bash_profile
# initialize the SQLite database;
# the command below picks up its configuration from airflow.cfg
airflow initdb
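As an optional sanity check, you can confirm the initialization worked; these commands assume the Airflow 1.x CLI that this stack installs (on Airflow 2.x the equivalent of the second command is airflow dags list).

# print the installed Airflow version
airflow version
# list the DAGs Airflow can see (Airflow 1.x syntax)
airflow list_dags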
Open two new terminals: one to start the webserver (you can choose the port) and the other for the scheduler.
# Run the webserver on a custom port of your choosing
# MAKE SURE THIS PORT ALLOWS INBOUND TRAFFIC IN YOUR SECURITY GROUP. SEE THE AUTHORIZING ACCESS REFERENCE BELOW FOR MORE DETAILS.
airflow webserver --port=<your port number>
# RUN THE SCHEDULER
airflow scheduler
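If you would rather not keep two terminals open, both processes can run in the background with nohup; this is a generic shell pattern, and the port and log file names below are just examples.

# run the webserver and scheduler in the background, logging to files
nohup airflow webserver --port=8080 > webserver.log 2>&1 &
nohup airflow scheduler > scheduler.log 2>&1 &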
To allow inbound traffic on your chosen port, add a rule to the instance's security group as described in the EC2 documentation: Authorizing Access To An Instance
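For reference, the same inbound rule can be added from the AWS CLI; the security group ID, port, and CIDR below are placeholders for your own values.

# allow inbound TCP traffic on the webserver port from your IP only
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 8080 \
    --cidr 203.0.113.25/32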
To see the Airflow webserver, open any browser and go to:
<EC2-public-dns-name>:<your-port-number>
REFERENCES:
Build a concurrent data orchestration pipeline using Amazon EMR and Apache Livy (AWS Big Data Blog)