Pig Latin Script (Hadoop & MapReduce): Steps to execute the Pig Latin script on Amazon Web Services (AWS) in batch mode:
Three AWS services are needed to run the script: S3 for storage, EC2 for compute, and EMR for analytics.
- Create a bucket in S3 (Simple Storage Service, which takes the place of HDFS here) named "piglatinscript", and create four folders in it: Input, Output, Scripts, and Logs (Logs holds the execution history).
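The same bucket and folder layout can also be created programmatically. A minimal sketch using Python and boto3, assuming AWS credentials are already configured (e.g. via 'aws configure') and the us-east-1 region; the bucket and folder names come from the step above:

    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")

    # Create the bucket (outside us-east-1, a CreateBucketConfiguration
    # with a LocationConstraint would also be required).
    s3.create_bucket(Bucket="piglatinscript")

    # S3 has no real folders: zero-byte keys ending in "/" act as folder
    # placeholders, which is what the console's "Create folder" button does.
    for folder in ("Input/", "Output/", "Scripts/", "Logs/"):
        s3.put_object(Bucket="piglatinscript", Key=folder)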
- Upload the four input files (exercise1.txt, exercise2.csv, airports.csv, carriers.csv) to the Input folder of the S3 bucket (uploading is shown in the sketch after the next step).
- Upload the Pig Latin scripts to the Scripts folder of the S3 bucket.
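A sketch of both upload steps with boto3; the local file names mirror the input files listed above, while the script file name "exercise1.pig" is an assumption for illustration:

    import boto3

    s3 = boto3.client("s3")

    # Input files go to the Input folder of the bucket.
    for name in ("exercise1.txt", "exercise2.csv", "airports.csv", "carriers.csv"):
        s3.upload_file(name, "piglatinscript", "Input/" + name)

    # Pig Latin scripts go to the Scripts folder ("exercise1.pig" is a
    # hypothetical script name, not one given in the text).
    s3.upload_file("exercise1.pig", "piglatinscript", "Scripts/exercise1.pig")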
- Create and configure the instances in AWS EC2 (Elastic Compute Cloud), found in the 'Compute' section of the AWS Management Console under Services. (When scripting this instead of using the console, EMR provisions the EC2 instances itself; see the sketch after the next step.)
- Create and configure the clusters in AWS EMR (Elastic MapReduce), found in the 'Analytics' section.
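When driving these two steps from code rather than the console, a single EMR call provisions both the cluster and its underlying EC2 instances. A sketch with boto3; the cluster name, release label, instance types, and instance count are assumptions (the roles shown are the standard EMR defaults), and the log URI points at the Logs folder created earlier:

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="piglatin-cluster",               # hypothetical cluster name
        ReleaseLabel="emr-6.15.0",             # assumed EMR release
        Applications=[{"Name": "Pig"}],        # install Pig on the cluster
        LogUri="s3://piglatinscript/Logs/",    # execution history goes here
        Instances={
            "MasterInstanceType": "m5.xlarge", # assumed instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,                # 1 master + 2 core nodes
            "KeepJobFlowAliveWhenNoSteps": True,  # keep cluster up for batch steps
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    cluster_id = response["JobFlowId"]
    print("Cluster created:", cluster_id)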
- Submit the jobs to the EMR cluster in batch mode.
- Click on the cluster selected to run the jobs, go to the Steps section, and click the Add step button. Then:
a) Select Pig in the dropdown.
b) Provide the path to the script in the S3 bucket.
c) Provide the path to the input file in the S3 bucket.
d) Provide the path to the output folder in the S3 bucket.
e) Leave the Arguments box as is.
f) Leave the Action on failure setting as 'Continue' (the default).
g) Click the Submit button.
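The same Add step dialog can be reproduced through the API. A sketch with boto3 mirroring sub-steps a) through g); the cluster ID placeholder, the script name, and the INPUT/OUTPUT parameter names are assumptions (the -p parameters only take effect if the script itself references $INPUT and $OUTPUT), and the output folder must not already exist or the step fails:

    import boto3

    emr = boto3.client("emr")
    cluster_id = "j-XXXXXXXXXXXX"  # placeholder: take the real ID from the EMR console

    emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[{
            "Name": "pig-batch-step",
            "ActionOnFailure": "CONTINUE",   # the 'Continue' setting from f)
            "HadoopJarStep": {
                "Jar": "command-runner.jar",     # runs the Pig step, as in a)
                "Args": [
                    "pig-script", "--run-pig-script", "--args",
                    "-f", "s3://piglatinscript/Scripts/exercise1.pig",       # b)
                    "-p", "INPUT=s3://piglatinscript/Input/exercise1.txt",   # c)
                    "-p", "OUTPUT=s3://piglatinscript/Output/exercise1",     # d)
                ],
            },
        }],
    )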
Note: To view the script's output, open the output text file(s) from the Output folder in a text editor such as Notepad++ for a better view.
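Pig writes its results as one or more part-* files under the output folder rather than a single text file. A sketch that downloads them for local viewing; the output prefix matches the hypothetical step above:

    import boto3

    s3 = boto3.client("s3")

    resp = s3.list_objects_v2(Bucket="piglatinscript", Prefix="Output/exercise1/")
    for obj in resp.get("Contents", []):
        key = obj["Key"]
        if "part-" in key:                      # Pig/Hadoop result files
            local_name = key.rsplit("/", 1)[-1]
            s3.download_file("piglatinscript", key, local_name)
            print("downloaded", local_name)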