spark-s3-csv-example

Downloads CSV data, processes it with Spark, and uploads the results to S3.

Prerequisites

Python 3.8+
AWS account for S3 storage

Setup

Copy your AWS credentials to the default location, ~/.aws/credentials.

Set the parameters in config.yaml. See config-sample.yaml for the required settings.
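
Before the first run it can help to confirm that the credentials file is where the AWS SDK expects it. The snippet below is a minimal sketch, not part of this repository: `missing_credential_keys` is a hypothetical helper, and it assumes the common layout with a `[default]` profile holding a static access key pair. Setups that rely on other mechanisms (environment variables, SSO, session tokens) would need a different check.

```python
import configparser
from pathlib import Path

# Default location read by the AWS SDKs and the AWS CLI.
DEFAULT_CREDENTIALS = Path.home() / ".aws" / "credentials"

# Keys expected for a static access key pair (assumption: no SSO or role setup).
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(path=DEFAULT_CREDENTIALS, profile="default"):
    """Return the required keys that are absent from the given profile.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, so a missing file
    # ends up reporting every key as missing.
    parser.read(path)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.has_option(profile, key)]
```

Calling `missing_credential_keys()` with no arguments checks the default location and returns an empty list when both keys are present.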

Install the requirements (preferably in a virtual environment):

pip install -r requirements.txt

Running the application

python main.py

Example output

The example below was run against a public traffic dataset with the following config.yaml:

---
bucket_name: aws-csv-spark-example
url: http://iot.ee.surrey.ac.uk:8080/datasets/traffic/traffic_feb_june/trafficData158324.csv
timestamp_column: TIMESTAMP
count_column: vehicleCount
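
The four keys above can be checked up front so that a typo in config.yaml fails early rather than midway through a run. The following is a sketch under stated assumptions: `AppConfig` and `from_mapping` are hypothetical names, and the repository's own config.py may work differently. It takes a plain mapping, such as the one a YAML parser like PyYAML's `yaml.safe_load` would return for the file above.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AppConfig:
    """The four settings that appear in the example config.yaml."""

    bucket_name: str
    url: str
    timestamp_column: str
    count_column: str

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "AppConfig":
        names = [field.name for field in fields(cls)]
        # Collect every missing or empty key so they are all reported at once.
        missing = [name for name in names if not raw.get(name)]
        if missing:
            raise ValueError(f"config.yaml is missing: {', '.join(missing)}")
        return cls(**{name: str(raw[name]) for name in names})
```

Passing a mapping that lacks `url`, for example, raises a `ValueError` that names the missing key.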

$ python main.py
Downloading file from http://iot.ee.surrey.ac.uk:8080/datasets/traffic/traffic_feb_june/trafficData158324.csv
Parsing results from /tmp/tmpxmpioeia
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Call S3 component to publish 117 results
Uploaded to https://s3.amazonaws.com/aws-csv-spark-example/result.html
Finished...
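
The first and last lines of the log (download to a temporary file, then report the uploaded object's URL) can be sketched with the standard library alone. This is an illustration, not the code in main.py or s3.py: `download_to_tempfile` and `s3_object_url` are hypothetical names, and the URL format is simply the one visible in the log above. The Spark processing and the actual S3 upload are left out.

```python
import shutil
import tempfile
import urllib.request


def download_to_tempfile(url: str) -> str:
    """Stream the resource at `url` into a temporary file and return its path."""
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed,
        # so a later step (for example a CSV reader) can open it by path.
        with tempfile.NamedTemporaryFile(delete=False) as target:
            shutil.copyfileobj(response, target)
            return target.name


def s3_object_url(bucket_name: str, key: str) -> str:
    """Build a path-style URL of the form shown in the log output."""
    return f"https://s3.amazonaws.com/{bucket_name}/{key}"
```

Because the temporary file is not deleted automatically, the caller would remove it (for example with `os.remove`) once processing has finished.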

After the run, the S3 bucket contains result.html.
