Big Data : Introduction to Spark

Requirements

- Python 3
- PySpark
- Docker
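
To confirm that the requirements above are visible from your environment, something like the following can be run first. This is a minimal sketch using only the Python standard library. The name `check_requirements` is hypothetical and not part of this repository, and the check only tells you whether each tool is present, not whether it is correctly configured.

```python
import importlib.util
import shutil


def check_requirements():
    """Report which requirements appear to be installed (hypothetical helper)."""
    return {
        # A python3 executable on the PATH.
        "python": shutil.which("python3") is not None,
        # The pyspark package importable by this interpreter.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # The docker command-line client on the PATH.
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If you only plan to use the Docker setup, a missing `pyspark` entry is expected, since PySpark then lives inside the container.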

Using Spark on your machine :

Assuming Spark is installed on your machine, run the script from a terminal:

$ python3 ./navigation.py

Using Spark installed with Docker :

Run the command below, replacing {absolute_path_to_folder} with the absolute path of the directory where your Python scripts are stored. That directory is mounted in the container at /home/jovyan/work.

When the container starts, it writes its logs to the terminal. The last line displayed is a URL; copy it into a web browser to open a Jupyter Notebook that gives access to Spark.

docker run -v {absolute_path_to_folder}:/home/jovyan/work -it \
       --rm -p 8888:8888 -p 4040:4040 jupyter/pyspark-notebook
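
If you would rather not edit the placeholder by hand, the command can be generated from a folder path. The sketch below is one possible way to do that. The helper name `docker_command` is hypothetical and not part of this repository; the image name, mount point and ports are copied from the command above. The port comments assume the usual defaults (8888 for Jupyter, 4040 for the Spark web UI).

```python
import shlex
from pathlib import Path


def docker_command(folder):
    """Build the docker run command for a scripts folder (hypothetical helper)."""
    # Docker needs an absolute host path for a bind mount.
    host_path = Path(folder).expanduser().resolve()
    return [
        "docker", "run",
        "-v", f"{host_path}:/home/jovyan/work",  # mount scripts into the container
        "-it", "--rm",
        "-p", "8888:8888",  # Jupyter Notebook (assumed default port)
        "-p", "4040:4040",  # Spark web UI (assumed default port)
        "jupyter/pyspark-notebook",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the current directory.
    print(shlex.join(docker_command(".")))
```

Running the script from the folder that holds your Python scripts prints the full command with the path already filled in.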
