essien1990 / Apache-Spark Public

Batch Processing using Apache Spark and Python for data exploration

PySpark_data_exploration.ipynb
README.md

Apache Spark Using Python3 for data analysis

Batch Processing using Apache Spark and Python3 for data exploration
The dataset was downloaded from https://www.kaggle.com/
Focusing on the PySpark SQL libraries:
- from pyspark.sql.types import BooleanType
- from pyspark.sql.functions import udf
- from pyspark.sql import functions as F
- from pyspark.sql import SparkSession
- from pyspark.sql import Window

apache-spark jupyter-notebook python3 pyspark jupyter-lab pyspark-sql


Languages

Jupyter Notebook 100.0%