prestodb / prestorials

Tutorials and examples of how to deploy Presto and connect it to different data sources

docker aws data tutorial sql mongodb presto example glue walkthrough datalake prestodb presto-connector prestosql lakehouse awsglue

Updated Dec 2, 2024

Undisputed-jay / SpotifyAPI-Data-Engineering-Project

This project uses an ETL (Extract, Transform, Load) pipeline to extract data from Spotify via its API and load it into a data source (AWS Athena). The entire pipeline is built on Amazon Web Services (AWS).

aws sql aws-lambda aws-s3 python3 aws-cloudformation aws-athena awsglue

Updated Jul 8, 2023
Jupyter Notebook

TanishkaMarrott / Real-Time-Streaming-Analytics-with-Kinesis-Flink-and-OpenSearch

This project focuses on real-time data streaming with Kinesis, using Flink for advanced processing and OpenSearch for analytics. The architecture handles the complete data lifecycle, from ingestion to actionable insights.

opensearch dataengineering cloudcomputing awslambda kinesisdatastreams apacheflink awsglue realtimeanalytics

Updated Aug 4, 2024
Java

Akanksha-tetwar / YouTube-Trending-video-analysis-ETL-using-AWS-Services

In this project, I used the Trending YouTube Video Statistics dataset from Kaggle, preparing the data for use and analyzing it.

python aws aws-s3 aws-athena awslambda quicksight aws-glue-crawler awsglue

Updated Nov 7, 2022

olusimeon / reddit-sentimentanalyses-pipeline

This project sets up a real-time data pipeline to fetch data from Reddit, transform it using AWS Glue, and store it in Amazon S3. This involves data streaming, cloud storage, ETL (Extract, Transform, Load) processes, and orchestration using Apache Airflow.

docker aws aws-s3 orchestration apache-airflow etl-pipeline etl-automation awsglue

Updated Sep 18, 2024
Python

catherman / Data-Science-Miscellaneous

AWS S3 & Sentiment Analysis, Basic Plotting with Matplotlib, & Supervised Learning & Machine Learning with Sklearn.

visualization data machine-learning sentiment-analysis athena aws-s3 sklearn supervised-learning matplotlib dataprep correlation-matrix breast-cancer-classification wordcloud-visualization awsglue

Updated Jul 6, 2024
Jupyter Notebook

nazish555 / AWS-Data_Engineering-Spotify_Data

This project showcases a data transformation pipeline utilizing AWS Glue and Amazon Athena to process Spotify data from CSV files. It involves loading, transforming, and storing data in an S3 datawarehouse, enabling seamless querying through Amazon Athena.

aws sql athena s3 etl-pipeline awsglue

Updated Mar 28, 2024
Python

vanibhat02 / Big-Data

Big data and Cloud Deployment

aws big-data athena etl aws-s3 tableau aws-cloudformation awscli sagemaker-deployment iam-authentication awsglue

Updated Jan 15, 2024
Jupyter Notebook

nischaybikramthapa / dbt-athena-tpch

This project demonstrates how to build a downstream data pipeline using dbt on Athena.

dbt aws-athena tpch dbt-core awsglue dbt-athena

Updated Dec 24, 2022
Python

Mopheshi / DataEngineeringSpecialization

Data Engineering Specialization offered by Joe Reis in partnership with DeepLearning.AI through Coursera...

aws aws-s3 dataengineering awsglue

Updated Sep 29, 2024
Jupyter Notebook

riship1095 / YouTube-ETL

Transformed YouTube’s raw JSON data to Parquet and loaded it into an S3 bucket, used the Glue Data Catalog to store metadata, and used Athena to query the cleaned data. Developed an ETL process using a Lambda function that is triggered when raw data lands in an S3 bucket; the function processes the data and stores the result in an S3 bucket for analytics.

aws aws-lambda etl aws-s3 data-engineering aws-athena awsglue

Updated Feb 9, 2023
Python

Cuchuflim / ETL-S3-to-Redshift

Incremental Data Load from S3 Bucket to Amazon Redshift Using AWS Glue

aws s3 data-engineering redshift awsglue

Updated Aug 15, 2024
Python

iqrabismii / Big-Data-Projects-

Projects on Big Data Using PySpark and AWS

ecommerce airflow athena aws-s3 pyspark tableau pyspark-mllib customer-products awsglue

Updated Apr 28, 2023
Jupyter Notebook

VivekaAryan / Reddit-Data-Pipeline

This project offers a robust data pipeline solution designed to efficiently extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. Leveraging a blend of industry-standard tools and services, the pipeline ensures seamless data processing and integration.

aws airflow athena aws-s3 postgresql reddit-api celery redshift-database awsglue

Updated Jun 19, 2024
Jupyter Notebook

wlopezm-unal / Project-airflow-AWSGlue

This project runs an ETL job in AWS Glue, orchestrated with Airflow. A Docker Compose setup brings up the Airflow, Redis, and PostgreSQL services; PostgreSQL stores Airflow's metadata.

python dockerfile aws airflow s3-bucket pyspark airflow-docker dockercompose awsglue

Updated Sep 26, 2024
Python

najmaelboutaheri / Data-Engineering-Project-Youtube

This project aims to securely manage, process, and analyze structured and semi-structured YouTube data based on video categories and trending metrics. The architecture leverages AWS services to ingest, store, transform, analyze, and visualize data efficiently and at scale.

shell aws aws-lambda python3 awscli awss3 awsglue awsiam

Updated Dec 28, 2024
Python

shaundominic / Kafka-Streaming-Project

Uses Apache Kafka to stream real-time data generated by Python and upload it to S3 using s3fs.

python aws ec2 s3 apache-kafka awsglue

Updated Dec 14, 2022
Python

pawanyoda / create_glue_table_using_gitlab_cicd

Create a Glue table using CI/CD

aws docker-image gitlab-ci awscli awsglue

Updated Oct 10, 2022

parth2050 / aws-data-pipeline

An end-to-end data pipeline on AWS, from a website source to an analytics dashboard, using Python Flask, ML models, DynamoDB, and other AWS services.

aws python3 ec2-instance datapipeline awslambda cloud-watch aws-quicksight aws-sns-sqs awsglue

Updated Mar 7, 2024
HTML

Harikishan-AI / Harikishan-AI

I am dedicated to delivering innovative solutions that align with business objectives while ensuring optimal performance, reliability, and security. My strong analytical skills, attention to detail, and problem-solving abilities drive me to create effective and efficient solutions.

python machine-learning automation deep-neural-networks aws-lambda aws-s3 logs transformers cnn lstm rnn aws-ec2 powerbi analystics oops-in-python awsglue

Updated Oct 11, 2024

awsglue

Here are 23 public repositories matching this topic...
