- ⚙️ Data Engineer skilled in building robust and scalable ETL pipelines for cloud platforms like AWS and GCP.
- 🔢 Academic background in Mathematics & Data Science with proven problem-solving and analogical-thinking skills.
- 💼 Brief career in Management & Public Health Consulting, where I delivered data-driven strategies across various industries.
- 💻 Passionate about data pipeline development and containerized workflows.
- Programming Languages: Python, SQL, PySpark, HCL.
- Tools & Technologies: Docker, Airflow, Terraform, AWS (S3, Glue, EC2), GCP (BigQuery, Storage, Cloud Run).
- Behavioral Strengths: Seeing the "bigger picture", abstracting ideas, and integrating client expectations into my solution-design process.
- Tech Specialties: Writing readable & refactorable ETL functions and configuring Docker images/containers.
- Scalable and Delegable Pipelines: I believe in designing pipelines that are easy to operate and maintain so that teams can focus on creating new solutions and solving other challenges. This approach contrasts with the common belief that only the author can maintain their code. Good pipelines require little to no author intervention after deployment.
- Functionality Before Refinement: I believe in getting things off the ground and starting with a pipeline that works and delivers value quickly, before refining it into a more polished, future-proof version. This saves time and lets me adapt designs based on real-world feedback.
- Security and Modularity as Cornerstones: Secure and modular designs are fundamental to my work. I focus (sometimes too much) on implementing best practices like secret management, non-hardcoded paths, and modular structures to ensure pipelines are robust, compliant, and easy to maintain.
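The secret-management and non-hardcoded-path habits above can be sketched in a few lines of Python. This is a minimal illustration, not code from a real project; the variable names (`API_KEY`, `DATA_DIR`) are hypothetical:

```python
import os
from pathlib import Path


def load_config() -> dict:
    """Read connection settings from the environment instead of hardcoding them.

    Secrets must come from the environment; only non-sensitive paths get a
    default fallback. (API_KEY and DATA_DIR are illustrative names.)
    """
    api_key = os.environ.get("API_KEY")
    if api_key is None:
        # Fail loudly rather than fall back to a hardcoded secret.
        raise RuntimeError("API_KEY is not set in the environment.")
    # Paths are configurable rather than baked into the code.
    data_dir = Path(os.environ.get("DATA_DIR", "./data"))
    return {"api_key": api_key, "data_dir": data_dir}
```

The same pattern extends naturally to AWS Secrets Manager or GCP Secret Manager once a pipeline moves beyond local development.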
- Building a weekly ETL pipeline for Near Earth Objects (asteroids) and Coronal Mass Ejections using NASA's API.
- Developing a real-time streaming pipeline using tools like Kafka and Apache Flink to process and analyze weather data.
- Building a pipeline with dynamic workload management capabilities using Airflow sensors and Spark to adjust resource allocation based on data volume.
- Designing a player database for a League of Legends Discord group to track in-house scrims, tournaments, and player stats.
- Fashion Image ETL Pipeline: A modular pipeline for extracting, transforming, and loading 23+ GB of images and their respective metadata for hypothetical ML use.
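As a taste of the extract step for the NASA pipeline above, here is a minimal sketch against NASA's public NeoWs feed endpoint. The function names and flattened row schema are illustrative, not taken from the actual project:

```python
import json
import os
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# NASA's NeoWs feed endpoint (documented at api.nasa.gov).
NEO_FEED_URL = "https://api.nasa.gov/neo/rest/v1/feed"


def fetch_neo_feed(start: date, end: date) -> dict:
    """Pull up to one week of Near Earth Object data.

    The API key is read from the environment; NASA's DEMO_KEY works
    for light testing.
    """
    params = urlencode({
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "api_key": os.environ.get("NASA_API_KEY", "DEMO_KEY"),
    })
    with urlopen(f"{NEO_FEED_URL}?{params}", timeout=30) as resp:
        return json.load(resp)


def flatten_neos(feed: dict) -> list[dict]:
    """Turn the nested per-day feed into flat rows ready for loading."""
    rows = []
    for day, neos in feed.get("near_earth_objects", {}).items():
        for neo in neos:
            rows.append({
                "date": day,
                "id": neo["id"],
                "name": neo["name"],
                "hazardous": neo["is_potentially_hazardous_asteroid"],
            })
    return rows
```

Keeping the transform (`flatten_neos`) as a pure function over plain dicts makes it unit-testable without any network access, which fits the modular-design principles above.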
If you think I'd be a good addition to your team, feel free to contact me using the links below, and let's discuss your next solution!