Hey 👋
- Currently Senior EM at Databricks focusing on data portal, data lineage and other discovery efforts for Unity Catalog
- Apache Airflow PMC member and committer
- Co-creator and Maintainer of Amundsen
- Amundsen: github.com/amundsen-io/amundsen
- 🐦 Twitter: @photoft45
- Slack: @amundsen / Tao Feng
- 👔 LinkedIn: @tao-f-17195814
- Democratize Data Discovery And Data Insight With Databricks Platform @ Data+AI Summit 2024
- Discover Data Lakehouse With E2E Lineage @ Data+AI Summit NA 2022
- Data Discovery at Databricks with Amundsen @ Data+AI Summit NA 2021
- Data discovery Amundsen & Presto @ Presto DB meetup Dec 2020
- Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metadata Platform @ Data+AI Summit Europe 2020
- Project Amundsen update @ LFAI Mini Summit and Open Source Summit Europe 2020
- Airflow Summit 2020 invited keynote (slide)
- Airflow @ Lyft @ SF Big Analytics Meetup April 2019
- Amundsen: A Data Discovery Platform from Lyft @ Data Council SF April 2019
- Disrupting Data Discovery @ Strata SF 2019
- Accelerating discovery on Unity Catalog with a revamped Catalog Explorer @ Databricks Engineering blog 2024
- Creating a bespoke LLM for AI-generated documentation @ Databricks Engineering blog 2023
- Announcing Public Preview of AI Generated Documentation In Databricks Unity Catalog @ Databricks Platform blog 2023
- Announcing General Availability of Data lineage in Unity Catalog @ Databricks Platform blog 2022
- Announcing Public Preview of Data Lineage in Unity Catalog @ Databricks Platform blog 2022
- Announcing the Availability of Data Lineage With Unity Catalog @ Databricks Platform blog 2022
- Amundsen: one year later @ Lyft engineering blog 2020
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform @ Lyft engineering blog 2019
- Securing Apache Airflow UI With DAG Level Access @ Lyft engineering blog 2019
- Running Apache Airflow At Lyft @ Lyft engineering blog 2018
- Common Issue Detection for CPU Profiling @ LinkedIn engineering blog 2017
- ODP: An Infrastructure for On-Demand Service Profiling @ LinkedIn engineering blog 2017
- Benchmarking Apache Samza: 1.2 million messages per second on a single node @ LinkedIn engineering blog 2015
- ODP: An Infrastructure for On-Demand Service Profiling @ IEEE ICPE 2018
- Effective Multi-stream Joining for Enhancing Data Quality in Apache Samza Framework @ IEEE BigData Congress 2016
- A Memory Capacity Model for High Performing Data-filtering Applications in Samza Framework @ IEEE Big Data 2015
- Interview with Software Engineering Daily on Data Discovery at Lyft
- Interview with Data Engineering Podcast on Amundsen
- On-demand profiling based on event streaming architecture (granted)
- [pending patent]