Feature Engineering for Machine Learning with DBT #1329
Replies: 3 comments 4 replies
-
hi, emily! this is a fantastic topic -- the growing excitement over the past 6 months about using dbt for ML has been really amazing and motivating to see. and thank you for doing such a great job of sparking this Discussion and covering all the sections thoroughly - we truly appreciate that! my initial sense is that this is a big and important topic -- so i want to drill into it a bit further so we can get a handhold on where to start digging and build some shared knowledge around this together. to do that, i'm curious if you could answer these two (somewhat interconnected) questions:
let me know which of these resonates more with what you want to build!
-
Thanks, @gwenwindflower! I'll start drafting the outline. I will target early May for delivery. I want to put some thoughts down and then get feedback from a few community members.
-
👋 hey @emekdahl-palmetto! Hope you're doing well! I figured I'd check back in on the status of the outline that you were working on. I was going through the discussion posts, and when I saw this one I got pretty excited about the possibility. As someone who has not yet delved into the world of ML, doing so through the lens of dbt feels like the entry point that many people like myself are looking for. If you'd like any help drafting it up or walking through it, I'm more than happy to pair on it!
-
What is the main problem you are solving? What is your solution? This should help form your core thesis.
Feature engineering is a hard problem in ML that impacts model performance. Using dbt to clean data and engineer features speeds up the data-centric approach to iterative modeling.
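A minimal sketch of what this could look like, assuming a hypothetical `raw_orders` table with `customer_id`, `amount`, and `ordered_at` columns (none of these names come from a real project). The SQL is the kind of SELECT a dbt model might contain; it is run here through Python's built-in `sqlite3` only so the example is self-contained, and in an actual dbt project the table name would typically be a `ref()` or `source()` call instead.

```python
import sqlite3

# Hypothetical raw data; in a dbt project this would be a source or staging model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (customer_id INTEGER, amount TEXT, ordered_at TEXT);
INSERT INTO raw_orders VALUES
  (1, ' 20.00', '2023-01-05'),
  (1, '35.50',  '2023-02-10'),
  (2, NULL,     '2023-01-20'),
  (2, '10.00',  '2023-03-01');
""")

# Step 1 (cleaning): trim whitespace, cast amounts to numbers, drop missing amounts.
# Step 2 (feature engineering): aggregate to one row of features per customer.
FEATURE_SQL = """
WITH cleaned AS (
    SELECT customer_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           ordered_at
    FROM raw_orders
    WHERE amount IS NOT NULL
)
SELECT customer_id,
       COUNT(*)        AS order_count,
       SUM(amount)     AS total_spend,
       AVG(amount)     AS avg_order_value,
       MAX(ordered_at) AS last_order_date
FROM cleaned
GROUP BY customer_id
ORDER BY customer_id
"""

features = conn.execute(FEATURE_SQL).fetchall()
for row in features:
    print(row)
```

Keeping the cleaning and aggregation logic in SQL like this means each change to a feature is a small, reviewable edit to a model file, which is where the faster iteration would come from.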
Why should the reader care about this problem? Why is your solution the right one? This should help form your specific target audience.
The ability to accurately describe and predict the answers to important business questions (and to let stakeholders do the same) is a clear indicator that the data team is adding value to the organization.
Can you list the steps of your solution for the reader here? This should help you form the overall narrative arc and sketch out an example use case to illustrate it.
Are there any resources that helped inspire or inform your idea? (e.g., Slack discussions, articles, external product docs, etc. -- if so, please link)
Data science use cases
Standard examples of feature engineering
Are there other existing solutions that solve the problem, and if so, how is this solution better or different? If so please share any links here.