Data Modeling - Kimball, 3NF, Star Schemas, Activity Schema, Whats Best? #1245
Replies: 5 comments 2 replies
-
I've actually had a good degree of success recently using a few of the concepts from the Activity Schema of Narrator frame - especially as a method of modeling on top of the star schema. I fell into this design pattern when working with a particularly messy and unreliable front-end event tracking system. My stakeholders were interested in seeing usage patterns and the implementation of our FE tracking made it difficult to put trust in. I realized that 80% of the events that the product team wanted to track were actually events that impacted database objects. Changing this configuration, deleting object A, etc etc. So instead of trying to wrangle an engineering team focused on launching new features, I instead took those objects (already represented as dimensions in my star schema) and created unpivoted changelogs with the below code.
I was then able to find the specific field level changes that represented the things that the stakeholders were interested in changing and representing that as an This resulted in a single table that contained all user actions in the same format. It was at this point that dbt-labs released their Have to imagine that only helps in a paradigm where you have defined |
Beta Was this translation helpful? Give feedback.
-
Thanks for getting this discussion going @sisu-callum. Aside from the options listed in the title, it would be good to write down our perspective on things like Unified Star Schema(USS), Data Vault(DV), and One Big Table(OBT). I believe each method has pros and cons and it is possible that in certain contexts one technique would be preferable. For example, in a large enterprise it is possible that DV is preferred in the source / staging area while OBT is preferred in the data consumption layer for end user simplicity, but USS might also be an option. One other angle I have thought about is long term maintenance / support. While one technique might be preferable like DV, the added training / understanding might lead us to a less optimal, but more understandable recommendation. |
Beta Was this translation helpful? Give feedback.
-
Likewise, great to see the conversation deepening @sisu-callum. I'll repost my initial comment from slack here:
|
Beta Was this translation helpful? Give feedback.
-
@dangalavan You may be interested in this discussion given your DEDAG initiative. |
Beta Was this translation helpful? Give feedback.
-
One important thing to keep in mind with modeling styles is that it's not just a technical decision. There are maintenance and scaling costs associated with doing something that is too innovative. Dan McKinley has a great article on "choosing boring technology" where he makes the case that, while companies tend to be optimistic about innovating endlessly, they actually only have a finite number of 'innovation tokens' to spend. Doing something cutting edge with data modeling is actually spending one of those innovation tokens. If it doesn't provide the upside of a significant competitive advantage for your business' core product offering, in my experience, it's not worth one of your team's innovation tokens. I've been on both sides of this. I've built a DW using a more custom/innovative paradigm. I've built multiple using Kimball. At this point, Kimball is more or less a "boring" technology. Lots of shops have used it to great effect. It's not particularly innovative or exciting. It's a little outdated and can be clunky. But it's well documented. And proven. And there's great value in that. If your business evaluates Kimball and decides you have a significant reason to spend an 'innovation token' on something other than Kimball or one of the other well-trodden methodologies, I also think that's perfectly valid. As long as you're viewing it as a cost and a risk with upside, because that's truly what it is. |
Beta Was this translation helpful? Give feedback.
-
What is the best modeling style in the era of dbt and modern cloud data warehouses? The tradeoffs frequently mentioned in the data modeling texts of the 90s on't necessarily apply anymore. For example, Aaron Ormiston explained this well by saying:
So what is the best modeling style? Obviously there are many different implementations and different styles can fit for different use cases but an opinionated first start might be a great lead-in to junior people within the community who aren't familiar with the concept of data modeling. Or those seasoned veterans who are coming into the cloud data warehouse space from the on-prem world and are looking for guidance on modeling in a different paradigm!
My example that I like to give is:
Based on this slack thread started by Noel Gomez,
Beta Was this translation helpful? Give feedback.
All reactions