A Python data-pipeline project that builds on Pydantic. I already gave an overview; now I will double down on the benefits of this approach. For example, we can easily parameterize data tests with Pydantic objects.
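A minimal sketch of what that could look like, using pytest's `parametrize` with a hypothetical `User` model (the model name and fields are made up for illustration, not taken from the project):

```python
import pytest
from pydantic import BaseModel, ValidationError

# Hypothetical data model, for illustration only.
class User(BaseModel):
    id: int
    email: str

# Each test case is a raw payload plus the expected validation outcome;
# the Pydantic model does the heavy lifting of checking types.
@pytest.mark.parametrize(
    "payload, is_valid",
    [
        ({"id": 1, "email": "a@example.com"}, True),
        ({"id": "not-an-int", "email": "a@example.com"}, False),
    ],
)
def test_user_payload(payload, is_valid):
    if is_valid:
        user = User(**payload)
        assert user.email == payload["email"]
    else:
        with pytest.raises(ValidationError):
            User(**payload)
```

The nice part of this pattern is that adding a new test case is just adding one more payload to the list.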
I have some new ideas for how I want to evolve my data pipelines. We will use the chance to write some modern Python, inspired by other great open source projects, e.g. FastAPI.
In the process I will learn a lot. Right now the big picture looks like this:
- Build Pydantic data models
- Use them for testing your data pipeline
- Use Pydantic for the configs
- Build the data pipeline; I want to try out Apache Airflow this time
- Use BigQuery as the data dump
- Load data with minimal changes and mainly just don't damage the data
- Use dbt for further transformations and reports
- Data templates with Pydantic!
- Pydantic docs
- pandas docs for contributing!
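To illustrate the "Use Pydantic for the configs" step above, here is a minimal sketch with hypothetical config models; the model names, fields, and BigQuery settings are assumptions for illustration, not the project's actual configuration:

```python
from pydantic import BaseModel

# Hypothetical nested config models; defaults are validated like any field.
class BigQueryConfig(BaseModel):
    project: str
    dataset: str
    location: str = "EU"  # assumed default, purely illustrative

class PipelineConfig(BaseModel):
    name: str
    bigquery: BigQueryConfig

# In a real pipeline this dict might come from a YAML or JSON file.
raw = {
    "name": "daily_load",
    "bigquery": {"project": "my-project", "dataset": "raw"},
}

config = PipelineConfig(**raw)
print(config.bigquery.location)
```

Typos in config keys or wrong types then fail loudly at startup instead of deep inside a pipeline run.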