Reproducibility: `dvc` and `elyra`

The good news is that this assignment is lighter than the previous one, 01-bigquery-notebook.

In the last assignment, you moved from working in the BigQuery UI to working in a notebook. Though it might feel uncomfortable at first, the benefits will become clearer over time. You are now able to:

Programmatically parse SQL strings and convert common BigQuery queries into functions for reusability
Hybrid: more freedom and variety in tools (SQL, Python, pandas, ducks and others) to create a powerful and flexible workbench for your data tasks
Versioning: you can easily track changes to your notebooks (SQL, Python code, outputs) and share them with other data co-workers or stakeholders
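As a reminder of what "converting a query into a function" can look like, here is a minimal sketch. The table name, the `created_at` column, and the function name `build_daily_counts_query` are hypothetical placeholders, not part of the previous assignment. The sketch only builds the SQL text, so it runs without a BigQuery connection.

```python
from datetime import date
from string import Template

# Hypothetical query template: the table and the created_at column are placeholders.
DAILY_COUNTS_SQL = Template(
    """
    SELECT DATE(created_at) AS day, COUNT(*) AS n_rows
    FROM `$table`
    WHERE DATE(created_at) BETWEEN '$start_date' AND '$end_date'
    GROUP BY day
    ORDER BY day
    """
)


def build_daily_counts_query(table: str, start_date: str, end_date: str) -> str:
    """Return the SQL text for a daily row count over a date range."""
    # Reject anything that is not an ISO date (YYYY-MM-DD) before substituting it.
    date.fromisoformat(start_date)
    date.fromisoformat(end_date)
    return DAILY_COUNTS_SQL.substitute(
        table=table, start_date=start_date, end_date=end_date
    ).strip()


if __name__ == "__main__":
    # Example call with a made-up table name.
    print(
        build_daily_counts_query(
            "my_project.my_dataset.events", "2023-01-01", "2023-01-31"
        )
    )
```

The returned string can then be passed to whichever query runner your notebook already uses. For values that come from untrusted input, BigQuery query parameters are a safer choice than string substitution.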

In the last assignment, you more or less built a simple yet end-to-end data pipeline (one with good potential to be reused many times, or even put into production). Over the next several sessions, we will focus on that simple core pipeline, making it better organized, more efficient, more convenient and more sustainable by introducing other tools and frameworks.

This assignment will focus on the next component of "Best practices": Reproducibility.

🚩 Assignments

dvc-assignment
elyra-assignment

💡 Discussions

This is a sharing session. You DO NOT need to prepare anything in advance. Just share the way you currently work.

1. Have you ever been stuck with a handed-over repo from someone else that contains multiple data files (and you have no clue which one to use) and multiple notebooks (and you have no clue which one to run first)?
2. Have you ever handed over a repo like the one in (1) to someone else?
3. How can data tracking and sharing be improved, so that we know which version to use and can update to the latest version in a timely way?
4. How do you explain to others how to run your series of notebooks?

📝 References

dvc - Data and Model Versioning
Elyra - Run generic pipelines in Jupyterlab
