Cannot apply the Prefect deployment boilerplate to a versioned dataset #1427
alexfurnica
started this conversation in
Idea
Replies: 1 comment 9 replies
-
Potentially related to this issue
-
Hi there!
I've been going through tutorials for Kedro, Prefect and Great Expectations to try and make a proof-of-concept of an end-to-end ML pipeline that includes data quality monitoring. At this stage, most of the spaceflights tutorial has worked (except the experiment tracking bug on Windows).
I'm currently attempting to run the pipeline as a Prefect flow using the boilerplate script provided in the Kedro docs. The pipeline registers just fine and runs successfully the first time. The issue is that subsequent runs fail unless I manually restart the agent. They fail because the `save` attribute of the `Version` object does not get updated in subsequent runs of the same pipeline, so it throws a `DataSetError`:
The initial run was at 13:17 UTC, but this run was at 13:28 UTC as seen here (+2 hrs due to time difference):
Is this a known issue? Is there something I'm doing wrong? I've tried digging through the code to better understand, but it is above my level of experience. The best I could find is that the `tracking.JSONDataSet` is not reinitialized with the correct date, or the cache is never cleared within the `save()` method.
Would really appreciate any suggestions!
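To make the suspected failure mode concrete, here is a minimal toy model in plain Python. It does not use or reproduce Kedro's code: `Version`, `ToyVersionedDataSet`, `make_save_version`, and the date are hypothetical stand-ins, and the only assumption carried over from the description above is that a versioned save refuses to write to a path that already exists. Under that assumption, a dataset object built once and reused by a long-lived process fails on its second save, while one rebuilt per run with a fresh timestamp does not.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List, NamedTuple, Optional


class Version(NamedTuple):
    """Toy (load, save) version pair; a stand-in, not Kedro's class."""
    load: Optional[str]
    save: Optional[str]


class ToyVersionedDataSet:
    """Hypothetical dataset that writes to <root>/<save version>/data.json."""

    def __init__(self, root: Path, version: Version) -> None:
        self._root = root
        self._version = version

    def save(self, data: str) -> Path:
        target = self._root / self._version.save / "data.json"
        # Assumed rule: a versioned save must never overwrite an existing version.
        if target.exists():
            raise FileExistsError(f"save path {target} already exists")
        target.parent.mkdir(parents=True)
        target.write_text(data)
        return target


def make_save_version(run_time: datetime) -> str:
    """Turn a run's start time into a version string (format is illustrative)."""
    return run_time.strftime("%Y-%m-%dT%H.%M.%SZ")


def demo() -> List[str]:
    outcomes = []
    # Run times mirror the post (13:17 and 13:28 UTC); the date is made up.
    first_run = datetime(2022, 1, 1, 13, 17, tzinfo=timezone.utc)
    second_run = datetime(2022, 1, 1, 13, 28, tzinfo=timezone.utc)

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Case 1: the dataset is created once and reused for both runs,
        # so the save version is frozen at the first run's timestamp.
        stale = ToyVersionedDataSet(
            root / "stale", Version(None, make_save_version(first_run))
        )
        stale.save("{}")
        try:
            stale.save("{}")
            outcomes.append("stale: second save succeeded")
        except FileExistsError:
            outcomes.append("stale: second save failed")

        # Case 2: the dataset is rebuilt for every run with a fresh version.
        for run_time in (first_run, second_run):
            fresh = ToyVersionedDataSet(
                root / "fresh", Version(None, make_save_version(run_time))
            )
            fresh.save("{}")
        outcomes.append("fresh: both saves succeeded")

    return outcomes


if __name__ == "__main__":
    for line in demo():
        print(line)
```

If the real cause matches this model, the fix would be to make sure the catalog (or at least the save version) is created inside each flow run rather than once when the flow is registered. Whether the boilerplate script actually behaves this way is not confirmed here.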