Compute Index Workflow of Sentinel Reprocesses Data on Rerun with Extended Time Range Instead of Utilizing Cached Data #196

Open
click2cloud-SanchitG opened this issue Aug 29, 2024 · 1 comment
Labels
local cluster (Issues encountered in local cluster), workflows (Issues encountered when running workflows)

Comments

click2cloud-SanchitG commented Aug 29, 2024

In which step did you encounter the bug?

Workflow execution

Are you using a local or a remote (AKS) FarmVibes.AI cluster?

Local cluster

Bug description


Issue: Compute Index Workflow of Sentinel Reprocesses Data on Rerun with Extended Time Range Instead of Utilizing Cached Data

Link to the notebook: Notebook Link


Workflow File: spaceeye_index-Sanchit.zip

1. Scenario 1:

  • Time Range: (datetime(2021, 1, 1), datetime(2021, 3, 3))
  • Workflow File: spaceeye_index-Sanchit.yaml (attached)
  • Run Name: SpaceEye and NDVI Timelapse 2021
  • Duration: 01:11:54
  • Screenshot: (attached)


2. Scenario 2:

  • Time Range: (datetime(2021, 1, 1), datetime(2021, 3, 20))
  • Workflow File: spaceeye_index-Sanchit.yaml (attached)
  • Run Name: SpaceEye and NDVI Timelapse 2021
  • Duration: 00:58:59
  • Screenshot: (attached)


Observation:

When running the Compute Index workflow for the first time over a specific time range, it processes the data and stores it in the cache. However, when the workflow is run a second time with an extended time range, it starts reprocessing all the data from scratch instead of utilizing the previously cached data.

Problem:

The Compute Index workflow does not appear to leverage cached data on rerun. Instead of reusing the cached results from the initial run, it processes all data again, which increases the runtime. This is particularly problematic because the workflow runs weekly and is already deployed on the customer's side.


Steps to reproduce the problem


  1. Trigger the SpaceEye and NDVI Timelapse 2021 workflow with the time_range and wf_dict from Scenario 1 (see the sketch after this list).
  2. Note the duration.
  3. Increase the time_range and rerun the workflow with the wf_dict from Scenario 2.
  4. Compare the duration.
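
For reference, a minimal sketch of how the two runs are triggered from the notebook, assuming the standard vibe_core client; the geometry coordinates and the way wf_dict is loaded are placeholders, not a verbatim copy of the attached notebook.

```python
# Hedged sketch: rerunning the same workflow with an extended time range.
# `wf_dict` is the workflow definition loaded from spaceeye_index-Sanchit.yaml
# and `geometry` is a placeholder AOI standing in for the one in the notebook.
from datetime import datetime

import yaml
from shapely import geometry as shpg
from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()

with open("spaceeye_index-Sanchit.yaml") as f:
    wf_dict = yaml.safe_load(f)

geometry = shpg.box(-88.1, 41.5, -88.0, 41.6)  # placeholder bounding box

# Scenario 1: initial run over Jan 1 - Mar 3, 2021
run1 = client.run(
    wf_dict,
    "SpaceEye and NDVI Timelapse 2021",
    geometry=geometry,
    time_range=(datetime(2021, 1, 1), datetime(2021, 3, 3)),
)
run1.monitor()  # wait for completion, watching per-op progress

# Scenario 2: extended range; cached ops from run1 are expected to be reused
run2 = client.run(
    wf_dict,
    "SpaceEye and NDVI Timelapse 2021",
    geometry=geometry,
    time_range=(datetime(2021, 1, 1), datetime(2021, 3, 20)),
)
run2.monitor()
```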

Expected Behavior:

The workflow in Scenario 2 should complete in less time, proportional to the additional days added, by utilizing the cached data from the initial run.

Environment:

  • FarmVibes.AI
  • Python Version: 3.11
  • Operating System: Ubuntu (cluster environment)

Questions:

  1. Why does the workflow take nearly the same duration for the extended time range as it did for the initial range, despite having cached data from the first run?
  2. Why does the workflow not utilize the cached data and process only the additional days?

Please look into this issue as soon as possible, as our customers are expecting a resolution.

Thanks & Regards,

Sanchit


click2cloud-SanchitG added the bug label on Aug 29, 2024
The github-actions bot added the local cluster, workflows, and triage labels on Aug 29, 2024
rafaspadilha removed the bug and triage labels on Nov 27, 2024
rafaspadilha (Contributor) commented:

Hi, @click2cloud-SanchitG.

This is expected for the SpaceEye workflow, because increasing the time range means that the model might have more rasters available to perform the interpolation of cloudy pixels during inference.

However, I would expect the download and preprocessing operations to retrieve cached results for most inputs, which should reduce the duration of the initial ops in the workflow.

Could you inspect the logs for the workers and orchestrator? We log when op results are retrieved from cache or when there is a cache miss and the op needs to be executed.
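
In case it helps while going through those logs, here is a minimal sketch for surfacing cache-related entries, assuming the worker and orchestrator logs have already been dumped to local text files (for example with kubectl logs against the local cluster pods); the exact wording of the cache hit/miss messages is an assumption, so the script simply flags any line mentioning "cache".

```python
# Hedged sketch: count cache-hit vs. cache-miss lines in dumped log files.
# Assumes the logs were saved to local text files beforehand; the message
# wording is not guaranteed, so any line containing "cache" is printed.
import sys
from pathlib import Path


def summarize_cache_lines(log_path: Path) -> None:
    hits = misses = 0
    for line in log_path.read_text(errors="replace").splitlines():
        lowered = line.lower()
        if "cache" not in lowered:
            continue
        if "hit" in lowered or "retrieved" in lowered:
            hits += 1
        elif "miss" in lowered:
            misses += 1
        print(line)
    print(f"{log_path.name}: ~{hits} cache-hit lines, ~{misses} cache-miss lines")


if __name__ == "__main__":
    # Usage: python check_cache.py worker.log orchestrator.log
    for arg in sys.argv[1:]:
        summarize_cache_lines(Path(arg))
```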
