Compute Index Workflow of Sentinel Reprocesses Data on Rerun with Extended Time Range Instead of Utilizing Cached Data #196

Open
click2cloud-SanchitG opened this issue Aug 29, 2024 · 1 comment
Labels
local cluster (Issues encountered in local cluster), workflows (Issues encountered when running workflows)

Comments

click2cloud-SanchitG commented Aug 29, 2024

In which step did you encounter the bug?

Workflow execution

Are you using a local or a remote (AKS) FarmVibes.AI cluster?

Local cluster

Bug description


Issue: Compute Index Workflow of Sentinel Reprocesses Data on Rerun with Extended Time Range Instead of Utilizing Cached Data

Link to the notebook: Notebook Link


Workflow File: spaceeye_index-Sanchit.zip

1. Scenario 1:

  • Time Range: (datetime(2021, 1, 1), datetime(2021, 3, 3))
  • Workflow File: spaceeye_index-Sanchit.yaml (attached)
  • Run Name: SpaceEye and NDVI Timelapse 2021
  • Duration: 01:11:54
  • Screenshot: (attached)


2. Scenario 2:

  • Time Range: (datetime(2021, 1, 1), datetime(2021, 3, 20))
  • Workflow File: spaceeye_index-Sanchit.yaml (attached)
  • Run Name: SpaceEye and NDVI Timelapse 2021
  • Duration: 00:58:59
  • Screenshot: (attached)


Observation:

When running the Compute Index workflow for the first time over a specific time range, it processes the data and stores it in the cache. However, when the workflow is run a second time with an extended time range, it starts reprocessing all the data from scratch instead of utilizing the previously cached data.

Problem:

The Compute Index workflow does not appear to leverage cached data on rerun. Instead of reusing the cached results from the initial run, it processes all data again, which increases the runtime. This is particularly problematic because the workflow runs weekly and is already deployed on the customer's side.


Steps to reproduce the problem


  1. Trigger the SpaceEye and NDVI Timelapse 2021 workflow with the time_range and wf_dict from Scenario 1 (see the sketch after this list).
  2. Note the duration.
  3. Increase the time_range and rerun the workflow with the wf_dict from Scenario 2.
  4. Compare the duration.
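
For reference, a minimal sketch of how the two runs are triggered from the notebook, assuming the standard vibe_core client; the geometry coordinates and the way wf_dict is loaded are placeholders, not a verbatim copy of the attached notebook.

```python
# Hedged sketch: rerunning the same workflow with an extended time range.
# `wf_dict` is the workflow definition loaded from spaceeye_index-Sanchit.yaml
# and `geometry` is a placeholder AOI standing in for the one in the notebook.
from datetime import datetime

import yaml
from shapely import geometry as shpg
from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()

with open("spaceeye_index-Sanchit.yaml") as f:
    wf_dict = yaml.safe_load(f)

geometry = shpg.box(-88.1, 41.5, -88.0, 41.6)  # placeholder bounding box

# Scenario 1: initial run over Jan 1 - Mar 3, 2021
run1 = client.run(
    wf_dict,
    "SpaceEye and NDVI Timelapse 2021",
    geometry=geometry,
    time_range=(datetime(2021, 1, 1), datetime(2021, 3, 3)),
)
run1.monitor()  # wait for completion, watching per-op progress

# Scenario 2: extended range; cached ops from run1 are expected to be reused
run2 = client.run(
    wf_dict,
    "SpaceEye and NDVI Timelapse 2021",
    geometry=geometry,
    time_range=(datetime(2021, 1, 1), datetime(2021, 3, 20)),
)
run2.monitor()
```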

Expected Behavior:

The workflow in Scenario 2 should complete in less time, proportional to the additional days added, by utilizing the cached data from the initial run.

Environment:

  • FarmVibes.AI
  • Python Version: 3.11
  • Operating System: Ubuntu (cluster environment)

Questions:

  1. Why does the workflow take nearly the same duration for the extended time range as it did for the initial range, despite having cached data from the first run?
  2. Why does the workflow not utilize the cached data and process only the additional days?

Please look into this issue as soon as possible, as our customers are expecting a resolution.

Thanks & Regards,

Sanchit


click2cloud-SanchitG added the bug label on Aug 29, 2024
The github-actions bot added the local cluster, workflows, and triage labels on Aug 29, 2024
rafaspadilha removed the bug and triage labels on Nov 27, 2024
rafaspadilha (Contributor) commented:

Hi, @click2cloud-SanchitG.

This is expected for the SpaceEye workflow, because increasing the time range means that the model might have more rasters available to perform the interpolation of cloudy pixels during inference.

However, I would expect the download and preprocessing operations to retrieve cached results for most inputs, which should reduce the duration of the initial ops in the workflow.

Could you inspect the logs for the workers and orchestrator? We log when op results are retrieved from cache or when there is a cache miss and the op needs to be executed.
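
In case it helps while going through those logs, here is a minimal sketch for surfacing cache-related entries, assuming the worker and orchestrator logs have already been dumped to local text files (for example with kubectl logs against the local cluster pods); the exact wording of the cache hit/miss messages is an assumption, so the script simply flags any line mentioning "cache".

```python
# Hedged sketch: count cache-hit vs. cache-miss lines in dumped log files.
# Assumes the logs were saved to local text files beforehand; the message
# wording is not guaranteed, so any line containing "cache" is printed.
import sys
from pathlib import Path


def summarize_cache_lines(log_path: Path) -> None:
    hits = misses = 0
    for line in log_path.read_text(errors="replace").splitlines():
        lowered = line.lower()
        if "cache" not in lowered:
            continue
        if "hit" in lowered or "retrieved" in lowered:
            hits += 1
        elif "miss" in lowered:
            misses += 1
        print(line)
    print(f"{log_path.name}: ~{hits} cache-hit lines, ~{misses} cache-miss lines")


if __name__ == "__main__":
    # Usage: python check_cache.py worker.log orchestrator.log
    for arg in sys.argv[1:]:
        summarize_cache_lines(Path(arg))
```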
