Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in KFP event processor #414

Open
grahamia opened this issue Dec 13, 2024 · 0 comments
Open

Race condition in KFP event processor #414

grahamia opened this issue Dec 13, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@grahamia
Copy link
Contributor

Describe the bug
Very infrequently it is possible to create to run completion events on the completion of a single pipeline job in kubeflow.

This happens because the argo workflow that is watched by the event processor receives two updates within a short period of time. If this time gap is small enough then the patch to say event processed hasn't been carried out by the first event so both create the run completion event.

To Reproduce
It is only reproducible in a scenario when the webhook call from the event processor is slow enough to make the second call come in. Have managed to reproduce on restarting the manager which seemed to slow the call down enough but only managed once.

Expected behavior
Only one event should be produced on one pipeline run completing

Version and configuration
Master version e5c5150

Solutions
Could patch straight away but then if the webhook call fails would need to rollback the patch so not viable.

Possibly us a mutex lock on a struct with the run id in it, would need to investigate this.

@grahamia grahamia added the bug Something isn't working label Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant