
Overhaul of MaxInFlightLimiter #370

Merged 1 commit into main on Aug 22, 2024
Conversation

@DrJosh9000 (Contributor) commented Aug 21, 2024

What

Basically rewrite the whole MaxInFlightLimiter.

Relevant to #302 and might fix it

Why

Despite Create taking a context parameter, under the hood, add could block while waiting for completions (a sync.Cond) to wake the goroutine. This is a consequence of sync.Cond itself not being context-aware. At minimum, after detecting context cancellation, completions.Broadcast() would need to be added (somewhere), and a check for context added to the completions.Wait() loop. But...

The mutex locking was probably overly coarse-grained. In add it somewhat makes sense to guard most of the method, since it needs to wrap completions.Wait(). Everywhere else the mutex was really only guarding the map tracking which jobs were in-flight.

Reads on the map happened at the start of methods and writes towards the end. But looking at the purpose of these reads and writes: the informer callbacks (OnAdd, OnDelete) exist to make "the model" more closely match reality when out of sync with it (i.e. track jobs that were started by a previous incarnation of the controller), and once add has decided that the job hasn't already been started, it effectively has ownership of starting that job.

Finally, despite defining mu to be a sync.RWMutex, nothing (not even the sync.Cond or the InFlight method) used RLock/RUnlock, so it could have been a plain sync.Mutex.

So I decided some rethinking of the limiter mechanisms was in order.

How

  • The mutex is now a plain sync.Mutex and guards map accesses in a fine-grained manner, through a compare-and-swap method.
  • The completions sync.Cond is gone, replaced with a tokenBucket buffered channel.
  • Create now blocks on a channel read until a job completion puts a token back in the bucket.
  • By using channel operations, Create can now select between the channel and the context, thus returning early if the context is cancelled.
  • gomock is removed entirely. The mock job handler has been replaced with a fake.

@DrJosh9000 force-pushed the limiter-overhaul branch 4 times, most recently from 452300d to 422c363 on August 21, 2024 06:23
@DrJosh9000 marked this pull request as ready for review August 21, 2024 06:44
@DrJosh9000 requested a review from a team August 21, 2024 06:52
@wolfeidau (Contributor) commented:

@DrJosh9000 would it be better to abstract this a little, rather than interacting with the scheduling channel etc. directly in this code?

@DrJosh9000 force-pushed the limiter-overhaul branch 2 times, most recently from 4eb15a1 to ad66178 on August 21, 2024 07:33
@DrJosh9000 (Contributor, Author) replied:

@wolfeidau sounds good. I tweaked the tests a bit and discovered an edge case I had forgotten, that this would have helped with. So we now have tryTakeToken and tryReturnToken.
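Helpers like these are typically built from a select with a default case, which turns a blocking channel operation into an attempt-only one. A hedged sketch — these free functions are illustrative stand-ins, not the PR's actual methods:

```go
package main

import "fmt"

// tryTakeToken attempts a non-blocking receive: true if a token was taken,
// false if the bucket was empty. (Sketch of the idea, not the PR's code.)
func tryTakeToken(bucket chan struct{}) bool {
	select {
	case <-bucket:
		return true
	default:
		return false
	}
}

// tryReturnToken attempts a non-blocking send: false if the bucket is
// already full (e.g. a completion arrives for a job the limiter never
// issued a token for, such as one started by a previous controller).
func tryReturnToken(bucket chan struct{}) bool {
	select {
	case bucket <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	bucket := make(chan struct{}, 1)
	bucket <- struct{}{}
	fmt.Println(tryTakeToken(bucket))   // true: token taken
	fmt.Println(tryTakeToken(bucket))   // false: bucket empty
	fmt.Println(tryReturnToken(bucket)) // true: token returned
	fmt.Println(tryReturnToken(bucket)) // false: bucket full
}
```

Because the send never blocks, an unexpected "extra" completion can't deadlock the limiter or grow the bucket past its capacity — it is simply reported and ignored.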

@DrJosh9000 force-pushed the limiter-overhaul branch 3 times, most recently from bcb4676 to a2a42e1 on August 22, 2024 02:12
@wolfeidau (Contributor) left a review comment:

👍🏻 Looks good

@DrJosh9000 merged commit b849cc1 into main Aug 22, 2024
1 check passed
@DrJosh9000 deleted the limiter-overhaul branch August 22, 2024 03:20