Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reuse concurrency limiter state when reloading host channels #2413

Merged
merged 14 commits into from
Nov 18, 2024

Conversation

bjlaub
Copy link
Contributor

@bjlaub bjlaub commented Nov 8, 2024

Before this PR

The recent changes to implement DNS node discovery came with the drawback that when domain name resolution changes are detected, we reload channels causing all internal channel state to be discarded and recreated anew. This means that concurrency limiter state (i.e. the current limit values encapsulated within a CautiousIncreaseAggressiveDecreaseConcurrencyLimiter instance) for a per-host channel are lost and we need to restart the congestion control algorithm.

This can have some implications in both directions: if the limiter state has reduced the concurrency significantly because the server asked us to slow down, we may end up bombarding the host with more requests than it can handle when a reload happens; if the limiter state has increased concurrency significantly, we will artificially start limiting request concurrency and have to slowly work our way back up to better throughput.

After this PR

This change introduces a ChannelState capable of holding internal state to be retained across channel reloads. In this change, we add a cache mapping TargetUri to a ChannelState instance, where each ChannelState holds an instance of a CautiousIncreaseAggressiveDecreaseConcurrencyLimiter. This allows a ConcurrencyLimitedChannel for a specific TargetUri to be created anew but re-use existing limiter state for that particular host.

Note that this change only affects per-host channels, not per-endpoint channels (which are also concurrency limited, but interact with a QueuedChannel in a much different way). A follow-on PR will address retaining limiter state for per-endpoint channels upon reload.

==COMMIT_MSG==
reuse concurrency limiter state when reloading host channels
==COMMIT_MSG==

Possible downsides?

@changelog-app
Copy link

changelog-app bot commented Nov 8, 2024

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

reuse concurrency limiter state when reloading host channels

Check the box to generate changelog(s)

  • Generate changelog entry

}
}

static final class StateHolder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disclaimer: naming is hard, and I'm not very good at it. Take my comments with a grain of salt!

Thoughts on extracting this to its own file, and renaming to ChannelState? "Holder" makes me think it's using the old initialize-on-demand holder pattern.

If we extract to its own file, the key type could make sense as a static nested class, since the outer class provides context as to what it's a key for e.g. ChannelState.Key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep done

Comment on lines 190 to 193
@Value.Immutable
interface ConcurrencyLimitedChannelState {
CautiousIncreaseAggressiveDecreaseConcurrencyLimiter hostLimiter();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can put the CautiousIncreaseAggressiveDecreaseConcurrencyLimiter directly into the state, since it's the only piece of data we're tracking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 45 to 46
private static final StateHolderKey<ConcurrencyLimitedChannelState> STATE_HOLDER_KEY =
new StateHolderKey<>(ConcurrencyLimitedChannelState.class, ConcurrencyLimitedChannel::createState);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

II think we should include HOST in the name, since this is creating a host-specific limiter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done

Comment on lines 29 to 35
T cast(final Object value) {
return valueClass.cast(value);
}

Supplier<T> getFactory() {
return factory;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be private

}
}

@SuppressWarnings("DangerousIdentityKey")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could implement equals and hashCode on Key, delegating to super to make this explicit (and avoid the suppression)

Comment on lines 47 to 54
if (state.containsKey(key)) {
return key.cast(state.get(key));
} else {
T value = key.getFactory().get();
Preconditions.checkNotNull(value, "state factory cannot produce a null value");
state.put(key, value);
return value;
}
Copy link
Contributor

@carterkozak carterkozak Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we expect this to be a hot path, so we can use something like:

return key.cast(Preconditions.checkNotNull(
    state.computeIfAbsent(key, keyValue -> keyValue.getFactory().get()),
    "state factory cannot produce a null value"));

}));
LimitedChannel nodeSelectionChannel =
new SupplierChannel(cf.uris().map(new Function<List<TargetUri>, LimitedChannel>() {
private final Map<TargetUri, ChannelState> state = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a ConcurrentHashMap for safety, that way we don't need to think about it

Comment on lines 192 to 195
Set<TargetUri> toRemove = state.keySet().stream()
.filter(uri -> !targetUris.contains(uri))
.collect(Collectors.toSet());
toRemove.forEach(state::remove);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can replace this with state.keySet().retainAll(targetUris);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this is voodoo that i didn't know existed

.filter(uri -> !targetUris.contains(uri))
.collect(Collectors.toSet());
toRemove.forEach(state::remove);
targetUris.forEach(uri -> state.putIfAbsent(uri, new ChannelState()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps computeIfAbsent(uri, _uri -> new ChannelState()) to avoid creating ChannelState instances for keys that are already in the map?

@@ -39,13 +40,20 @@
final class ConcurrencyLimitedChannel implements LimitedChannel {
private static final SafeLogger log = SafeLoggerFactory.get(ConcurrencyLimitedChannel.class);

@VisibleForTesting
static final ChannelState.Key<CautiousIncreaseAggressiveDecreaseConcurrencyLimiter> HOST_SPECIFIC_STATE_KEY =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is visible for the purposes of ConcurrencyLimitedChannelTest#testReuseCachedLimiterState_host, which may not be a particularly useful test. I'm somewhat indifferent, we could delete that test and make this private.

@bjlaub bjlaub changed the title WIP thing to save state when reloading channels reuse concurrency limiter state when reloading host channels Nov 14, 2024
@bjlaub bjlaub marked this pull request as ready for review November 14, 2024 16:43
@bulldozer-bot bulldozer-bot bot merged commit 5fd0855 into develop Nov 18, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants