-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache invalidations can be missed on rollout of schema changes #535
Comments
@dylanahsmith - Hi! there was a request for a review of this issue First, can I echo back my understanding for confirmation here?
Is that roughly correct? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The problem is mentioned in the README caveats:
This is particularly a problem for slower rollouts of code, such with canary deploys, which leave a larger window of time for database updates to happen with only one of the cache keys being invalidated.
It is also hard to argue that developers should expect this source of lost cache invalidations, since the developer's change may just be to the models ignored_columns attribute for a column that isn't otherwise being used. As in, this is action at a distance, which can cause confusion for developers deploying the change, as well as for users that see the stale data.
As such, I think this is a problem worth fixing. Although, I have had a hard time thinking of a way to do this, since we don't want to compromise performance significantly for this edge case and we don't want to expect developers making the change to be aware of the issue and make other changes to enable something like double cache invalidations (for both keys with the old and new
denormalized_schema_hash
in their cache key prefix).Proposed Solution
I think the solution can be focused on the primary cache index, since that is what is affected by column changes. In the past, we have often bumped
IdentityCache::CACHE_VERSION
when changes have been made to the format of just the primary cache index, but I think we could instead introduce a cache version prefix into thedenormalized_schema_hash
so that we can focus on changes to that schema hash.Given that we desire a solution that doesn't involve manual work by the developer when changing the schema and wouldn't know the schema hash for different versions of the application, we could solve that problem by moving the schema hash out of the key and into the value. The primary cache index already encodes records as a Hash for serialization, so can additional keys for versioning. The schema hash can then be checked after loading the value in order to avoid the value on a mismatch.
When we detect a schema version mismatch when loading from this new key that doesn't include the schema hash, then we don't want to just replace it with one with the correct schema version, since that could lead to contention between two versions of the application during a deploy that keep replacing each others cache values. However, this can be solved by falling back to a schema hash versioned key (e.g. we could use the same keys that we currently use), so the value can be stored for both schema versions. I'll refer to these keys as the main key (without a schema hash in the key) and the fallback key (that includes the schema hash in the key).
We won't be replacing the fallback key on cache invalidation, in fact we won't even be able to build that key for another version of the app. Instead, we can include a new
SecureRandom.uuid
data_version
in the cache invalidation value, which will then also be copied into the corresponding value when doing a cache fill of either the main key or fallback key. That way, the cache invalidation can be detected, even after a cache invalidation and cache fill of the main key with a different schema cache version, by checking thedata_version
of the main key against the fallback key and treating a mismatch as if the fallback key value has been invalidated.The performance of the proposed solution during normal operation will be comparable to the current performance, since it would just be using the main key when there aren't any schema hash mismatches. A deploy of a schema change would lead to extra latency to load the fallback key after a schema hash mismatch, but it would only be when loading caches of records for the model that had its schema changed and wouldn't introduce additional database load. After the deploy completes, it will start to convert back to just using the main key as the cache entries for the old schema get invalidated or expire.
An additional optimization can help with improving performance during schema mismatches and reduce time to convergence: after filling the fallback key as a result of a schema mismatch, copy the cache value in the main key into its corresponding fallback key, then replace the main key with multi-schema value marker with the associated
data_version
and a timestamp for the current time. The marker value would be much smaller, so would reduce unnecessary data transfer to load the value from the cache. A configurable duration can be used to determine when the multi-schema value marker should be replaced with the fallback key for the currently running code, so it can converge faster than the general cache value expiration duration.The text was updated successfully, but these errors were encountered: