[Discussion] Data conversion management #143
Comments
My "gut feeling" vote goes to... |
Is this because there could be a bug in the upcaster that could create invalid events (e.g. an invalid JSON) or a bug in the code that updates the db row that could corrupt the data (e.g. accidentally deletes a property)? Or would there be any other examples that would make a backup mandatory? |
Also, given that our data is really valuable, shouldn't we back up our databases anyway? Data corruption while updating an event version is just one of many ways we could lose the data in a database |
What about separating between in-memory domain Events and persistence concerns? In short,
Reasons:
Having an actual separation between the two layers would make it explicit when you need to simply "upcast/version" the writable data formats into the same in-memory type, vs when you have to actually think about the business logic and just add a new domain event type. In conclusion, at the "persistence" layer, you can actually choose any of the strategies proposed in the thread - or even just leave that as an interface/trait in the EventStore that the user has to implement themselves (granted we can provide a default implementation for the PgStore as we see fit). |
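A rough sketch of that separation, to make the idea concrete. Everything here is hypothetical: `EventCodec`, `TextCodec`, `AccountEvent` and the toy `tag:body` wire format are illustrative names and are not part of the esrs API. The domain event carries no version information; only the persistence-side codec knows that `opened/1` is an older stored shape of the same event.

```rust
/// In-memory domain event: no serialization concerns, no version tag.
#[derive(Debug, Clone, PartialEq)]
pub enum AccountEvent {
    Opened { owner: String },
    Deposited { cents: u64 },
}

/// Hypothetical persistence-side contract that a store could ask users to implement.
pub trait EventCodec {
    type Event;
    fn encode(&self, event: &Self::Event) -> String;
    fn decode(&self, raw: &str) -> Result<Self::Event, String>;
}

/// Toy codec using a `tag:body` text format instead of JSON.
pub struct TextCodec;

impl EventCodec for TextCodec {
    type Event = AccountEvent;

    fn encode(&self, event: &AccountEvent) -> String {
        // Always write the newest wire format.
        match event {
            AccountEvent::Opened { owner } => format!("opened/2:{owner}"),
            AccountEvent::Deposited { cents } => format!("deposited/1:{cents}"),
        }
    }

    fn decode(&self, raw: &str) -> Result<AccountEvent, String> {
        let (tag, body) = raw
            .split_once(':')
            .ok_or_else(|| format!("malformed record: {raw}"))?;
        match tag {
            // Assumption for this sketch: the old format stored the owner in
            // upper case, so it is normalized while decoding.
            "opened/1" => Ok(AccountEvent::Opened { owner: body.to_lowercase() }),
            "opened/2" => Ok(AccountEvent::Opened { owner: body.to_string() }),
            "deposited/1" => body
                .parse::<u64>()
                .map(|cents| AccountEvent::Deposited { cents })
                .map_err(|e| format!("bad amount: {e}")),
            other => Err(format!("unknown wire format: {other}")),
        }
    }
}
```

With this shape, a change to the stored format only touches `decode`, while a genuinely new business fact would show up as a new `AccountEvent` variant.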
Yes, exactly. And maybe not even for a total bug.
We already do it (daily?). But still, even if it's just a single day, it is a full day lost |
Upcasting seems easier to implement from a library user's perspective - they only need to implement From to convert each previous event type into the one used by the application, and be able to hand ESRS a function that consumes a

We could write a macro to generate that code for them, something like:

    let event_deserializer = event_des![serde_json::from_str::<Event>, serde_json::from_str::<EventV2>, serde_json::from_str::<EventV1>];

The alternative of asking library users to version their events seems like a lot of extra work (they'd need to write upcasting code anyway, or else have their application be able to handle every version of an event), for not much of a gain (being able to make business decisions based on event version seems like an edge case that is just as well handled by event timestamps). I think version tags have an unfortunate habit of leaking persistence-specific information and complexity into the rest of the codebase.

Trying to have a macro that auto-versions Events seems like a lot of accidental complexity - managing the list of previously seen versions across developer machines, CI runs etc. (if you check it in, how do you handle merge conflicts?) will be annoying, and it will be non-obvious to users how it works. It also runs up against the difference between backwards-compatible and non-backwards-compatible changes - a user might, for example, add an

With respect to the proposed

For the sake of clarity, the above macro would expand into:

    |data: &str| -> Result<Event, String> {
        // Try the current shape first.
        if let Ok(e) = serde_json::from_str::<Event>(data) {
            return Ok(e);
        }
        // Then fall back to older shapes, newest first, upcasting via From/Into.
        if let Ok(e) = serde_json::from_str::<EventV2>(data) {
            return Ok(e.into());
        }
        if let Ok(e) = serde_json::from_str::<EventV1>(data) {
            return Ok(e.into());
        }
        Err(format!("Failed to parse event: {data}"))
    }

Or something similar |
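For reference, a self-contained sketch of how such an `event_des!` macro could be written with `macro_rules!`. This is only one possible shape, not an existing esrs macro. To keep it dependency-free, the hand-written `parse_event` and `parse_event_v1` functions (hypothetical, working on a toy `key=value` format) stand in for `serde_json::from_str::<...>`; any function from `&str` to `Result<T, _>` where `T: Into<Event>` would fit.

```rust
#[derive(Debug, PartialEq)]
pub struct EventV1 {
    pub name: String,
}

/// Current event shape used by the application.
#[derive(Debug, PartialEq)]
pub struct Event {
    pub name: String,
    pub email: Option<String>,
}

impl From<EventV1> for Event {
    fn from(old: EventV1) -> Self {
        Event { name: old.name, email: None }
    }
}

// Extracts the value of a `key=value` field (toy format, assumption of this sketch).
fn field<'a>(part: &'a str, key: &str) -> Result<&'a str, String> {
    part.strip_prefix(key)
        .and_then(|rest| rest.strip_prefix('='))
        .ok_or_else(|| format!("expected field `{key}`"))
}

/// Stand-in for `serde_json::from_str::<Event>`.
pub fn parse_event(data: &str) -> Result<Event, String> {
    let (name, email) = data
        .split_once(';')
        .ok_or_else(|| "expected two fields".to_string())?;
    Ok(Event {
        name: field(name, "name")?.to_string(),
        email: Some(field(email, "email")?.to_string()),
    })
}

/// Stand-in for `serde_json::from_str::<EventV1>`.
pub fn parse_event_v1(data: &str) -> Result<EventV1, String> {
    if data.contains(';') {
        return Err("unexpected extra field".to_string());
    }
    Ok(EventV1 { name: field(data, "name")?.to_string() })
}

/// Builds a closure that tries each parser in order and upcasts via `Into<Event>`.
macro_rules! event_des {
    ($($parser:expr),+ $(,)?) => {
        |data: &str| -> Result<Event, String> {
            $(
                if let Ok(event) = $parser(data) {
                    return Ok(event.into());
                }
            )+
            Err(format!("Failed to parse event: {}", data))
        }
    };
}
```

Usage would look like `let deserialize = event_des![parse_event, parse_event_v1];`, with the newest shape listed first so that it wins whenever a payload happens to match more than one version.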
I totally agree with this, especially when you say:
But from the other perspective, this means that you need to keep every event version in your codebase. If you have an event (which should be an enum) with 5 variants, every time you change a field in one of those variants you need to replicate the entire event enum. I think this will be a total mess in the long run. |
Here are my 2 cents, even though I did not participate in the internal discussion/study group, so do discard my words if I'm missing some context.
I agree with Angelo and support his points, except for the last one. If, for example, at one point in time the business decides the field In addition, I vote for |
What kind of default should this be? Some random email? |
Esrs data conversion management
Topic
This is a discussion thread to decide which route this library should follow to manage data conversion.
State of the art
Currently esrs implements the weak schema technique. With this technique the library handles missing or superfluous attributes gracefully: the user just updates the event shape manually, adding Optional fields or removing existing fields from the event. Since this is an optimistic approach, retro-compatibility issues sometimes come up while loading older events from the store.
Techniques
Event versioning/multiple versions
In this technique multiple versions of an event type are supported throughout the application. The event structure is extended with a version number. This version number can be read by all the event listeners, and they have to contain knowledge of the different versions in order to support them. In this technique the event store remains intact as old versions are never transformed. There are no extra write operations needed to convert the store.
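As a minimal illustration of what this means for application code (the `UserRegistered` event and the `welcome_message` listener are made-up names for this sketch, not esrs types): every stored version remains a first-class shape, and each listener has to handle all of them.

```rust
/// Hypothetical event where every persisted version stays a first-class shape.
#[derive(Debug, Clone, PartialEq)]
pub enum UserRegistered {
    /// Version 1 stored only a name.
    V1 { name: String },
    /// Version 2 added an email address.
    V2 { name: String, email: String },
}

impl UserRegistered {
    /// Version number carried alongside the event.
    pub fn version(&self) -> u32 {
        match self {
            UserRegistered::V1 { .. } => 1,
            UserRegistered::V2 { .. } => 2,
        }
    }
}

/// A listener has to know about every version it may receive.
pub fn welcome_message(event: &UserRegistered) -> String {
    match event {
        UserRegistered::V1 { name } => format!("Welcome {name}"),
        UserRegistered::V2 { name, email } => format!("Welcome {name} <{email}>"),
    }
}
```

Adding a third version means touching every `match` like the one in `welcome_message`, which is the cost this technique pays for leaving the store untouched.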
Implementation
A macro placed on top of an event that, every time that piece of code is compiled, optionally inserts the new event version into a local schema registry (like a file?) and checks the event's retro-compatibility.
The downside of this approach is that we would need to come up with a macro attribute to ignore a specific field (for example if the event store has been updated manually), or ask the user to manually fix the local schema registry file.
Upcasting
Upcasting centralizes the update knowledge in an upcaster: a component that transforms an event before offering it to the application. Unlike the multiple versions technique, the event listeners are not aware of the different versions of events. Because the upcaster changes the event, the listeners only need to support the latest version.
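A small sketch of the idea, under stated assumptions: the `OrderPlaced*` types, the `Stored` enum and the `"EUR"` default are invented for illustration and do not come from esrs. Old shapes are converted step by step with `From`, and the `upcast` function is the only place that knows older versions exist.

```rust
/// Hypothetical historical shapes of the same event.
#[derive(Debug, PartialEq)]
pub struct OrderPlacedV1 {
    pub amount_cents: u64,
}

#[derive(Debug, PartialEq)]
pub struct OrderPlacedV2 {
    pub amount_cents: u64,
    pub currency: String,
}

/// Latest shape: the only one listeners ever see.
#[derive(Debug, PartialEq)]
pub struct OrderPlaced {
    pub amount_cents: u64,
    pub currency: String,
    pub note: Option<String>,
}

impl From<OrderPlacedV1> for OrderPlacedV2 {
    fn from(old: OrderPlacedV1) -> Self {
        // Assumed default for events written before currencies were stored.
        OrderPlacedV2 { amount_cents: old.amount_cents, currency: "EUR".to_string() }
    }
}

impl From<OrderPlacedV2> for OrderPlaced {
    fn from(old: OrderPlacedV2) -> Self {
        OrderPlaced { amount_cents: old.amount_cents, currency: old.currency, note: None }
    }
}

/// What the store hands back: one variant per persisted version.
pub enum Stored {
    V1(OrderPlacedV1),
    V2(OrderPlacedV2),
    V3(OrderPlaced),
}

/// The upcaster: the single place that knows about old versions.
pub fn upcast(stored: Stored) -> OrderPlaced {
    match stored {
        Stored::V1(e) => OrderPlacedV2::from(e).into(),
        Stored::V2(e) => e.into(),
        Stored::V3(e) => e,
    }
}

/// A listener written only against the latest version.
pub fn describe(event: &OrderPlaced) -> String {
    format!("{} {}", event.amount_cents, event.currency)
}
```

Chaining V1 to V2 to the latest shape means each new version only needs one extra `From` impl, at the price of keeping the old structs around.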
Implementation
Create a new trait Upcaster that the event must implement, with two functions that the user must provide, like upcast and actual_version. Inside the upcast function, should the user manually deserialize the JSON to get the version and the fields needed to build the latest event version?
Others
There are two other more exotic techniques to mention:
new store.
The biggest downside of those three techniques is that all of them perform updates on the database while reading events, breaking the "events are immutable" dictum.
Moreover, lazy transformation and in-place transformation are not reliable: since they change the event store permanently, backups become mandatory.
And of course there's the "leave it as it is" option :).