You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By "roundtrip property" I mean for any event if I serialize it with putEvent, getEvent on the resulting ByteString should give me the original event. #41
shows that we currently don't have this property.
Motivation
I'm trying to improve large eventlog handling of ThreadScope [1]. For this I
need some kind of filesystem-backed data structure to be able to serialize
unused events [2] to a file and read the ones that are needed. As shown in #41
we currently can't do this.
Problem
The reason why we currently don't have this property is GHC expects that event
parsers should be able to parse an event using only a prefix of the
serialization of the event, and ghc-events is implemented accordingly. This
means that GHC can add more fields to an event as long as (1) it updates the
event size in the .eventlog file header (2) it puts the new field at the end of
the event. [3] is an example where this happens. Event parsers can then check
size of an event using the .eventlog file header, and skip any space between
the assumed size of the event and what the header says the event's size is [4].
The problem occurs because the serializer (putEvent and friends) do not take
these extra fields into account and serializes events based on their assumed
size by ghc-events [5]. So if we try to parse the serialized event using the
original parser used to read the .eventlog file the parse fails.
Proposed fix
I propose fixing this by adding a new "extras" field to events:
Parsers can then put the skipped data in this field and serializers can
directly serialize that field after serializing the event.
Because this is a breaking change a major version bump will be needed. A major
user is ThreadScope, which already has an upper bound <0.9.0 so I don't think
this will cause major breakage, and I'll also update ThreadScope so that the
next version will be compatible with new ghc-events.
How does that sound? @maoe I think these days you're maintaining the library,
is there anyone else I should ping about this? The implementation should be
simple but I just wanted to get some opinions before investing more into it.
The text was updated successfully, but these errors were encountered:
I implemented this and it works nicely. I can parse a 225M eventlog file and for every event check if I can serialize and deserialize it. I'll submit a PR soon.
By "roundtrip property" I mean for any event if I serialize it with
putEvent
,getEvent
on the resultingByteString
should give me the original event. #41shows that we currently don't have this property.
Motivation
I'm trying to improve large eventlog handling of ThreadScope [1]. For this I
need some kind of filesystem-backed data structure to be able to serialize
unused events [2] to a file and read the ones that are needed. As shown in #41
we currently can't do this.
Problem
The reason why we currently don't have this property is GHC expects that event
parsers should be able to parse an event using only a prefix of the
serialization of the event, and
ghc-events
is implemented accordingly. Thismeans that GHC can add more fields to an event as long as (1) it updates the
event size in the .eventlog file header (2) it puts the new field at the end of
the event. [3] is an example where this happens. Event parsers can then check
size of an event using the .eventlog file header, and skip any space between
the assumed size of the event and what the header says the event's size is [4].
The problem occurs because the serializer (
putEvent
and friends) do not takethese extra fields into account and serializes events based on their assumed
size by
ghc-events
[5]. So if we try to parse the serialized event using theoriginal parser used to read the .eventlog file the parse fails.
Proposed fix
I propose fixing this by adding a new "extras" field to events:
Parsers can then put the skipped data in this field and serializers can
directly serialize that field after serializing the event.
Because this is a breaking change a major version bump will be needed. A major
user is ThreadScope, which already has an upper bound
<0.9.0
so I don't thinkthis will cause major breakage, and I'll also update ThreadScope so that the
next version will be compatible with new ghc-events.
[1]: haskell/ThreadScope#29
[2]: e.g. with the help of an LRU cache
[3]: https://phabricator.haskell.org/D3658
[4]: See
padParser
and its call site for the implementation of this.[5]: In
putEventSpec
How does that sound? @maoe I think these days you're maintaining the library,
is there anyone else I should ping about this? The implementation should be
simple but I just wanted to get some opinions before investing more into it.
The text was updated successfully, but these errors were encountered: