Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event parsers and serializers should have roundtrip property #42

Open
osa1 opened this issue Aug 29, 2018 · 1 comment
Open

Event parsers and serializers should have roundtrip property #42

osa1 opened this issue Aug 29, 2018 · 1 comment

Comments

@osa1
Copy link

osa1 commented Aug 29, 2018

By "roundtrip property" I mean for any event if I serialize it with putEvent,
getEvent on the resulting ByteString should give me the original event. #41
shows that we currently don't have this property.

Motivation

I'm trying to improve large eventlog handling of ThreadScope [1]. For this I
need some kind of filesystem-backed data structure to be able to serialize
unused events [2] to a file and read the ones that are needed. As shown in #41
we currently can't do this.

Problem

The reason why we currently don't have this property is GHC expects that event
parsers should be able to parse an event using only a prefix of the
serialization of the event, and ghc-events is implemented accordingly. This
means that GHC can add more fields to an event as long as (1) it updates the
event size in the .eventlog file header (2) it puts the new field at the end of
the event. [3] is an example where this happens. Event parsers can then check
size of an event using the .eventlog file header, and skip any space between
the assumed size of the event and what the header says the event's size is [4].

The problem occurs because the serializer (putEvent and friends) do not take
these extra fields into account and serializes events based on their assumed
size by ghc-events [5]. So if we try to parse the serialized event using the
original parser used to read the .eventlog file the parse fails.

Proposed fix

I propose fixing this by adding a new "extras" field to events:

data Event = Event
  { evTime   :: {-# UNPACK #-}!Timestamp
  , evSpec   :: EventInfo
  , evCap    :: Maybe Int
  , evExtras :: ByteString -- NEW FIELD
  } deriving Show

Parsers can then put the skipped data in this field and serializers can
directly serialize that field after serializing the event.

Because this is a breaking change a major version bump will be needed. A major
user is ThreadScope, which already has an upper bound <0.9.0 so I don't think
this will cause major breakage, and I'll also update ThreadScope so that the
next version will be compatible with new ghc-events.


[1]: haskell/ThreadScope#29
[2]: e.g. with the help of an LRU cache
[3]: https://phabricator.haskell.org/D3658
[4]: See padParser and its call site for the implementation of this.
[5]: In putEventSpec


How does that sound? @maoe I think these days you're maintaining the library,
is there anyone else I should ping about this? The implementation should be
simple but I just wanted to get some opinions before investing more into it.

@osa1
Copy link
Author

osa1 commented Aug 30, 2018

I implemented this and it works nicely. I can parse a 225M eventlog file and for every event check if I can serialize and deserialize it. I'll submit a PR soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant