Perform table-based translation of runtime event names and tags #44

Draft
wants to merge 4 commits into
base: main
from

Conversation

eutro
Contributor

@eutro eutro commented Apr 18, 2024

This PR implements translation of runtime event names (and tags) using tables loaded at runtime, in order to allow consumption of runtime events from different versions of OCaml. It also adds a new command olly{,_bare} gen-tables to generate the table file from the caml/runtime_events.h C header file.

Currently, when Olly profiles a program compiled with a newer (or older) version of OCaml than Olly was compiled for, two bugs may occur:

  1. olly trace generates nonsensical names for the slices [1]
  2. olly gc-stats silently generates garbage output, if any of the runtime events it matches on have changed

This occurs because runtime events are read from a mmapped ring buffer in C with no regard for the version of the OCaml runtime that produced them (and there is currently no way to check), and the integers written to the ring buffer are directly interpreted as elements of the enumerations in runtime_events.ml. Since runtime events may be inserted at arbitrary positions in the enums (e.g. ocaml/ocaml#12923), the integer values written into the ring buffer may differ from those our version expects, leading to completely wrong Runtime_events.lifecycle, Runtime_events.runtime_phase, and Runtime_events.runtime_counter values.
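To make the offset concrete, here is a minimal sketch of the failure mode. The two name tables are hypothetical (they are not the real contents of runtime_events.ml); the only assumption is that the newer runtime inserted one event in the middle of the enum.

```ocaml
(* Hypothetical name tables: the consumer (Olly) was built against
   [old_names], the profiled program runs with [new_names]. *)
let old_names = [| "major"; "minor"; "stw_leader" |]
let new_names = [| "major"; "explicit_gc"; "minor"; "stw_leader" |]

(* Position of [name] in [table], i.e. the integer the runtime would
   write to the ring buffer for that event. *)
let index_of (table : string array) (name : string) : int option =
  let rec go i =
    if i >= Array.length table then None
    else if String.equal table.(i) name then Some i
    else go (i + 1)
  in
  go 0

(* What a consumer does today: index its own table with the raw integer. *)
let naive_name (raw : int) : string option =
  if raw >= 0 && raw < Array.length old_names then Some old_names.(raw)
  else None

let () =
  match index_of new_names "minor" with
  | None -> ()
  | Some raw ->
    let shown = Option.value (naive_name raw) ~default:"<out of range>" in
    Printf.printf "producer wrote %S as %d; consumer shows %S\n"
      "minor" raw shown
```

In this sketch the producer's "minor" is written as 2, which the consumer's table names "stw_leader", and the producer's last event falls outside the consumer's table entirely.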

Here is an annotated image of how the issue manifests with Olly as currently built on 5.1.0 (on the left), and how it is fixed by performing name translation (on the right):

[Image: Olly Perfetto Mistranslated Names Demo]

ring_pause (seen on the left) isn't even a runtime_phase event name, but a lifecycle event name! Something has gone terribly wrong...


The implementation is in two parts:

  1. Firstly, I implement a library olly_rte_shim to unify all the different types of runtime events (runtime_phase, runtime_counter, lifecycle, alloc, and custom events) into a single manageable event type.

    • I had previously introduced the Event.t type for olly trace, to abstract the name/argument extraction of events away from the trace format backends
    • This is a generalisation (and replacement) of the Event.t type, which now also maintains the "tag" of the event rather than discarding it in favour of the name, i.e. which kind of runtime_phase, runtime_counter, custom event, etc. it is.
    • This is also useful for alternate sources of runtime events. For instance, I have an implementation that saves and replays the full event trace to a text file (Olly --replay and trace --format=replay, eutro/runtime_events_tools#2), which could even be streamed over the network externally to olly
  2. Secondly, I implement table-based translation of event names and tags

    • This includes the new command olly gen-tables, which creates a YAML [2] or OCaml file. The former can be read at runtime; the latter is linked into Olly to generate the built-in table of events existing in this version
    • It also adds a new --table (no short form [yet?]) option to both olly gc-stats and olly trace to load a source table from a file
    • Event names are translated straightforwardly by using the integer value of the enums (via Obj.magic) to index the tables
    • Event tags are translated slightly less straightforwardly, by computing integer arrays from matching the indices of the names in the source/destination tables, and then converting the integer values (via Obj.magic) to the corresponding event tag enumerations
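The two translation steps in part 2 can be sketched as follows. This is only an illustration under stated assumptions: the function names (translate_name, build_remap, translate_tag) are hypothetical and not taken from the PR, and the sketch stops at plain integers, whereas the PR converts the resulting integer to the tag enumeration via Obj.magic.

```ocaml
(* Name translation: index the producer's (source) table with the raw
   integer read from the ring buffer. *)
let translate_name (src : string array) (raw : int) : string option =
  if raw >= 0 && raw < Array.length src then Some src.(raw) else None

(* Tag translation, step 1: for every source index, find the index of the
   same name in the consumer's (destination) table, or -1 when the
   consumer has no such event. *)
let build_remap (src : string array) (dst : string array) : int array =
  let positions = Hashtbl.create (Array.length dst) in
  Array.iteri (fun i name -> Hashtbl.replace positions name i) dst;
  Array.map
    (fun name -> Option.value (Hashtbl.find_opt positions name) ~default:(-1))
    src

(* Tag translation, step 2: map a raw source integer to the destination
   integer, if the destination knows the event at all. *)
let translate_tag (remap : int array) (raw : int) : int option =
  if raw >= 0 && raw < Array.length remap && remap.(raw) >= 0
  then Some remap.(raw)
  else None

(* Hypothetical tables: the profiled program is newer than the consumer. *)
let src = [| "major"; "explicit_gc"; "minor"; "stw_leader" |]
let dst = [| "major"; "minor"; "stw_leader" |]
let remap = build_remap src dst
```

Returning an option for unknown events is a choice made for this sketch; it leaves the caller to decide whether to drop such events or report them by name only.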

Footnotes

  1. It could also, I believe, crash under bad circumstances, though I haven't seen it happen.

  2. It's just three lines of the form kind: [event_name,...] (and can only be parsed from that subset). Having it be valid YAML means other tools can potentially use it, e.g. for diffing as JSON.
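As an illustration of footnote 2, a reader for that subset could look roughly like the sketch below. It assumes each line has exactly the shape kind: [name, name, ...]; the function name parse_table_line and the kind used in the test are hypothetical, and this is not the parser from the PR.

```ocaml
(* Parse one line of the form  kind: [name_a, name_b, ...]  into the kind
   and an array of names. Returns None if the line has another shape. *)
let parse_table_line (line : string) : (string * string array) option =
  match String.index_opt line ':' with
  | None -> None
  | Some colon ->
    let kind = String.trim (String.sub line 0 colon) in
    let rest =
      String.trim
        (String.sub line (colon + 1) (String.length line - colon - 1))
    in
    let n = String.length rest in
    if n < 2 || rest.[0] <> '[' || rest.[n - 1] <> ']' then None
    else
      let names =
        String.sub rest 1 (n - 2)
        |> String.split_on_char ','
        |> List.map String.trim
        |> List.filter (fun s -> s <> "")
        |> Array.of_list
      in
      Some (kind, names)
```

The resulting array can be indexed directly by the raw integer from the ring buffer, which is why the order of names in the file matters.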

- This allows for reading in and mapping names using name tables

- This also allows for different event sources than the Runtime_events API itself, e.g. for streaming
- Fixes a bug in gc-stats where profiling a program on a newer OCaml version would yield incorrect results, due to `EV_STW_LEADER` and `EV_INTERRUPT_REMOTE` being offset
@eutro eutro marked this pull request as draft April 19, 2024 21:41