Author dereference implementation guide
Showing 16 changed files with 2,566 additions and 614 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"label": "Implementation guides", | ||
"position": 4, | ||
"link": { | ||
"type": "generated-index", | ||
"description": "Guides on how to implement ethdebug/format" | ||
} | ||
} |
8 changes: 8 additions & 0 deletions
packages/web/docs/implementation-guides/pointers/_category_.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"label": "Dereferencing pointers", | ||
"position": 2, | ||
"link": { | ||
"type": "generated-index", | ||
"description": "Debugger-side reference implementation of ethdebug/format/pointer" | ||
} | ||
} |
20 changes: 20 additions & 0 deletions
...web/docs/implementation-guides/pointers/dereference-logic/dereference-logic.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
--- | ||
sidebar_position: 5 | ||
--- | ||
|
||
# The dereference function | ||
|
||
These next few pages cover how the components described thus far are combined | ||
to create the final `dereference(pointer: Pointer)` function. | ||
|
||
- The [Summary](/docs/implementation-guides/pointers/dereference-logic/summary) | ||
page broadly describes the control flow structure behind this function | ||
implementation. | ||
|
||
- [Generating regions on the fly](/docs/implementation-guides/pointers/dereference-logic/generating-regions) | ||
describes the process of recursively processing a pointer and reducing it to | ||
a concrete list of fully-evaluated `Cursor.Region` objects. | ||
|
||
- [Making regions concrete](/docs/implementation-guides/pointers/dereference-logic/making-regions-concrete) | ||
describes the process for converting a single `Pointer.Region` object into | ||
its fully-evaluated `Cursor.Region` equivalent at runtime. |
239 changes: 239 additions & 0 deletions
...eb/docs/implementation-guides/pointers/dereference-logic/generating-regions.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,239 @@ | ||
--- | ||
sidebar_position: 2 | ||
--- | ||
|
||
import CodeListing from "@site/src/components/CodeListing"; | ||
|
||
# Generating regions on the fly | ||
|
||
The `dereference()` function internally creates an | ||
[AsyncIterator](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/AsyncIterator) | ||
to produce an asynchronous list of regions. | ||
|
||
The process to produce this list uses a stack of processing requests (which | ||
it calls "memos"), consuming one memo at a time from the top of the stack | ||
and handling it based on what kind of memo it is. | ||
|
||
This is defined by the `generateRegions()` function (defined in conjunction | ||
with `GenerateRegionsOptions`): | ||
|
||
<details> | ||
<summary>`generateRegions()` and `GenerateRegionsOptions`</summary> | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/generate.ts" | ||
extract={sourceFile => sourceFile.getInterface("GenerateRegionsOptions")} | ||
/> | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/generate.ts" | ||
extract={sourceFile => sourceFile.getFunction("generateRegions")} | ||
/> | ||
</details> | ||
|
||
Notice how this function initializes two mutable record collections: one for | ||
all the current named regions, and one for all the current variables. As | ||
this function's `while()` loop operates on the stack, memos for saving new | ||
named regions or updating variable values may appear and then get handled | ||
appropriately. | ||
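
The stack-driven loop described above can be sketched in a few lines. This is a simplified illustration, not the package's code: the `Sketch*` types, the memo `kind` strings, and the function names are hypothetical, and synchronous generators are used only to keep the sketch short (the reference implementation is asynchronous).

```typescript
// Hypothetical, simplified shapes; the real types live in @ethdebug/pointers.
type SketchRegion = { location: string; name?: string };
type SketchPointer = SketchRegion | { group: SketchPointer[] };

type SketchMemo =
  | { kind: "dereference-pointer"; pointer: SketchPointer }
  | { kind: "save-regions"; regions: Record<string, SketchRegion> }
  | { kind: "save-variables"; variables: Record<string, number> };

// Yields regions as they are found and returns follow-up memos.
function* processSketchPointer(
  pointer: SketchPointer
): Generator<SketchRegion, SketchMemo[]> {
  if ("group" in pointer) {
    // A group yields nothing itself; it only requests its children.
    return pointer.group.map(
      (child): SketchMemo => ({ kind: "dereference-pointer", pointer: child })
    );
  }
  yield pointer;
  return pointer.name === undefined
    ? []
    : [{ kind: "save-regions", regions: { [pointer.name]: pointer } }];
}

function* generateSketchRegions(
  pointer: SketchPointer
): Generator<SketchRegion, void> {
  // The two mutable record collections: named regions and variables.
  const regions: Record<string, SketchRegion> = {};
  const variables: Record<string, number> = {};
  const stack: SketchMemo[] = [{ kind: "dereference-pointer", pointer }];

  while (stack.length > 0) {
    const memo = stack.pop()!;
    switch (memo.kind) {
      case "dereference-pointer": {
        const memos = yield* processSketchPointer(memo.pointer);
        // Push in reverse so the first returned memo is handled first.
        stack.push(...[...memos].reverse());
        break;
      }
      case "save-regions":
        Object.assign(regions, memo.regions);
        break;
      case "save-variables":
        Object.assign(variables, memo.variables);
        break;
    }
  }
}
```

Iterating `generateSketchRegions()` over a nested group visits the leaf regions in document order, because each batch of memos is reversed before it is pushed.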
|
||
For reference, see the `memo.ts` module for type definitions for each of the | ||
three types of memo and their corresponding helper constructor functions. | ||
|
||
<details> | ||
<summary>See the `memo.ts` module</summary> | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/memo.ts" | ||
/> | ||
</details> | ||
|
||
The real bulk of what `generateRegions()` does, however, is hidden inside the | ||
call `yield* processPointer()`. | ||
|
||
## Processing a pointer | ||
|
||
To handle a `DereferencePointer` memo from the stack inside | ||
`generateRegions()`, it defers to the `processPointer()` generator function. | ||
|
||
The signature of this function and associated types are as follows: | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={ | ||
(sourceFile, project) => { | ||
const definition = sourceFile.getFunction("processPointer"); | ||
const tempSourceFile = project.createSourceFile( | ||
"dereference-process.ts", | ||
"", | ||
{ overwrite: true } | ||
); | ||
|
||
for (const importDeclaration of sourceFile.getImportDeclarations()) { | ||
tempSourceFile.addImportDeclaration(importDeclaration.getStructure()); | ||
} | ||
|
||
const typeAliases = sourceFile.getTypeAliases(); | ||
for (const typeAlias of typeAliases) { | ||
tempSourceFile.addTypeAlias(typeAlias.getStructure()); | ||
} | ||
|
||
const commentText = definition.getLeadingCommentRanges() | ||
.map(range => | ||
sourceFile.getFullText() | ||
.substring(range.getPos(), range.getEnd())) | ||
.join("\n"); | ||
console.debug("commentText %o", commentText); | ||
|
||
const declaration = tempSourceFile.addFunction({ | ||
name: definition.getName(), | ||
parameters: definition.getParameters() | ||
.map((param, index, array) => ({ | ||
name: param.getName(), | ||
type: param.getType().getText(param), | ||
hasQuestionToken: param.hasQuestionToken() || param.hasInitializer(), | ||
leadingTrivia: "\n", | ||
trailingTrivia: index < array.length - 1 ? undefined : "\n" | ||
})), | ||
returnType: "Process", // HACK hardcoded | ||
hasDeclareKeyword: true, | ||
isAsync: true, | ||
leadingTrivia: `${commentText}\n` | ||
}); | ||
|
||
return tempSourceFile.getFunction("processPointer"); | ||
} | ||
} /> | ||
|
||
The `ProcessOptions` interface captures the runtime data at a particular | ||
point in the region generation process: | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getInterface("ProcessOptions")} | ||
/> | ||
|
||
The `Process` type alias provides a short name for the return type of | ||
functions like `processPointer`: | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getTypeAlias("Process")} | ||
/> | ||
|
||
Effectively, by returning a `Process`, `processPointer()` has two | ||
different mechanisms of data return: | ||
- By being a JavaScript `AsyncGenerator`, it produces `Cursor.Region` objects | ||
one at a time, emitted as a side effect of execution (via JavaScript `yield`). | ||
- Upon completion of execution, the return value is a list of memos to be | ||
added to the stack. | ||
|
||
**Note** that the expected behavior for this implementation is that the | ||
returned list of memos should be pushed onto the stack in reverse order, | ||
so that earlier memos in the list will be processed before later ones. | ||
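
To make the two return channels and the reverse-order push concrete, here is a minimal sketch. It uses a synchronous generator and plain strings in place of regions and memos; `drainSketch`, `exampleProcess`, and the string values are hypothetical names chosen for illustration only.

```typescript
// Drain a generator, keeping both of its output channels:
// the values it yields and the value it finally returns.
function drainSketch<Y, R>(
  generator: Generator<Y, R>
): { yielded: Y[]; returned: R } {
  const yielded: Y[] = [];
  let step = generator.next();
  while (!step.done) {
    yielded.push(step.value);
    step = generator.next();
  }
  return { yielded, returned: step.value };
}

// Stand-in for a process: yields "regions", returns "memos".
function* exampleProcess(): Generator<string, string[]> {
  yield "region-1";
  yield "region-2";
  return ["memo-A", "memo-B", "memo-C"];
}

const { yielded, returned } = drainSketch(exampleProcess());

// Push the returned memos in reverse so that popping from the top
// of the stack handles them in their original order.
const memoStack: string[] = ["older-memo"];
memoStack.push(...[...returned].reverse());

const processingOrder: string[] = [];
while (memoStack.length > 0) {
  processingOrder.push(memoStack.pop()!);
}
// processingOrder: memo-A, memo-B, memo-C, older-memo
```

Pushing without reversing would process `memo-C` first, which would break pointers whose later children depend on regions or variables defined by earlier ones.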
|
||
<details> | ||
<summary>See the full definition of `processPointer()`</summary> | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getFunction("processPointer")} | ||
/> | ||
</details> | ||
|
||
### Processing a region | ||
|
||
The simplest kind of pointer is just a single region. (Remember that pointers | ||
are either regions or collections of other pointers.) | ||
|
||
There is complexity hidden by function calls here, but nonetheless first | ||
consider the implementation of the `processRegion()` function as a base case: | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getFunction("processRegion")} | ||
/> | ||
|
||
Effectively, this function converts a `Pointer.Region` into a | ||
fully-evaluated, concrete `Cursor.Region`, emits this concrete region as the | ||
next `yield`ed value in the asynchronous list of regions, and possibly issues | ||
a request to save this region to process state by its name. | ||
|
||
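The process of evaluating a region is described in | ||
[Making regions concrete](/docs/implementation-guides/pointers/dereference-logic/making-regions-concrete). | ||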
|
||
### Processing collections | ||
|
||
The recursive cases are fairly straightforward following this architecture. | ||
|
||
|
||
#### Groups | ||
|
||
The simplest collection, a group of other pointers, yields no regions of its | ||
own, but instead pushes each of its child pointers for evaluation later: | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getFunction("processGroup")} | ||
/> | ||
|
||
It's essential that each of the child pointers get evaluated in the order | ||
they appear in the list, since later pointers may reference regions named | ||
earlier, etc. | ||
|
||
#### Lists | ||
|
||
List collections are more complex because they dynamically generate a number | ||
of composed pointers based on a runtime count value, introducing a | ||
variable identifier for use inside each dynamically composed pointer. | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getFunction("processList")} | ||
/> | ||
|
||
Note how, because each dynamic child pointer is evaluated based on the | ||
next incremented index value, the memos for updating this variable and | ||
evaluating the child pointer must be interspersed. | ||
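
As a rough sketch of this interleaving, the following function builds the memo list for a list pointer whose count has already been evaluated. The `ListMemo` type, the `kind` strings, and the parameter names are hypothetical stand-ins; a string is used in place of the child pointer.

```typescript
// Hypothetical memo shapes, for illustration only.
type ListMemo =
  | { kind: "save-variables"; variables: Record<string, number> }
  | { kind: "dereference-pointer"; pointer: string };

// Build the memos for a list of `count` items, where `indexVariable`
// names the index variable and `childPointer` is the repeated child.
function listMemosSketch(
  count: number,
  indexVariable: string,
  childPointer: string
): ListMemo[] {
  const memos: ListMemo[] = [];
  for (let index = 0; index < count; index++) {
    // First update the index variable...
    memos.push({ kind: "save-variables", variables: { [indexVariable]: index } });
    // ...then request evaluation of the child with that value in effect.
    memos.push({ kind: "dereference-pointer", pointer: childPointer });
  }
  return memos;
}
```

Emitting all variable updates first and all child evaluations afterwards would not work: every child would then see only the final index value.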
|
||
#### Conditionals | ||
|
||
Conditional pointers evaluate to a child pointer given that some runtime | ||
condition evaluates to a nonzero value, optionally evaluating to a different | ||
pointer when that conditional fails. | ||
|
||
Evaluating a conditional thus becomes a simple matter of evaluating the | ||
`"if"` clause and issuing a memo for dereferencing either the `"then"` pointer | ||
(when the result is nonzero) or the `"else"` pointer (when the result is zero | ||
and an `"else"` is specified): | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getFunction("processConditional")} | ||
/> | ||
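
The branching logic can be summarized with a small sketch. It assumes the `"if"` expression can be evaluated by a caller-supplied function and represents child pointers as an opaque type parameter; `ConditionalSketch`, `DereferenceMemoSketch`, and `conditionalMemosSketch` are hypothetical names, not part of the package.

```typescript
// Hypothetical shapes, for illustration only.
interface ConditionalSketch<P> {
  if: string; // stand-in for an expression
  then: P;
  else?: P;
}

type DereferenceMemoSketch<P> = { kind: "dereference-pointer"; pointer: P };

function conditionalMemosSketch<P>(
  conditional: ConditionalSketch<P>,
  evaluateExpression: (expression: string) => number
): DereferenceMemoSketch<P>[] {
  const { if: condition, then, else: otherwise } = conditional;

  if (evaluateExpression(condition) !== 0) {
    return [{ kind: "dereference-pointer", pointer: then }];
  }
  // The condition failed: fall back to "else" only if one was given.
  return otherwise === undefined
    ? []
    : [{ kind: "dereference-pointer", pointer: otherwise }];
}
```

A conditional whose condition fails and which has no `"else"` contributes no memos, and therefore no regions.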
|
||
#### Scopes | ||
|
||
Finally, the last kind of collection defined by this schema is for defining | ||
a scope of variables by identifier by specifying the expression values for | ||
each of those variables. | ||
|
||
Since this schema takes advantage of how JSON objects are ordered lists of | ||
key/value pairs, variables specified later may reference variables specified | ||
earlier. The only trickiness in implementing `processScope` is ensuring that | ||
variable values are available immediately. | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/process.ts" | ||
extract={sourceFile => sourceFile.getFunction("processScope")} | ||
/> |
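
The ordering requirement can be illustrated with a sketch that evaluates definitions one at a time and makes each value visible to the definitions after it. The expression type here is a deliberately tiny, hypothetical subset (number literals, variable names as strings, and `$sum`), and `defineScopeSketch` is not the package's `processScope`.

```typescript
// Hypothetical miniature expression language, for illustration only.
type ScopeExpressionSketch =
  | number // literal
  | string // reference to a variable by identifier
  | { $sum: ScopeExpressionSketch[] };

function evaluateScopeExpression(
  expression: ScopeExpressionSketch,
  variables: Record<string, number>
): number {
  if (typeof expression === "number") return expression;
  if (typeof expression === "string") {
    const value = variables[expression];
    if (value === undefined) {
      throw new Error(`Unknown variable: ${expression}`);
    }
    return value;
  }
  return expression.$sum.reduce<number>(
    (total, operand) => total + evaluateScopeExpression(operand, variables),
    0
  );
}

function defineScopeSketch(
  definitions: Record<string, ScopeExpressionSketch>,
  outerVariables: Record<string, number> = {}
): Record<string, number> {
  const variables = { ...outerVariables };
  // Object.entries preserves insertion order for ordinary string keys,
  // so each definition can see the variables defined before it.
  for (const [identifier, expression] of Object.entries(definitions)) {
    variables[identifier] = evaluateScopeExpression(expression, variables);
  }
  return variables;
}
```

For example, `defineScopeSketch({ a: 1, b: { $sum: ["a", 2] } })` can resolve `b` because `a` is already stored by the time `b` is evaluated; reversing the two definitions would throw.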
74 changes: 74 additions & 0 deletions
...cs/implementation-guides/pointers/dereference-logic/making-regions-concrete.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
--- | ||
sidebar_position: 3 | ||
--- | ||
|
||
import CodeListing from "@site/src/components/CodeListing"; | ||
|
||
# Making regions concrete | ||
|
||
There are two main aspects involved when converting from a `Pointer.Region`, | ||
which is full of properties whose values are the dynamic `Pointer.Expression` | ||
objects, into a `Cursor.Region`, whose expression properties have been replaced | ||
with actual bytes `Data`: | ||
|
||
## Fixing stack-located regions' `slot` offset | ||
|
||
Since stack pointers are expected to be declared at one time yet evaluated | ||
later, the relative offset that stack pointers use must be adjusted based on | ||
the initial stack length vs. the current stack length. | ||
|
||
This behavior is encapsulated by the `adjustStackLength` function: | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/region.ts" | ||
extract={sourceFile => sourceFile.getFunction("adjustStackLength")} | ||
/> | ||
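
The idea behind this adjustment can be sketched with plain numbers. This sketch assumes that `slot` counts from the top of the stack and that the change is computed as current stack length minus initial stack length; both are assumptions made for illustration. The types and the `adjustSlotSketch` name are hypothetical, and the sketch works on numeric slots rather than on `Pointer.Expression` values.

```typescript
// Hypothetical simplified regions, for illustration only.
interface StackRegionSketch {
  location: "stack";
  slot: number; // assumed: counted from the top of the stack
}

interface MemoryRegionSketch {
  location: "memory";
  offset: number;
  length: number;
}

type AnyRegionSketch = StackRegionSketch | MemoryRegionSketch;

// stackLengthChange = current stack length - stack length at declaration.
function adjustSlotSketch(
  region: AnyRegionSketch,
  stackLengthChange: number
): AnyRegionSketch {
  if (region.location !== "stack") {
    // Only stack regions are relative to the (moving) top of the stack.
    return region;
  }
  return { ...region, slot: region.slot + stackLengthChange };
}
```

Under these assumptions, if two items were pushed since the pointer was declared, a value that used to sit at slot 1 is now found at slot 3.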
|
||
## Evaluating region property expressions | ||
|
||
The more substantial aspect of making a region concrete, however, is the | ||
process by which this implementation evaluates each of the `Pointer.Region`'s | ||
expression properties and converts them into their `Data` values. | ||
|
||
This process would be very straightforward, except that pointer expressions | ||
may reference the region in which they are specified by use of the special | ||
region identifier `"$this"`. | ||
|
||
Fortunately, the schema does not allow any kind of circular reference, so a | ||
more robust implementation could pre-process a region's properties to detect | ||
cycles and determine the evaluation order for each property based on which | ||
property references which other property. That is, a robust implementation | ||
might take this pointer: | ||
|
||
```json | ||
{ | ||
"location": "memory", | ||
"offset": { | ||
"$sum": [ | ||
"0x60", | ||
{ ".length": "$this" } | ||
] | ||
}, | ||
"length": "$wordsize" | ||
} | ||
``` | ||
|
||
... and detect that it must evaluate `length` before evaluating `offset`. | ||
|
||
The **@ethdebug/pointers** reference implementation **does not do any such | ||
smart thing**. Instead, it pushes each of the three possible expression | ||
properties (`"slot"`, `"offset"`, and `"length"`) into a queue, and then | ||
proceeds to evaluate properties from the queue one at a time. | ||
|
||
When evaluating a particular property, if `evaluate()` fails, it adds this | ||
property to the end of the queue to try again later, counting the number of | ||
times this attempt has been made for this property. Because the number of | ||
properties is at most 3, if the number of attempts ever reaches 3, the | ||
implementation can infer that there must be a circular reference. | ||
|
||
<CodeListing | ||
packageName="@ethdebug/pointers" | ||
sourcePath="src/dereference/region.ts" | ||
extract={sourceFile => sourceFile.getFunction("evaluateRegion")} | ||
/> |
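
The retry strategy can be sketched independently of the real expression evaluator. In this simplified illustration, `{ $this: "length" }` is a hypothetical shorthand standing in for a lookup such as `{ ".length": "$this" }`, and `evaluateRegionSketch` is not the package's `evaluateRegion`.

```typescript
type PropertyNameSketch = "slot" | "offset" | "length";

// Hypothetical miniature expression language, for illustration only.
type RegionExpressionSketch =
  | number
  | { $sum: RegionExpressionSketch[] }
  | { $this: PropertyNameSketch };

type KnownValues = Partial<Record<PropertyNameSketch, number>>;

function evaluateSketch(
  expression: RegionExpressionSketch,
  known: KnownValues
): number {
  if (typeof expression === "number") return expression;
  if ("$sum" in expression) {
    return expression.$sum.reduce<number>(
      (total, operand) => total + evaluateSketch(operand, known),
      0
    );
  }
  const value = known[expression.$this];
  if (value === undefined) {
    throw new Error(`"${expression.$this}" has not been evaluated yet`);
  }
  return value;
}

function evaluateRegionSketch(
  region: Partial<Record<PropertyNameSketch, RegionExpressionSketch>>
): KnownValues {
  const known: KnownValues = {};
  const queue: { property: PropertyNameSketch; attempts: number }[] = [];
  for (const property of ["slot", "offset", "length"] as const) {
    if (region[property] !== undefined) queue.push({ property, attempts: 0 });
  }

  while (queue.length > 0) {
    const { property, attempts } = queue.shift()!;
    // With at most 3 properties, 3 failed attempts imply a cycle.
    if (attempts >= 3) {
      throw new Error(`Circular reference detected at "${property}"`);
    }
    try {
      known[property] = evaluateSketch(region[property]!, known);
    } catch {
      // Dependency not ready: retry after the other properties.
      queue.push({ property, attempts: attempts + 1 });
    }
  }
  return known;
}
```

Applied to the example above (with `$wordsize` replaced by the literal `32`), `offset` fails on its first attempt, `length` succeeds, and `offset` then succeeds on retry. A pair of properties that reference each other keeps failing until the attempt limit is reached and an error is thrown.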