Replies: 4 comments 3 replies
-
Thank you! One comment so far: I would probably get rid of float types - in my mind this would be handled by a library. |
Beta Was this translation helpful? Give feedback.
-
I had thought we were going to parse the same format we emit for display, since that would allow for the full generality of the IR to be expressed by someone emitting the textual form, this form has no notion of blocks or the control flow instructions the IR supports, and so if someone wants to compile to Miden IR, they will almost certainly not want to emit the textual form. Most compiler backends are going to lower to an SSA form, which is what our IR provides, but this textual form doesn't seem to represent that fully (or enforce the SSA property). This is going to be a particular issue with converging control flow paths - how do you represent a variable that takes on two different values depending on a condition? In the IR, you represent this using phi nodes in the form of block arguments, but there is no equivalent in this textual representation. As a result, we'd have to permit the syntax to be in non-SSA form, and then we're basically in a situation where we're implementing a full compiler frontend for a very low-level language. If we're going to go that route, we may as well make the language higher level and treat it as a separate thing. Setting those concerns aside for a moment, the syntax looks good to me, just a couple notes:
We should add the ability to add
That will be sufficient to support anything we care to do with structs. We can add allocation intrinsics once I add support for globals to the IR, since we'll have to manage memory the same way one does in Wasm (with a global representing the current heap pointer), but in the meantime just having the syntax for describing them is enough IMO.
As I mentioned at the top, if we're assuming the text must be in SSA form, then it isn't possible to describe very simple things like convergent control flow, which isn't unstructured, but can't be described with just
We have support for this in the IR, but yeah, not a priority at all. If we can reuse the MASM parser to parse inline-assembly blocks, then representing them the same way is done in Rust/LLVM IR would be best (i.e. the block is parsed as a string containing the assembly language, and isn't processed as part of the host language grammar).
Yeah we don't need to do a type check phase, but we can do really basic type checking when verifying the IR, since everything will have a type associated with it, all we have to do is make sure that operands have the correct types based on the semantics of a given instruction (e.g. we can verify that
The IR requires SSA form, and we can't assume that someone is emitting the textual IR in SSA form correctly, so we have to verify this when parsing. This is simple to verify though, simply keep track of variables which have been defined as they are encountered in the IR I think we should stick to a textual representation that matches the IR, rather than trying to lift it to a higher level. It will only complicate things, and if we want a higher level language of our own design, then we should do that separately, rather than thinking of it as representing Miden IR directly. That means removing the high-level control flow syntax, introducing blocks and block arguments, and using |
Beta Was this translation helpful? Give feedback.
-
I have only just now discovered that we have a display format for the IR in ir/src/hir/write.rs. A quick look at that file suggests a number of changes I would like to make to the output, but I would need to spend a bit more time studying it.
The problem with a concrete syntax based on basic blocks (which I believe is what you are hinting at) is that it will be harder to parse the control structure. The basic blocks can come in any order, and the logic required to construct the CFG is more complex than what can reasonably be described as parsing, and as I mentioned in the beginning the goal of this syntax is to create something that is easy to parse. I'm ok with a more complex parser/"frontend", but that needs to be a conscious decision. (Side note: If we keep the high-level constructs, then we should add
SSA is not a syntactic property, so there is no way for a syntax to enforce SSA. Hence my comment that SSA should be enforced by earlier compiler phases, and can therefore be assumed by the parser.
Except in the case of converging control flow, though. It gets even more difficult if we change the syntax to use jumps and labels, because the control flow will be impossible for the parser to determine.
There is an implicit assumption (which I agree should be made explicit) that variable names are globally unique and implicitly declared at the top of a function. Then this problem goes away:
This seems like a reasonable assumption to place on a compiler frontend that generates Miden IR.
I'm fine with these changes.
This would be part of adding library imports to the syntax, which as I said I haven't considered yet. I didn't put a huge amount of effort into defining which strings are allowed as names, but I consider that a detail that can be worked out as we go along. This is a first draft after all.
Well, I suppose. We would need to keep track of the inferred type of each variable, and that the inferred type is the same in each branch that assigns to that variable, but I guess that's doable.
The problem of high-level vs. low-level control flow won't go away, though. We had the discussion when we were looking at Sway and Move, and when we changed focus to WASM. We are now having the same discussion again for the concrete syntax for Miden IR, and if/when we start working on our own high-level language we will have it again because it will seem futile to lower the code to Miden IR only to raise it back to MASM (in the same way that it already seems for the high-level languages we have considered so far). Did we not at some point have a plan to have two IRs, one for low-level one for high-level control flow? Did we end up scrapping that idea? |
Beta Was this translation helpful? Give feedback.
-
The order of the basic blocks doesn't matter though, we have a "sea of nodes" style IR, the CFG is implicit in the edges between blocks (given by the control flow instructions terminating those blocks). When parsing, if a reference to a block that hasn't been parsed yet is seen, it can be stubbed out until the actual block definition is reached (at which point things like block arguments can be amended, and the content of the block fleshed out). If a reference to a value whose definition hasn't been parsed yet is encountered, the value definition can be stubbed out as well, and then amended when the actual definition is reached. The only auxilary data structure you would need is a mapping between the parsed block/value IDs and the ones generated by the
It absolutely is a syntactic property of the IR, "static single assignment", i.e. values are immutable (static) and only have a single definition (single assignment).
This is of course the problem with having a textual IR that is at a different level of abstraction than the actual IR. If the textual format is at the same level of abstraction, then we have phi nodes (block arguments) to handle this while preserving the SSA property.
Unfortunately it doesn't, because it wouldn't be possible to ensure during parsing that those variables have definitions that dominate all uses, nor that there is only a single definition along any given control flow path. These are properties easily checked at the level of abstraction our IR is at, which is why having the textual representation match that is so beneficial. We can trivially validate the SSA property is preserved, as a value can only ever have a single definition in the entire function, either as a block argument, or an instruction result - if you ever observe a second definition while parsing, the IR is invalid. The validation pass I mentioned earlier verifies the dominance property. As soon as you kick SSA to the curb in the syntax for the IR, you have to perform the same kind of analysis and transformation as you would from a higher-level language to get it back, which defeats the point.
Sure, makes sense!
Since the IR already requires function and block signatures to be typed, and all instructions produce results which are typed as a function of their inputs, all we actually need to do is make a pass through to verify that nothing disagrees (e.g. that a value passed as an argument to a block matches the type signature of the block). This could be done in the same validation pass that verifies that value definitions dominate uses. We'll want to do that anyway when working directly with the in-memory representation in order to catch errors made during lowering or optimizations/transformations.
It's not a problem, not really. The representation of "higher-level" control flow instructions won't be part of the current IR anyway, it will be represented in the stack IR used for codegen (it's possible we can translate direct to MASM, but my current experiments are based on a separate IR for convenience, I'll revisit it when implementation is done). That IR and the associated compiler pass are what I've been working on. At the point where translation into the stack IR is performed, the SSA IR has been transformed into a tree, rather than a DAG, with loops handled specially during translation. I'll omit the details for brevity here, but there is a straightforward set of rules for converting arbitrarily complex CFGs into an optimal sequence of MASM instructions from this representation (along with a few other properties provided by earlier passes). It remains to be seen if there are any edge cases that I've missed, but I've run a number of interesting CFGs through the algorithm (by hand), with success. In the SSA IR, the low-level CFG representation is desirable, as there are a bunch of graph-theoretic foundations on which various well-known analyses and optimizations are based that rely on it. It's well worth a bit of extra work on the backend to keep that as the representation where we do all of our heavy lifting. As an aside, since you mentioned it - the choice to go for Wasm rather than Move/Sway as a frontend language/IR wasn't based on any issues of control flow, but rather that both Move and Sway have a baked-in impedance mismatch with the Miden rollup. Trying to represent the rollup in either of those languages in a clean way was proving to be a real challenge. Doing so in Rust, and having |
Beta Was this translation helpful? Give feedback.
-
Below is an initial proposal for a concrete syntax for Miden IR. There are a couple of issues we need to resolve before we can finalize the proposal, but I think the basics are there.
I have tried to create a syntax that is easy to parse, because I don't want to spend a huge amount of time writing and maintaining a parser.
There are a handful of keywords (
if
,match
, etc.) that are pretty standard. For everything remotely non-standard I have used a naming convention where operations all start with an_
, and disallowed user-defined variable names that start with_
. This will make it easier to maintain backward compatibility.There are a couple of outstanding issues that I haven't thought into this proposal:
if-then-else
,while
andmatch
statements to manage control flow, because that makes it easier to parse. However, the abstract syntax assumes that we have a CFG, which means that we will parse structed control flow into unstructured control flow, and then restructure it. We can add labels and jumps to the syntax, but I think we should keep the structured constructs as well.assembly
as a keyword for future use.while
loops. It shouldn't be too hard to specify, but it needs to be made clear and exact.Beta Was this translation helpful? Give feedback.
All reactions