You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At the moment, the Purus compilation pipeline makes use of three different representations of the AST:
The Language.PureScript.AST syntax tree (mainly defined in Language.PureScript.AST.Declarations), which is constructed after parsing source files into a concrete syntax tree. This closely mirrors the structure of written source files (e.g. do-notation isn't yet desugared, typeclass constraints have not been replaced with explicit dictionaries, etc). We don't modify this at all, and the only place where we interact with it is in the machinery that converts it to the CoreFn representation. IMPORTANT: The PureScript type checking & inference machinery operate on this representation, and we begin processing it after typechecking has completed.
The CoreFn AST (mainly defined in Language.PureScript.CoreFn.Expr) is the second representation. Our CoreFn is not vanilla PS CoreFn, because our version contains explicit type annotations. At this stage, do notation expressions are desugared to explicit uses of bind, if-then-else expressions have been converted to case expressions, etc. At the moment, this is the format for serialized Purus modules. IMPORTANT: The renamer operates on CoreFn. Renaming is (afaict) the only significant transformation that the vanilla PureScript compiler performs on CoreFn.
The "Final" IR (mainly defined in Language.PureScript.CoreFn.Convert.IR) is the last representation we use before conversion to PIR. This representation only contains (or should only contain) expressions, declarations, and types that are isomorphic to PIR expressions, and is intended as a "parse-don't-validate" sanity check on the previous compiler steps. Here (but not in any previous step), we use a restricted Type type such that (e.g.) row-types are disallowed. Most notably, this IR contains no mention of records or objects, forcing us to desugar record/object expressions into product expressions. Additionally, this AST makes use of the Bound library for safe and efficient substitutions.
One final note: PS identifiers are (for the most part) just strings (well, `Text). The PIR compiler expects identifiers to have a textual name and a "unique" integer identifier. (See the discussion on slack in the IOG channel where MPJ & Yura explain how that works for more context). At the moment we assume that input to the PIR compiler has well-scoped and unique names, and so we just maintain a dictionary of names w/ a stateful counter. It might be beneficial to explicitly tag names with unique integer identifiers earlier on in the pipeline. (AFAICT these identifiers don't have to be De Bruijn indices, though they could be).
Motivation
There is, at present, too large of a gap between CoreFn and the final representation:
CoreFn contains record types and expressions (object updates/literals/accessors), which may mention type variables that have are of kind Row Type. In order to transform CoreFn into the final IR, we must perform monomorphization (at least on expressions with row-polymorphic types).
Right now, we do monomorphization with CoreFn directly. This is extremely fragile - During monomorphization, we generate many new identifiers and perform a lot of inlining. Because CoreFn does not have a mechanism for safe substitution, it is very likely that there are subtle bugs in the monomorphizer which will lead to variable name clashes. It's virtually impossible to write a comprehensive set of tests that would catch all possible mistakes here.
Due to some of the changes I made (particularly to the guarded case desugarer), the CoreFn AST is too "loose". That is: All guards (everywhere) should have been eliminated. Additionally, all ConstrainedTypes should have been desugared into types that mention explicit dictionaries.
Finally, neither CoreFn nor our final IR have, at the moment, a representation of case expressions which neatly mirrors the representation used in PIR. Briefly:
PureScript: (case (constr 1 x y) (alt x t1) (alt x y t2))
PIR: (case (constr 1 a b) (\x -> t1) (\x y -> t2))
This is important because, at the moment, we do the transformation from PS-style alternatives to PIR-style alternatives during the conversion to PIR. PIR is already a fairly "noisy" and final conversion is already tricky, it becomes nigh-unbearable to try to implement the translation for nontrivial case expressions.
The Ask
We need another intermediate representation. Specifically:
This representation must provide a mechanism for safe and efficient substitution. This could be Bound or something close enough to it, just needs to be safe and efficient.
We must convert to the new representation immediatelyafter renaming. We can assume that the PS renamer is robust and more-bug-free-than-anything-we'd-write-in-a-week, so we have to do it there to avoid the possibility of introducing our own naming errors in some later transformations.
Ideally, the new representation would also represent case alternatives in a PIR-compatible format (I'm not sure what the best way of doing that would be off the top of my head)
The new representation must, like our CoreFn, contain explicit type annotations for all nodes that require them. (See the CoreFn AST in the sean/monomorph-cleanup branch - I'm pretty sure I removed all the superfluous annotations).
It should furthermore make use of Vanilla PS types i.e. SourceType. Many of the PS compiler's utilities for operating on types are very useful during monomorphization and object desugaring, so while we could winnow the types here, we probably shouldn't.
Acceptance Criteria
Satisfies the requirements in the previous section.
The text was updated successfully, but these errors were encountered:
Context
At the moment, the Purus compilation pipeline makes use of three different representations of the AST:
Language.PureScript.AST
syntax tree (mainly defined inLanguage.PureScript.AST.Declarations
), which is constructed after parsing source files into a concrete syntax tree. This closely mirrors the structure of written source files (e.g. do-notation isn't yet desugared, typeclass constraints have not been replaced with explicit dictionaries, etc). We don't modify this at all, and the only place where we interact with it is in the machinery that converts it to the CoreFn representation. IMPORTANT: The PureScript type checking & inference machinery operate on this representation, and we begin processing it after typechecking has completed.CoreFn
AST (mainly defined inLanguage.PureScript.CoreFn.Expr
) is the second representation. Our CoreFn is not vanilla PS CoreFn, because our version contains explicit type annotations. At this stage,do
notation expressions are desugared to explicit uses of bind,if-then-else
expressions have been converted to case expressions, etc. At the moment, this is the format for serialized Purus modules. IMPORTANT: The renamer operates on CoreFn. Renaming is (afaict) the only significant transformation that the vanilla PureScript compiler performs on CoreFn.Language.PureScript.CoreFn.Convert.IR
) is the last representation we use before conversion to PIR. This representation only contains (or should only contain) expressions, declarations, and types that are isomorphic to PIR expressions, and is intended as a "parse-don't-validate" sanity check on the previous compiler steps. Here (but not in any previous step), we use a restrictedType
type such that (e.g.) row-types are disallowed. Most notably, this IR contains no mention of records or objects, forcing us to desugar record/object expressions into product expressions. Additionally, this AST makes use of theBound
library for safe and efficient substitutions.One final note: PS identifiers are (for the most part) just strings (well, `Text). The PIR compiler expects identifiers to have a textual name and a "unique" integer identifier. (See the discussion on slack in the IOG channel where MPJ & Yura explain how that works for more context). At the moment we assume that input to the PIR compiler has well-scoped and unique names, and so we just maintain a dictionary of names w/ a stateful counter. It might be beneficial to explicitly tag names with unique integer identifiers earlier on in the pipeline. (AFAICT these identifiers don't have to be De Bruijn indices, though they could be).
Motivation
There is, at present, too large of a gap between
CoreFn
and the final representation:CoreFn
contains record types and expressions (object updates/literals/accessors), which may mention type variables that have are of kindRow Type
. In order to transform CoreFn into the final IR, we must perform monomorphization (at least on expressions with row-polymorphic types).ConstrainedTypes
should have been desugared into types that mention explicit dictionaries.(case (constr 1 x y) (alt x t1) (alt x y t2))
(case (constr 1 a b) (\x -> t1) (\x y -> t2))
The Ask
We need another intermediate representation. Specifically:
Bound
or something close enough to it, just needs to be safe and efficient.CoreFn
, contain explicit type annotations for all nodes that require them. (See the CoreFn AST in thesean/monomorph-cleanup
branch - I'm pretty sure I removed all the superfluous annotations).SourceType
. Many of the PS compiler's utilities for operating on types are very useful during monomorphization and object desugaring, so while we could winnow the types here, we probably shouldn't.Acceptance Criteria
Satisfies the requirements in the previous section.
The text was updated successfully, but these errors were encountered: