-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constraint based type inference #219
Constraint based type inference #219
Conversation
TODO: - correctly encode multiple primitive alternatives as constraints - refactor atoms/filters schema-- should all just be Atom<Symbol>?? - get rid of `accept` by refactoring action checker with a global algorithm
I had a discussion with @oflatt about the differences between the proposed conjunctive query-based IR and the existing It is declarative and unordered. The existing
The query corresponds to
which is not executable and causes egglog to panic. In the proposed IR, it will first be lowered to
After that, a canonicalization pass will substitute x with y in the rule, and then substitute y for 1:
One way around this is to eliminate the ordered-ness of the NormFact and use ConstrainEqLit, etc. to model an equality check similar to The proposed IR is freer in the sense that it gives you a bigger space of expressible programs.
In
This indirection introduced by variable Consider another example:
In
Again, there is no way of expressing the same query efficiently in
A canonicalizing pass will substitute z1 with z2 and give you
which directly corresponds to an efficient join. Oliver had a good point that for proof generation, you may need to handle literals specially, so with A uniform view of atoms is useful for query compilation. (This point is less relevant to NormFact IR but I think it's very cool.) In the proposed IR, a query is just a vec of atoms, which abstracts over both functions and primitives. This uniform view is useful for further optimizations in query compilation. For example, consider
In the proposed IR, it corresponds to two atoms
The only difference between
The current query compiler, however, will go through each z in R[x, y] and do the check afterward. Query compilation Finally, compilation from |
…-based-type-inference
- Remove commented out code - Remove "both defined error"
Update: All the tests are passing! Some big TODOs left (maybe for future PR):
|
As far as I can tell, this PR doesn't replace |
Is the type information from Are you okay with merging it while two typecheckers are present? If not, I'm happy to put more work into this PR, but this might make code review more complex than splitting into several PRs of smaller scopes. |
Typechecking information is used by my terms and proofs encoding unfortunately. |
Merging with two typecheckers would raise the question- which one are we going to keep in the future? |
I would say it is very hard to avoid conflicts with something not in the main... (especially since the discussions about terms are not yet conclusive.) My proposal for terms and proofs is that we keep it in a branch and move on. When later we have time (e.g., after eggcc), we can revisit this together, and I can work with you to revive it and have it merged- I'm also very excited about proofs and terms.
It is clear to me that the type checker implemented in this PR will subsume both of the existing type checkers ( |
Yeah, I understand it's quite difficult to avoid conflicts with my branch. Regardless, as you said this typechecker seems better and it makes the backend cleaner. I think your suggestion of keeping both typecheckers for now is a good compromise. I would just ask that you don't remove the old typechecker and that you do not remove NormCommand/NormFact. |
Update: This PR is self-contained and ready for review. The refactoring is not 100% done, but it is passing all the tests which is a milestone. So I propose to consider this PR as is for merging, and I plan to make follow-up PRs. More details:
TODOs for follow-up PRs (will open a tracking issue for them once this is merged).
|
Are the new queries any slower now that we are not doing congruence? |
Oh, it’s congruence on query atoms not congruence in rebuilding’s sense.
This is just an easy and obvious optimization pass. If we are not
generating stupid queries like the one I mentioned there shouldn’t be a
performance issue.
I put it in TODOs because the old type checking code does this optimization.
…On Mon, Sep 18, 2023 at 12:20 PM Oliver Flatt ***@***.***> wrote:
Are the new queries any slower now that we are not doing congruence?
—
Reply to this email directly, view it on GitHub
<#219 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADEKPFDOXRTHOGLY7AXGOOLX3CNGRANCNFSM6AAAAAA4DKMNJI>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, there are two big changes in here. (1) It adds a typechecker that is contraint-based. (2) It changes how query compilation works significantly- getting rid of the complex egraph stuff and just flattening the query.
In theory, I support both of these changes. But as they are implemented in this PR, I don't think it's worth merging.
(1) The contraint-based typechecker is good, but happens too late to matter. I'm starting to think there is no point in merging it now and forcing you to replace the current typechecker with it. I think it will be easier for both of us if you replace the existing typechecking.rs
with this as a separate PR.
(2) I like that this simplifies and improves query compilation, but it seems to re-invent the wheel. Most of what query compilation in this PR does is flatten things into atoms, which is exactly what desugaring already does. In fact, if you reuse the NormFact
struct it basically already captures what you are doing here.
Again, great work on this PR! I think with more work this will be a big improvement to egglog.
Let me know what you think of my suggestions
src/ast/desugar.rs
Outdated
@@ -424,14 +424,16 @@ pub struct Desugar { | |||
|
|||
impl Default for Desugar { | |||
fn default() -> Self { | |||
let mut type_info = TypeInfo::default(); | |||
type_info.add_primitive(ValueEq {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add this primitive in the implementation of default for TypeInfo
src/typecheck.rs
Outdated
.iter() | ||
.skip(1) | ||
.map(|va| Atom { | ||
head: "value-eq".into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using value-eq
here? Value-eq is a primitive, not a equality constrain that gj ,knows how to handle
src/typecheck.rs
Outdated
fn flatten_fact(&mut self, fact: &Fact) -> Query<Symbol> { | ||
match fact { | ||
Fact::Eq(exprs) => { | ||
// TODO: currently we require exprs.len() to be 2 and Eq atom has the form (= e1 e2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make this requirement explicit in the long term, it just helps simplify the code
src/typecheck.rs
Outdated
} | ||
}, | ||
_ => continue, | ||
fn flatten_fact(&mut self, fact: &Fact) -> Query<Symbol> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this basically what desugaring already does? It seems like if you just use the existing NormFact
you will be much closer to what you want.
src/typecheck.rs
Outdated
loop { | ||
let mut to_subst = None; | ||
for atom in result_rule.body.atoms.iter() { | ||
if atom.head == "value-eq".into() && atom.args[0] != atom.args[1] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this special case- instead of using the primitive value-eq
, equality constraints should be specially represented in the data structure. For that reason, I have equality constraints in the NormFact
data structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you still insist on having equality constraints as part of the query definition for CoreRule after our discussion. I'm open to changing this, although having value-eq
as a regular primitive does simplify things a bit (e.g., fewer cases to consider, no special handling in codegen)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still think we shouldn't have value-eq
, see my comment above.
src/typecheck.rs
Outdated
head: actions.to_vec(), | ||
body: facts.to_vec(), | ||
}; | ||
// dbg!(&rule); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great review @oflatt and sorry for getting back late. Besides addressing some of your comments, I also made the following changes according to our discussion:
- Typechecking (for queries at least) happens only once now (in
typechecking.rs
), the backend CoreRule directly uses the type-checking result from it instead of redoing it. - The backend takes a
NormRule
instead of a pair ofQuery
andAction
now, so we don't need to convert aNormRule
first to an un-desugaredRule
and then toCoreRule
. - Many codes are removed: We no longer need the old type-checking algorithm, we don't have to convert from un-desugared
Rule
toCoreRule
, etc.
I am very happy about all of these changes
src/typecheck.rs
Outdated
} | ||
|
||
#[derive(Debug, Clone)] | ||
pub enum ResolvedSymbol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like ResolvedSymbol
as its use in ResolvedCoreRule
contrasts with Symbol
used in UnresolvedCoreRule
. But I'm open to change
src/typecheck.rs
Outdated
loop { | ||
let mut to_subst = None; | ||
for atom in result_rule.body.atoms.iter() { | ||
if atom.head == "value-eq".into() && atom.args[0] != atom.args[1] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you still insist on having equality constraints as part of the query definition for CoreRule after our discussion. I'm open to changing this, although having value-eq
as a regular primitive does simplify things a bit (e.g., fewer cases to consider, no special handling in codegen)
Before I do my review, I just want to say amazing work @yihozhang and we make a great team! This is a big moment for egglog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an initial review, plan to make another pass after you respond.
Looks like great progress.
@@ -515,6 +515,7 @@ impl ToSexp for RunConfig { | |||
|
|||
#[derive(Clone, Debug, PartialEq, Eq, Hash)] | |||
pub struct NormRunConfig { | |||
pub ctx: CommandId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain why we have a ctx here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of #236. Every run-schedule
is not sufficient. We need to make sure each run
has a context (for typechecking its until
field.
Err(error) => errors.push(error), | ||
} | ||
} | ||
// If update is successful for only one sub constraint, then we have nailed down the only true constraint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought Xor was "exclusive or" so it needs to succeed for exactly one constraint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about the semantics of exclusive or
. However, here, Xor succeeds with no updates
basically means "I'm not gonna throw an error and will wait for more updates from other constraints until it's clear which sub-constraint is the right one".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks
src/constraint.rs
Outdated
&self, | ||
type_info: &TypeInfo, | ||
) -> Result<Vec<Constraint<AtomTerm, ArcSort>>, TypeError> { | ||
let mut xor_constraints: Vec<Vec<Constraint<AtomTerm, ArcSort>>> = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you leave more comments in complex code like this?
I'm not sure why xor_constraints
has this type or why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/constraint.rs
Outdated
1 => Ok(literal_constraints | ||
.chain(xor_constraints.pop().unwrap().into_iter()) | ||
.collect()), | ||
_ => Ok(literal_constraints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does this case happen and why is is Xor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments. Basically, xor_constraints
collects all possible instantiations of the atom. Each instantiation is a vector of constraints. Among all instantiations, only one should hold, thus exclusive or.
src/typecheck.rs
Outdated
} | ||
|
||
#[derive(Debug, Clone)] | ||
pub enum ResolvedSymbol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Symbol
is used in a lot of places, where here you really mean a function call or primitive. I still think we should change it.
Is this supposed to include type information in the future?
src/typecheck.rs
Outdated
loop { | ||
let mut to_subst = None; | ||
for atom in result_rule.body.atoms.iter() { | ||
if atom.head == "value-eq".into() && atom.args[0] != atom.args[1] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still think we shouldn't have value-eq
, see my comment above.
src/typecheck.rs
Outdated
} | ||
} | ||
|
||
impl Query<Symbol> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename typecheck.rs
to something like query-compile.rs
?
We should also move all the typechecking related stuff like this into typechecking.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made an issue for this (#268). I'll do it after the PR is merged but not now to make the diff clean.
src/typecheck.rs
Outdated
v.insert(Expr::Var(*var)); | ||
} | ||
// TODO: should this be in ResolvedCoreRule | ||
pub fn canonicalize(&self, rule: UnresolvedCoreRule) -> UnresolvedCoreRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place where it would be good to add documentation- what's the purpose of canonicalize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I have documented its purpose.
/// Transformed a UnresolvedCoreRule into a CanonicalizedCoreRule.
/// In particular, it removes equality checks between variables and
/// other arguments, and turns equality checks between non-variable arguments
/// into a primitive equality check `value-eq`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, value-eq
cannot be completely removed from our codebase since queries may contain equality checks between e.g., global variables, which cannot be compiled away.
if let ENode::Func(_, children) | ENode::Prim(_, children) = &mut node { | ||
for child in children { | ||
*child = self.unionfind.find(*child); | ||
pub(crate) fn resolve_rule( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More typechecking stuff that probably should be in typechecking.rs
@@ -259,8 +266,25 @@ impl TypeInfo { | |||
} | |||
|
|||
fn typecheck_facts(&mut self, ctx: CommandId, facts: &Vec<NormFact>) -> Result<(), TypeError> { | |||
for fact in facts { | |||
self.typecheck_fact(ctx, fact)?; | |||
// ROUND TRIP TO CORE RULE AND BACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good place for a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work Yihong!
Err(error) => errors.push(error), | ||
} | ||
} | ||
// If update is successful for only one sub constraint, then we have nailed down the only true constraint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks
This PR proposes a constraint-based global type inference algorithm for egglog. More importantly, it proposes a core IR framework for writing rule analysis and transformation passes, which facilitates future development.
In terms of type inference:
vec-add : vec<a> -> vec<a> -> vec<a>
and overloaded functions likeadd
, which has several definitions such asf64 -> f64 -> f64
andString -> String -> String
.In terms of the core IR:
typecheck.rs
is refactored into several orthogonal passes:Rule
intoCoreRule
,Program
(not currently implemented).CoreRule
has two parts:Query
andActions
.Query
uses a conjunctive query--based IR, whileActions
uses an imperative SSA-basedNormAction
, which is currently used in desugaring. The goal is to gradually phase outNormExpr
oncedesugar.rs
is refactored into several compiler passes over core IR, so that we have a unified internal IR for query analysis and transformation.