Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor origin in tree node #1189

Conversation

ericvergnaud
Copy link
Contributor

There is a need to transpile SQL comments. This requires populating TreeNode.origin when building the ir AST.

This PR:

  • moves the origin field of TreeNode to a 2nd parameter list such that it is ignored during comparisons, and refactors accordingly

Progresses #869

Future PRs will expand this one (see #1182)

Copy link
Contributor

@vil1 vil1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks better than the previous encoding with multiple parameter lists.

However, with this new encoding, we won't be able to change a node's origin after creation. That means that every visit* method in our Builder classes will have to properly set the origin.

Again, I suggest we go for something like

trait TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  private [this] var _origin: Origin = Origin.empty

  def origin: Origin = _origin

  def withOrigin(updatedOrigin: Origin): BaseType = {
    _origin = updatedOrigin
    self 
  }
}

so that we could at least have creation of IR nodes and call to their withOrigin method happen at different times

Comment on lines 15 to 20
startLine: Option[Int] = None,
startColumn: Option[Int] = None,
endLine: Option[Int] = None,
endColumn: Option[Int] = None,
startTokenIndex: Option[Int] = None,
endTokenIndex: Option[Int] = None)
Copy link
Contributor

@vil1 vil1 Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible/meaningful for some of the fields to be Some while others are None, or is it either all fields None or all fields Some ?

In the latter case, we would be better to either define the origin field as an Option[Origin] or, even better, define Origin as

sealed trait Origin
case object UnknownOrigin extends Origin
case class InputOrigin(
    startLine: Int,
    startColumn: Int,
    endLine: Int,
    endColumn: Int,
    startTokenIndex: Int,
    endTokenIndex: Int) extends Origin

// optionally, adding the following case for the nodes that are synthesized during optimization phase
// could be handy.
case class SyntheticOrigin(synthesizedBy: String) extends Origin

Copy link
Contributor

@jimidle jimidle Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot think of a situation where we do not have all of these if we have any of them at all. I think I even add this stuff to the manufactured error tokens. That does not mean I am correct though - just that I can't think of when we might not have some piece of that puzzle ;)

If we add teh withOrigin stuff from Spark, then we can do this:

override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {

@jimidle
Copy link
Contributor

jimidle commented Nov 12, 2024

@ericvergnaud
Copy link
Contributor Author

This looks better than the previous encoding with multiple parameter lists.

However, with this new encoding, we won't be able to change a node's origin after creation. That means that every visit* method in our Builder classes will have to properly set the origin.

Again, I suggest we go for something like

trait TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  private [this] var _origin: Origin = Origin.empty

  def origin: Origin = _origin

  def withOrigin(updatedOrigin: Origin): BaseType = {
    _origin = updatedOrigin
    self 
  }
}

so that we could at least have creation of IR nodes and call to their withOrigin method happen at different times

Can you provide a scenario where this would be both useful and feasible ? I can't think of one. The only moment we can provide the origin is during conversion of ParserRuleContext nodes to ir nodes. Using 2 separate calls seems error prone (it's very easy to forget to call withOrigin). In scenarios where we don't have an origin, we should simply supply Origin.empty at ctor time.

@@ -4,7 +4,8 @@ import java.util.{UUID}

// Expression used to refer to fields, functions and similar. This can be used everywhere
// expressions in SQL appear.
abstract class Expression extends TreeNode[Expression] {
abstract class Expression(_origin: Option[Origin] = Option.empty) extends TreeNode[Expression](_origin) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericvergnaud
Copy link
Contributor Author

superseded by #1199\

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants