Refactor origin in tree node #1189
Conversation
This looks better than the previous encoding with multiple parameter lists.
However, with this new encoding, we won't be able to change a node's origin after creation. That means that every visit* method in our Builder classes will have to properly set the origin.
Again, I suggest we go for something like
trait TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  self: BaseType =>

  private[this] var _origin: Origin = Origin.empty

  def origin: Origin = _origin

  def withOrigin(updatedOrigin: Origin): BaseType = {
    _origin = updatedOrigin
    self
  }
}
so that we could at least have the creation of IR nodes and the call to their withOrigin method happen at different times.
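For illustration, the encoding suggested above could be exercised like this. This is a minimal sketch: the `Literal` node and the fields of `Origin` here are made up for the example, and a self-type is assumed so that `self` has type `BaseType`.

```scala
// Sketch only: stand-ins for Origin and the proposed mutable-field TreeNode.
case class Origin(startLine: Option[Int] = None, startColumn: Option[Int] = None)
object Origin { val empty: Origin = Origin() }

trait TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  self: BaseType =>

  private[this] var _origin: Origin = Origin.empty
  def origin: Origin = _origin

  // Mutates the origin in place and returns the same node instance.
  def withOrigin(updatedOrigin: Origin): BaseType = {
    _origin = updatedOrigin
    self
  }
}

// Hypothetical IR node for the example.
case class Literal(value: Int) extends TreeNode[Literal]

val node    = Literal(42)                              // created without an origin
val located = node.withOrigin(Origin(Some(1), Some(5))) // origin attached later
```

Because `_origin` is a plain `var` outside the `Product` fields, two structurally identical nodes still compare equal even when their origins differ, which is the point of keeping origin out of the case-class parameters.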
    startLine: Option[Int] = None,
    startColumn: Option[Int] = None,
    endLine: Option[Int] = None,
    endColumn: Option[Int] = None,
    startTokenIndex: Option[Int] = None,
    endTokenIndex: Option[Int] = None)
Is it possible/meaningful for some of the fields to be Some while others are None, or is it either all fields None or all fields Some?
In the latter case, we would do better to either define the origin field as an Option[Origin] or, even better, define Origin as:
sealed trait Origin
case object UnknownOrigin extends Origin
case class InputOrigin(
startLine: Int,
startColumn: Int,
endLine: Int,
endColumn: Int,
startTokenIndex: Int,
endTokenIndex: Int) extends Origin
// Optionally, adding the following case for nodes that are synthesized
// during the optimization phase could be handy.
case class SyntheticOrigin(synthesizedBy: String) extends Origin
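One benefit of the sealed-trait encoding suggested above is exhaustive pattern matching: the compiler can flag any consumer that forgets a case. A sketch, where the `describe` helper is hypothetical:

```scala
// The sealed Origin hierarchy proposed in the review comment above.
sealed trait Origin
case object UnknownOrigin extends Origin
case class InputOrigin(
    startLine: Int,
    startColumn: Int,
    endLine: Int,
    endColumn: Int,
    startTokenIndex: Int,
    endTokenIndex: Int) extends Origin
case class SyntheticOrigin(synthesizedBy: String) extends Origin

// Hypothetical consumer: because Origin is sealed, this match is checked
// for exhaustiveness at compile time.
def describe(o: Origin): String = o match {
  case UnknownOrigin                 => "unknown"
  case InputOrigin(l, c, _, _, _, _) => s"input at $l:$c"
  case SyntheticOrigin(phase)        => s"synthesized by $phase"
}
```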
I cannot think of a situation where we do not have all of these if we have any of them at all. I think I even added this stuff to the manufactured error tokens. That does not mean I am correct though - just that I can't think of when we might not have some piece of that puzzle ;)
If we add the withOrigin stuff from Spark, then we can do this:
override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {
Also, don't forget this: https://databricks.slack.com/archives/C070VK7G895/p1730991678494919
Can you provide a scenario where this would be both useful and feasible? I can't think of one. The only moment we can provide the origin is during conversion of ParserRuleContext nodes to ir nodes. Using 2 separate calls seems error-prone (it's very easy to forget to call withOrigin). In scenarios where we don't have an origin, we should simply supply Origin.empty at ctor time.
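The ctor-time alternative described in this reply could look roughly like the following sketch. The `Column` node and the `Origin.fromPositions` helper are hypothetical; in the real code the positions would come from the ANTLR ParserRuleContext's start/stop tokens.

```scala
// Sketch: origin supplied once, at construction time, never mutated.
case class Origin(startLine: Option[Int] = None, startColumn: Option[Int] = None)
object Origin {
  val empty: Origin = Origin()
  // Hypothetical helper standing in for extraction from a ParserRuleContext.
  def fromPositions(startLine: Int, startColumn: Int): Origin =
    Origin(Some(startLine), Some(startColumn))
}

abstract class Expression(val origin: Origin = Origin.empty)

// Hypothetical IR node: the visit* method passes the origin to the ctor.
case class Column(name: String, o: Origin) extends Expression(o)

val col = Column("id", Origin.fromPositions(3, 10))  // origin known at creation
val synthetic = Column("generated", Origin.empty)    // no origin available
```

With this shape there is no second call to forget: a node either gets a real origin or an explicit `Origin.empty` when it is built.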
…such that it can be populated at construction time
@@ -4,7 +4,8 @@ import java.util.{UUID}

 // Expression used to refer to fields, functions and similar. This can be used everywhere
 // expressions in SQL appear.
-abstract class Expression extends TreeNode[Expression] {
+abstract class Expression(_origin: Option[Origin] = Option.empty) extends TreeNode[Expression](_origin) {
This is breaking compatibility. Reintegrate https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala#L27 to use SparkParserUtils#withOrigin.
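For context, the Spark-style `withOrigin` referenced here works by setting a thread-local "current origin" around a block of code, which node constructors then read. The sketch below mirrors the shape of Spark's CurrentOrigin, but the implementation and the `Node` class are simplified stand-ins, not Spark's actual code.

```scala
// Simplified sketch of the thread-local current-origin pattern.
case class Origin(line: Option[Int] = None, column: Option[Int] = None)

object CurrentOrigin {
  private val value = new ThreadLocal[Origin] {
    override def initialValue(): Origin = Origin()
  }

  def get: Origin = value.get()

  // Runs f with o as the current origin, restoring the previous origin
  // afterwards, even if f throws.
  def withOrigin[T](o: Origin)(f: => T): T = {
    val previous = value.get()
    value.set(o)
    try f finally value.set(previous)
  }
}

// Hypothetical node: captures whatever origin is current at construction.
class Node { val origin: Origin = CurrentOrigin.get }

val n = CurrentOrigin.withOrigin(Origin(Some(7), Some(2))) { new Node }
```

This keeps existing constructor signatures intact (no extra parameter list), which is the compatibility point being made in this comment.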
superseded by #1199
There is a need to transpile SQL comments. This requires populating TreeNode.origin when building the ir AST.
This PR:
- moves the origin field of TreeNode to a 2nd parameter list such that it is ignored during comparisons, and refactors accordingly

Progresses #869

Future PRs will expand this one (see #1182)