Refactor origin in tree node #1189

ericvergnaud · 2024-11-12T14:56:25Z

We need to transpile SQL comments. This requires populating TreeNode.origin when building the IR AST.

This PR:

moves the origin field of TreeNode to a second parameter list so that it is ignored during comparisons, and refactors the code accordingly
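As a minimal illustration of why a second parameter list is ignored during comparisons: Scala generates `equals`, `hashCode` and `toString` for a case class from its first parameter list only. The `Origin` and `Column` types below are simplified, hypothetical stand-ins, not the project's actual definitions.

```scala
// Hypothetical, simplified stand-ins (not the project's real types).
final case class Origin(line: Int, column: Int)

// Only `name` (first parameter list) takes part in the generated equals/hashCode.
// `origin` needs an explicit `val` to be readable from outside.
final case class Column(name: String)(val origin: Origin)

object SecondParamListDemo {
  def main(args: Array[String]): Unit = {
    val a = Column("id")(Origin(1, 8))
    val b = Column("id")(Origin(42, 3))
    println(a == b)               // true: origin is not compared
    println(a.origin == b.origin) // false: the origins themselves differ
  }
}
```

One consequence of this encoding is that the generated `copy` method provides defaults only for the first parameter list, so the origin has to be passed again explicitly, as in `a.copy(name = "x")(a.origin)`.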

Progresses #869

Future PRs will expand this one (see #1182)

vil1

This looks better than the previous encoding with multiple parameter lists.

However, with this new encoding, we won't be able to change a node's origin after creation. That means that every visit* method in our Builder classes will have to properly set the origin.

Again, I suggest we go for something like

trait TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  // Self-type so that `self` can be returned as BaseType
  self: BaseType =>

  private[this] var _origin: Origin = Origin.empty

  def origin: Origin = _origin

  def withOrigin(updatedOrigin: Origin): BaseType = {
    _origin = updatedOrigin
    self
  }
}

so that we could at least have the creation of IR nodes and the call to their withOrigin method happen at different times

vil1 · 2024-11-12T15:47:58Z

core/src/main/scala/com/databricks/labs/remorph/intermediate/trees.scala

+    startLine: Option[Int] = None,
+    startColumn: Option[Int] = None,
+    endLine: Option[Int] = None,
+    endColumn: Option[Int] = None,
+    startTokenIndex: Option[Int] = None,
+    endTokenIndex: Option[Int] = None)


Is it possible/meaningful for some of the fields to be Some while others are None, or is it either all fields None or all fields Some?

In the latter case, it would be better either to define the origin field as an Option[Origin] or, even better, to define Origin as

sealed trait Origin
case object UnknownOrigin extends Origin
case class InputOrigin(
    startLine: Int,
    startColumn: Int,
    endLine: Int,
    endColumn: Int,
    startTokenIndex: Int,
    endTokenIndex: Int) extends Origin
// optionally, adding the following case for the nodes that are synthesized during optimization phase
// could be handy.
case class SyntheticOrigin(synthesizedBy: String) extends Origin
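To make the benefit of this encoding concrete, here is a small sketch of how a consumer might use it. The hierarchy is repeated so the block is self-contained; `OriginDescriber.describe` is a hypothetical helper introduced only for illustration.

```scala
// The encoding suggested above, repeated so the sketch compiles on its own.
sealed trait Origin
case object UnknownOrigin extends Origin
final case class InputOrigin(
    startLine: Int,
    startColumn: Int,
    endLine: Int,
    endColumn: Int,
    startTokenIndex: Int,
    endTokenIndex: Int) extends Origin
final case class SyntheticOrigin(synthesizedBy: String) extends Origin

object OriginDescriber {
  // Hypothetical helper. Because Origin is sealed, the compiler can warn
  // when a case is missing from this match.
  def describe(origin: Origin): String = origin match {
    case UnknownOrigin                     => "unknown"
    case InputOrigin(sl, sc, el, ec, _, _) => s"${sl}:${sc}-${el}:${ec}"
    case SyntheticOrigin(by)               => s"synthesized by ${by}"
  }
}
```

With this shape there is no state where only some position fields are present, so callers never have to unwrap six separate Option values.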

I cannot think of a situation where we do not have all of these if we have any of them at all. I think I even add this stuff to the manufactured error tokens. That does not mean I am correct though - just that I can't think of when we might not have some piece of that puzzle ;)

If we add the withOrigin stuff from Spark, then we can do this:

override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {
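For readers unfamiliar with that pattern, here is a rough sketch loosely modeled on Spark's approach: a thread-local "current origin" is set around the visitor body, and each node captures it when constructed. All names below (`Ctx`, `Node`, `PrimitiveType`, this simplified `Origin`) are assumptions made for illustration; the real code would work with ANTLR's ParserRuleContext and the project's own types.

```scala
// Simplified, hypothetical Origin; the real one carries more position fields.
final case class Origin(line: Option[Int] = None, column: Option[Int] = None)
object Origin { val empty: Origin = Origin() }

// Stand-in for an ANTLR ParserRuleContext.
final case class Ctx(line: Int, column: Int)

object CurrentOrigin {
  private val current = new ThreadLocal[Origin] {
    override def initialValue(): Origin = Origin.empty
  }
  def get: Origin = current.get()
  def set(o: Origin): Unit = current.set(o)

  // Runs `f` with the origin derived from `ctx`, then restores the previous one.
  def withOrigin[T](ctx: Ctx)(f: => T): T = {
    val previous = get
    set(Origin(Some(ctx.line), Some(ctx.column)))
    try f finally set(previous)
  }
}

abstract class Node {
  // Captured once, at construction time; not part of case-class equality.
  val origin: Origin = CurrentOrigin.get
}
final case class PrimitiveType(name: String) extends Node

object WithOriginDemo {
  def visitPrimitiveDataType(ctx: Ctx): Node = CurrentOrigin.withOrigin(ctx) {
    PrimitiveType("INT")
  }
}
```

In this sketch the visitor bodies do not mention the origin at all, and nodes built outside `withOrigin` simply pick up `Origin.empty`.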

jimidle · 2024-11-12T16:12:46Z

Also, don't forget this:

https://databricks.slack.com/archives/C070VK7G895/p1730991678494919

ericvergnaud · 2024-11-13T13:38:41Z

This looks better than the previous encoding with multiple parameter lists.

However, with this new encoding, we won't be able to change a node's origin after creation. That means that every visit* method in our Builder classes will have to properly set the origin.

Again, I suggest we go for something like

trait TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  // Self-type so that `self` can be returned as BaseType
  self: BaseType =>

  private[this] var _origin: Origin = Origin.empty

  def origin: Origin = _origin

  def withOrigin(updatedOrigin: Origin): BaseType = {
    _origin = updatedOrigin
    self
  }
}

so that we could at least have the creation of IR nodes and the call to their withOrigin method happen at different times

Can you provide a scenario where this would be both useful and feasible? I can't think of one. The only moment we can provide the origin is during the conversion of ParserRuleContext nodes to IR nodes. Using two separate calls seems error-prone (it's very easy to forget to call withOrigin). In scenarios where we don't have an origin, we should simply supply Origin.empty at constructor time.
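A minimal sketch of the constructor-time approach argued for here, assuming a simplified `Origin` and hypothetical `TreeNode`, `Literal` and `Ctx` types that are not the project's actual classes:

```scala
// Simplified, hypothetical Origin; the real one carries more position fields.
final case class Origin(line: Option[Int] = None, column: Option[Int] = None)
object Origin { val empty: Origin = Origin() }

// Stand-in for an ANTLR ParserRuleContext.
final case class Ctx(line: Int, column: Int)

abstract class TreeNode(val origin: Origin)

// The origin is fixed at construction and defaults to Origin.empty.
// Being in the second parameter list, it is ignored by equals/hashCode.
final case class Literal(value: Int)(_origin: Origin = Origin.empty) extends TreeNode(_origin)

object CtorOriginDemo {
  // Built from parser output: the origin is known, so it is supplied directly.
  def visitLiteral(ctx: Ctx, value: Int): Literal =
    Literal(value)(Origin(Some(ctx.line), Some(ctx.column)))

  // Built without parser context: the default Origin.empty applies.
  def synthesized(value: Int): Literal = Literal(value)()
}
```

The trade-off discussed in this thread shows up at the call sites: every builder method has to pass the origin itself, but there is no second call that can be forgotten.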


nfx · 2024-11-13T14:44:45Z

core/src/main/scala/com/databricks/labs/remorph/intermediate/expressions.scala

@@ -4,7 +4,8 @@ import java.util.{UUID}

 // Expression used to refer to fields, functions and similar. This can be used everywhere
 // expressions in SQL appear.
-abstract class Expression extends TreeNode[Expression] {
+abstract class Expression(_origin: Option[Origin] = Option.empty) extends TreeNode[Expression](_origin) {


This breaks compatibility. Reintegrate https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala#L27 so that we can use SparkParserUtils#withOrigin

ericvergnaud · 2024-11-13T14:51:57Z

Superseded by #1199

ericvergnaud added 4 commits November 12, 2024 15:41

refactor Origin

56c0fb2

drop CurrentOrigin

f29386e

make 'TreeNode.origin' field mandatory

03afc1f

formatting and merge issues

5aa0f5f

ericvergnaud mentioned this pull request Nov 12, 2024

Attach SnowFlake line comments to SELECT statements #1182

Closed

vil1 requested changes Nov 12, 2024


vil1 reviewed Nov 12, 2024


ericvergnaud mentioned this pull request Nov 12, 2024

Attach comments to snowflake select statements #1190

Closed

ericvergnaud added 2 commits November 13, 2024 15:27

make all Origin fields mandatory

40570f8

refactor TreeNode.origin to a method returning a defaulted parameter such that it can be populated at construction time

72a4df7

ericvergnaud mentioned this pull request Nov 13, 2024

Refactor origin in tree node #1199

Closed

nfx requested changes Nov 13, 2024


ericvergnaud closed this Nov 13, 2024

jimidle assigned jimidle and ericvergnaud and unassigned jimidle Nov 14, 2024

ericvergnaud mentioned this pull request Nov 15, 2024

[FEATURE]: IR to Listen and Generate Code Comments #869

Open

1 task
