Consider this code merging data with a generated column:
# Databricks notebook source
tableName = "TARGET_SCHEMA.generatedTableTest"

# COMMAND ----------

spark.sql(f"DROP TABLE IF EXISTS {tableName}")

# COMMAND ----------

import warnings
from pyspark import pandas as ps
from pyspark.pandas.utils import PandasAPIOnSparkAdviceWarning

warnings.simplefilter("ignore", category=PandasAPIOnSparkAdviceWarning)

# COMMAND ----------

df = ps.DataFrame({"foo": [1, 2, 3, 4, 5], "bar": [6, 7, 8, 9, 0]})
df.display()

# COMMAND ----------

from delta.tables import DeltaTable
from pyspark.sql.types import LongType

deltaSession = DeltaTable.create(spark)
dTableBuilder = deltaSession.tableName(tableName)
dTableBuilder.addColumns(df.to_spark().schema)
dTableBuilder.addColumn("baz", LongType(), generatedAlwaysAs="foo + bar")
dTable = dTableBuilder.execute()

# COMMAND ----------

mergeBuilder = dTable.merge(df.to_spark(), condition="1 = 1").whenMatchedUpdateAll().whenNotMatchedInsertAll()

# COMMAND ----------

try:
    mergeBuilder.execute()
except Exception as e:
    print("***We raised an error! As of 20240627 this will say 'baz' is missing***\n\n")
    print(e)
Observed results
The merge fails because it cannot resolve the generated column. The error is (or is close to):
[DELTA_MERGE_UNRESOLVED_EXPRESSION] Cannot resolve baz in UPDATE clause given columns foo, bar.
Expected results
The generated column is, well, generated from the inputs and as such is unnecessary to specify.
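To make the expectation concrete, here is a small plain-Python sketch (no Spark or Delta involved) of the rows the merge would ideally produce for the sample data above. The helper name `with_generated_baz` is hypothetical; it only mirrors the `foo + bar` generation expression from the table definition.

```python
# Sample source rows from the reproduction (columns foo and bar only).
source_rows = [
    {"foo": foo, "bar": bar}
    for foo, bar in zip([1, 2, 3, 4, 5], [6, 7, 8, 9, 0])
]


def with_generated_baz(row):
    """Hypothetical helper: derive baz the way the generated column would."""
    return {**row, "baz": row["foo"] + row["bar"]}


# What the target table would ideally contain after merging into an empty table.
expected_rows = [with_generated_baz(row) for row in source_rows]

for row in expected_rows:
    print(row)
```

The source never supplies `baz`; every value is derived from `foo` and `bar`, which is why specifying it in the merge should not be needed.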
Environment information
Delta Lake version: DBR 14.3 LTS
Spark version: 3.5.0
Scala version: 2.12
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?
Yes. I can contribute a fix for this bug independently.
Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
No. I cannot contribute a bug fix at this time.
You can work around this by explicitly enumerating every non-generated column instead of relying on the "All" functions, but that kind of misses the point of both those functions and generated columns IMO. If a column is missing from the source and the matching target column is generated, it should be skipped during validation.
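A minimal sketch of that workaround, assuming the source DataFrame is aliased as `s` and contains only the non-generated columns. The helper `explicit_assignments` is hypothetical (not part of Delta Lake); it just builds the column mapping that `whenMatchedUpdate` and `whenNotMatchedInsert` accept in place of the "All" variants. The Delta calls are left as comments because they need a live Spark session and the `dTable` and `df` objects from the reproduction.

```python
def explicit_assignments(source_columns, source_alias="s"):
    """Hypothetical helper: map each target column to the same-named source column.

    Generated columns never appear here because they are absent from the source.
    """
    return {name: f"{source_alias}.`{name}`" for name in source_columns}


# Column names taken from the reproduction's source DataFrame.
assignments = explicit_assignments(["foo", "bar"])
print(assignments)

# Usage against the objects from the reproduction (requires Spark and Delta):
#
# source = df.to_spark().alias("s")
# (
#     dTable.alias("t")
#     .merge(source, condition="1 = 1")
#     .whenMatchedUpdate(set=explicit_assignments(source.columns))
#     .whenNotMatchedInsert(values=explicit_assignments(source.columns))
#     .execute()
# )
```

This keeps `baz` out of the UPDATE and INSERT clauses entirely, at the cost of rebuilding by hand what the "All" functions are meant to do.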