Certain Python execution plans aren't captured on Databricks #278
Please verify that the Spline agent is enabled and listening to Spark events.
Here is what I am seeing in the driver logs, and it looks similar to what you mentioned above. Please note that I am saving the transformed files in CSV format; is that the reason why the lineage is not showing up? Should I save the files in Parquet format instead?
No, the format should not matter. Try setting spline.lineageDispatcher=logging.
Should I enable it in the Spark config, or add it at the beginning of the notebook? And where would this logging information be written if enabled?
In the Spark config, the same place where the rest of the Spline settings go. A lineage JSON should be printed to the Spark logs, the same logs where you've seen those Spline INFO messages.
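For reference, a minimal sketch of those settings as they could appear in the cluster's Spark config text box (the listener class is the standard Spline agent query-execution listener; note the spark. prefix that Spline options need when passed through Spark conf):

```
spark.sql.queryExecutionListeners za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.lineageDispatcher logging
```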
What Spark version are you using?
Spark 3.1.1 and Scala 2.12. Please note that I'm writing the CSV file using Python. I should add that the example Scala code given in the Spline Databricks guide works perfectly.
Ok, we'll investigate.
I have emailed you, please check.
I was able to (sort of) reproduce the issue. I was playing with different code snippets both on Databricks and on my local PC, and according to my observations the issue is specific to the Databricks + Python + "no transformations" combination:

```python
df = spark.table('dummy1')
df.write.mode('overwrite').format('csv').options(header='true', quote='"', quoteMode='all').save('/data/dummy2.csv')
```

Adding any transformation operation, even a dummy filter after the read (sketched below), makes the lineage get captured. On pure PySpark it works in all cases.
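A minimal sketch of such a dummy filter (reusing the names from the snippet above; the 1 = 1 predicate is an arbitrary always-true choice, and spark is the notebook's ambient SparkSession):

```python
df = spark.table('dummy1')
df = df.filter('1 = 1')  # no-op transformation; with it, the lineage gets captured
df.write.mode('overwrite').format('csv').options(header='true', quote='"', quoteMode='all').save('/data/dummy2.csv')
```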
The notebook I have is SQL, so the transformations happen in SQL before the results are written in Python. Do you think I should try writing the CSVs in Scala? What is the best solution for this issue in your opinion?
For some reason I cannot reproduce it anymore. Maybe it actually worked for me all along, but the logs were slow to update, which is why I didn't see the captured events. I'm not sure :\ At this point I cannot suggest anything in particular. Try different Databricks runtime versions, and if that doesn't help, try performing the write in Scala.
Which Databricks runtime do you recommend for Spline 0.6.1? I'm using 8.2 currently.
I tested on 8.3.
Is this syntax correct? The Spark Config text box takes only key-value pairs (otherwise an error is reported). Anyway, I have entered the parameters below in the Spark config, but the lineage is still not captured.
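For illustration, the key-value form that the text box expects might look like this, one setting per line with the key and value separated by whitespace (the producer URL here is a placeholder, not a real endpoint):

```
spark.sql.queryExecutionListeners za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.producer.url http://localhost:8080/producer
spark.spline.lineageDispatcher http
```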
I have now tried writing the files in Parquet format, still no sign of the lineage being captured.
What does this error mean? It seems to be related to the Spline agent installed on Databricks (this is for Runtime 8.0, which includes Apache Spark 3.1.1 and Scala 2.12):
It might be the cause.
As I understand it, the Spline UI shows the execution events once we write to a file. I am writing to a CSV file using the code below, but the Spline UI is not showing anything. There are several cells of SQL transformations happening before the write to CSV, but nothing is getting captured. Is there any way to resolve this issue?
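The original snippet wasn't preserved in this thread; a hypothetical minimal version of the described pattern (SQL transformations in earlier cells, then a Python CSV write; the view name and output path are made up) would be:

```python
# Earlier SQL cells produce a view, e.g.:
#   CREATE OR REPLACE TEMP VIEW transformed AS SELECT ... FROM source_table

df = spark.table('transformed')  # hypothetical view created by the SQL cells
df.write.mode('overwrite') \
    .option('header', 'true') \
    .csv('/mnt/output/result.csv')  # hypothetical path; this write is the action Spline should capture
```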