Release v0.13.0 #315
Co-authored-by: fdosani <fdosani@users.noreply.github.com>
* spark clean up
* fixing spark session weirdness with parameters
* adding in benchmark docs
* Update docs/source/benchmark.rst

Co-authored-by: Jacob Dawang <jdawang@users.noreply.github.com>
* [WIP] vanilla spark
* [WIP] fixing tests and logic
* [WIP] __index cleanup
* updating pyspark.sql logic and fixing tests
* restructuring spark logic into submodule and typing
* remove pandas 2 restriction for spark sql
* fix for sql call
* updating docs
* updating benchmarks with pyspark dataframe
* relative imports and linting
* relative imports and linting
* feedback from review, switch to monotonic and simplify checks
* allow pyspark.sql.connect.dataframe.DataFrame
* checking version for spark connect
* typo fix
* adding import
* adding connect extras
* adding connect extras
Hi @fdosani, does SparkSQLCompare write to the DB while processing the data? When I tried it on Databricks serverless compute for jobs, I got the following error: [NOT_SUPPORTED_WITH_SERVERLESS] PERSIST TABLE is not supported on serverless compute. SQLSTATE: 0A000. DataFrame.persist() is shown in the call stack. It looks like Databricks serverless compute has a lot of limitations. Anyway, SparkSQLCompare works fine on a regular Databricks compute cluster, so this is a non-issue for us. We probably won't use Databricks serverless in the near future because of such limitations.
There are a couple of calls to cache, which may be the cause. Yeah, I'm not sure I want to support serverless, as it seems it would restrict most of the code we have.
Right, serverless will have lots of restrictions and isn't a common use case. Current support is great. Thanks again.
LGTM!
Hey, I think you should support Databricks Serverless :) The error above occurs because .cache() is not supported there. But it's not supported because it's not needed: the goal of Serverless is to provide the best performance out of the box without the user tuning anything. I tested it and it works; I'll create a pull request to support it. PS: I work for Databricks
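A minimal sketch of one way the caching step could tolerate serverless compute, assuming the restriction surfaces as an exception raised eagerly by .cache() whose message contains the NOT_SUPPORTED_WITH_SERVERLESS error class quoted in the error above. cache_if_supported is a hypothetical helper, not part of SparkSQLCompare or PySpark; it accepts any DataFrame-like object with a .cache() method.

```python
from typing import Any

# Error class taken from the serverless error message quoted above.
SERVERLESS_CACHE_ERROR = "NOT_SUPPORTED_WITH_SERVERLESS"


def cache_if_supported(df: Any) -> Any:
    """Return df.cache(), or df unchanged if caching is rejected on serverless.

    Hypothetical helper: assumes the restriction is raised when .cache() is
    called and that the exception message contains the
    NOT_SUPPORTED_WITH_SERVERLESS error class. Any other error is re-raised.
    """
    try:
        return df.cache()
    except Exception as exc:  # broad on purpose: the exact exception type is not assumed
        if SERVERLESS_CACHE_ERROR in str(exc):
            # Caching is only an optimization, so falling back to the
            # uncached DataFrame leaves the comparison results unchanged.
            return df
        raise
```

On a classic cluster this behaves like a plain df.cache(); on serverless it returns the uncached DataFrame. Matching on the message text is brittle, so an explicit opt-out flag for caching may be the cleaner design.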
Hey @mattiazenidb, I'll happily accept a PR for that. Thanks for helping out here!
Release v0.13.0
New
SparkSQLCompare
Bug fix
General cleanup and housekeeping
spark clean up #305
edgetest bumps