Release v0.13.0 #315

fdosani · 2024-06-20T19:21:51Z

Release v0.13.0

New

- SparkSQLCompare and spark submodule #310
  - new Compare class called SparkSQLCompare
  - renamed SparkCompare -> SparkPandasCompare
  - reorganized all the spark logic into a submodule spark
- adding in benchmark docs #309
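
For code that has to run both before and after the SparkCompare -> SparkPandasCompare rename, one option is to look the class up under whichever name the installed package exposes. This is a minimal sketch, not part of datacompy: `resolve_renamed` is a hypothetical helper, and the stand-in namespaces below only imitate an old and a new package layout.

```python
from types import SimpleNamespace


def resolve_renamed(namespace, *candidate_names):
    """Return the first attribute found on `namespace` among `candidate_names`.

    Hypothetical helper for illustration; not part of datacompy.
    """
    for name in candidate_names:
        if hasattr(namespace, name):
            return getattr(namespace, name)
    raise AttributeError(f"none of {candidate_names!r} found")


# Stand-ins for a pre-rename and a post-rename package namespace (illustrative only).
old_pkg = SimpleNamespace(SparkCompare="old class")
new_pkg = SimpleNamespace(SparkPandasCompare="new class")

# Prefer the new name, fall back to the old one.
old_cls = resolve_renamed(old_pkg, "SparkPandasCompare", "SparkCompare")
new_cls = resolve_renamed(new_pkg, "SparkPandasCompare", "SparkCompare")
```

With the real package the call would take the imported module as `namespace`, assuming both names are exposed at the level you pass in; that assumption should be checked against the installed version.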

Bug fix

bug fix and clean up of temp_column_name #308

General cleanup and house keeping

* [WIP] vanilla spark
* [WIP] fixing tests and logic
* [WIP] __index cleanup
* updating pyspark.sql logic and fixing tests
* restructuring spark logic into submodule and typing
* remove pandas 2 restriction for spark sql
* fix for sql call
* updating docs
* updating benchmarks with pyspark dataframe
* relative imports and linting
* relative imports and linting
* feedback from review, switch to monotonic and simplify checks
* allow pyspark.sql.connect.dataframe.DataFrame
* checking version for spark connect
* typo fix
* adding import
* adding connect extras
* adding connect extras

satniks · 2024-06-21T06:40:52Z

Hi @fdosani, does SparkSQLCompare write to the DB while processing the data? When I tried it on Databricks serverless compute for a job, I got the following error:

[NOT_SUPPORTED_WITH_SERVERLESS] PERSIST TABLE is not supported on serverless compute. SQLSTATE: 0A000

DataFrame.persist() is shown in the call stack.

Looks like Databricks serverless compute has lots of limitations.

Anyway, SparkSQLCompare works fine on a normal Databricks compute cluster, so this is a non-issue for us. We may not use Databricks serverless in the near future due to such limitations.

fdosani · 2024-06-21T12:47:17Z

There are a couple of calls to cache, which may be the cause. I'm not sure I want to support serverless, as it seems it would restrict most of the code we have.

satniks · 2024-06-21T13:19:45Z

Right, serverless has lots of restrictions and is not a common use case. Current support is great. Thanks again.

gladysteh99

LGTM!

mattiazenidb · 2024-09-14T18:00:25Z

Hey, I think you should support Databricks Serverless :) The error above occurs because .cache() is not supported, and it's not supported because it's not needed. The goal of Serverless is to provide the best performance out of the box without the user tuning anything.

I tested and it works, I'll create a pull request to support it.

PS: I work for Databricks
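
One way to tolerate a backend that rejects caching is to attempt `cache()` and fall back to the uncached DataFrame. The sketch below is an illustration under assumptions, not the fix proposed for datacompy: `cache_if_supported` and the two fake frame classes are hypothetical, and it assumes the `NOT_SUPPORTED_WITH_SERVERLESS` error surfaces at the moment `cache()` is called.

```python
def cache_if_supported(df):
    """Return df.cache(), or df unchanged if the backend rejects caching.

    Hypothetical helper; works on any object with a cache() method.
    """
    try:
        return df.cache()
    except Exception as exc:  # kept generic so the sketch runs without pyspark
        if "NOT_SUPPORTED_WITH_SERVERLESS" in str(exc):
            return df
        raise  # unrelated failures should still propagate


class FakeServerlessFrame:
    """Stand-in that imitates the error text reported in this thread."""

    def cache(self):
        raise RuntimeError(
            "[NOT_SUPPORTED_WITH_SERVERLESS] PERSIST TABLE is not supported "
            "on serverless compute. SQLSTATE: 0A000"
        )


class FakeClusterFrame:
    """Stand-in for a frame on a regular cluster, where cache() succeeds."""

    def __init__(self):
        self.cached = False

    def cache(self):
        self.cached = True
        return self


serverless_df = FakeServerlessFrame()
serverless_result = cache_if_supported(serverless_df)  # falls back to the same object

cluster_result = cache_if_supported(FakeClusterFrame())  # cached as usual
```

Matching on the error text is a deliberate simplification; checking the exception's class or error condition would be more robust where the Spark version makes that available.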

fdosani · 2024-09-14T18:48:15Z

Hey @mattiazenidb I'll happily accept a PR for that. Thanks for helping out here!

github-actions bot and others added 8 commits May 27, 2024 09:53

[edgetest] automated change (#304)

f8235a9

Co-authored-by: fdosani <fdosani@users.noreply.github.com>

spark clean up (#305)

b133a2f

* spark clean up
* fixing spark session weirdness with parameters

[edgetest] automated change (#307)

8608830

Co-authored-by: fdosani <fdosani@users.noreply.github.com>

bug fix and clean up of temp_column_name (#308)

1fd8ac1

adding in benchmark docs (#309)

5861594

* adding in benchmark docs
* Update docs/source/benchmark.rst

Co-authored-by: Jacob Dawang <jdawang@users.noreply.github.com>

[edgetest] automated change (#311)

0292567

Co-authored-by: fdosani <fdosani@users.noreply.github.com>

[edgetest] automated change (#313)

afeed8b

Co-authored-by: fdosani <fdosani@users.noreply.github.com>

fdosani requested review from ak-gupta, jdawang and gladysteh99 as code owners June 20, 2024 19:21

Merge branch 'main' into develop

3b91429

fdosani mentioned this pull request Jun 21, 2024

Benchmark Documentation between pandas, fugue, and native spark. #254

Closed

gladysteh99 approved these changes Jun 21, 2024

fdosani merged commit d02ac44 into main Jun 21, 2024
58 checks passed

rhaffar pushed a commit to rhaffar/datacompy that referenced this pull request Sep 12, 2024

Merge pull request capitalone#315 from capitalone/develop

b5cff08

Release v0.13.0
