Merge pull request #345 from capitalone/develop
Release v0.14.2
fdosani authored Oct 30, 2024
2 parents 0e15a75 + 39ef330 commit e7cd7e3
Showing 12 changed files with 2,899 additions and 9 deletions.
18 changes: 14 additions & 4 deletions .github/workflows/test-package.yml
@@ -64,17 +64,27 @@ jobs:
          java-version: '8'
          distribution: 'adopt'

-      - name: Install Spark and datacompy
+      - name: Install Spark, Pandas, and Numpy
        run: |
          python -m pip install --upgrade pip
          python -m pip install pytest pytest-spark pypandoc
          python -m pip install pyspark[connect]==${{ matrix.spark-version }}
+          python -m pip install pandas==${{ matrix.pandas-version }}
+          python -m pip install numpy==${{ matrix.numpy-version }}
+      - name: Install Datacompy without Snowflake/Snowpark if Python 3.12
+        if: ${{ matrix.python-version == '3.12' }}
+        run: |
+          python -m pip install .[dev_no_snowflake]
+      - name: Install Datacompy with all dev dependencies if Python 3.9, 3.10, or 3.11
+        if: ${{ matrix.python-version != '3.12' }}
+        run: |
+          python -m pip install .[dev]
      - name: Test with pytest
        run: |
-          python -m pytest tests/
+          python -m pytest tests/ --ignore=tests/test_snowflake.py
test-bare-install:

@@ -101,7 +111,7 @@ jobs:
          python -m pip install .[tests]
      - name: Test with pytest
        run: |
-          python -m pytest tests/
+          python -m pytest tests/ --ignore=tests/test_snowflake.py
test-fugue-install-no-spark:

@@ -127,4 +137,4 @@ jobs:
          python -m pip install .[tests,duckdb,polars,dask,ray]
      - name: Test with pytest
        run: |
-          python -m pytest tests/
+          python -m pytest tests/ --ignore=tests/test_snowflake.py
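The two new conditional workflow steps reduce to a simple version switch: Python 3.12 builds skip the Snowflake/Snowpark extras, everything else gets the full dev set. A minimal shell sketch, using a hypothetical `PYTHON_VERSION` variable as a stand-in for `${{ matrix.python-version }}`:

```shell
# Hypothetical stand-in for the workflow's matrix.python-version value.
PYTHON_VERSION="3.12"

# Mirror the two mutually exclusive `if:` guarded install steps:
# 3.12 uses the dev_no_snowflake extras group, older versions use dev.
if [ "$PYTHON_VERSION" = "3.12" ]; then
    EXTRAS="dev_no_snowflake"
else
    EXTRAS="dev"
fi

echo "python -m pip install .[$EXTRAS]"
```

Either way, the subsequent pytest step excludes `tests/test_snowflake.py`, so the Snowflake suite never runs in these matrix jobs.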
2 changes: 2 additions & 0 deletions README.md
@@ -34,6 +34,7 @@ pip install datacompy[spark]
pip install datacompy[dask]
pip install datacompy[duckdb]
pip install datacompy[ray]
+pip install datacompy[snowflake]

```

@@ -95,6 +96,7 @@ with the Pandas on Spark implementation. Spark plans to support Pandas 2 in [Spa
- Pandas: ([See documentation](https://capitalone.github.io/datacompy/pandas_usage.html))
- Spark: ([See documentation](https://capitalone.github.io/datacompy/spark_usage.html))
- Polars: ([See documentation](https://capitalone.github.io/datacompy/polars_usage.html))
+- Snowflake/Snowpark: ([See documentation](https://capitalone.github.io/datacompy/snowflake_usage.html))
- Fugue is a Python library that provides a unified interface for data processing on Pandas, DuckDB, Polars, Arrow,
Spark, Dask, Ray, and many other backends. DataComPy integrates with Fugue to provide a simple way to compare data
across these backends. Please note that Fugue will use the Pandas (Native) logic at its lowest level
4 changes: 3 additions & 1 deletion datacompy/__init__.py
@@ -18,7 +18,7 @@
Then extended to carry that functionality over to Spark Dataframes.
"""

-__version__ = "0.14.1"
+__version__ = "0.14.2"

import platform
from warnings import warn
@@ -43,12 +43,14 @@
    unq_columns,
)
from datacompy.polars import PolarsCompare
+from datacompy.snowflake import SnowflakeCompare
from datacompy.spark.sql import SparkSQLCompare

__all__ = [
    "BaseCompare",
    "Compare",
    "PolarsCompare",
+    "SnowflakeCompare",
    "SparkSQLCompare",
    "all_columns_match",
    "all_rows_overlap",
4 changes: 3 additions & 1 deletion datacompy/core.py
@@ -799,6 +799,7 @@ def columns_equal(
        A series of Boolean values. True == the values match, False == the
        values don't match.
    """
+    default_value = "DATACOMPY_NULL"
    compare: pd.Series[bool]

# short circuit if comparing mixed type columns. We don't want to support this moving forward.
@@ -842,7 +843,8 @@ def columns_equal(
            compare = compare_string_and_date_columns(col_1, col_2)
        else:
            compare = pd.Series(
-                (col_1 == col_2) | (col_1.isnull() & col_2.isnull())
+                (col_1.fillna(default_value) == col_2.fillna(default_value))
+                | (col_1.isnull() & col_2.isnull())
            )
except Exception:
# Blanket exception should just return all False
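The revised comparison in `columns_equal` can be illustrated with plain pandas. This standalone sketch mirrors the diff's fallback branch (the sample series values are made up for illustration): both sides are filled with a shared sentinel before `==`, and the explicit both-null check keeps rows where the two values are null counting as equal.

```python
import pandas as pd

# Sentinel used by datacompy's fallback equality check.
default_value = "DATACOMPY_NULL"

col_1 = pd.Series(["a", None, "b"])
col_2 = pd.Series(["a", None, "c"])

# Null-safe equality: fill nulls with the sentinel on both sides,
# and also treat a pair of nulls as a match explicitly.
compare = (col_1.fillna(default_value) == col_2.fillna(default_value)) | (
    col_1.isnull() & col_2.isnull()
)
print(compare.tolist())  # -> [True, True, False]
```

Row 0 matches on value, row 1 matches because both sides are null, and row 2 is a genuine mismatch.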