add support for pandas version 2 #237

Merged
merged 4 commits into from
Oct 22, 2024
8 changes: 4 additions & 4 deletions .github/workflows/lint.yml
@@ -15,11 +15,11 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4

-      - name: Install dependencies
+      - name: Install
         run: |
-          python -m pip install --upgrade pip
-          pip install pylint pytest pytest-cov
-          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+          pip install --upgrade pip
+          pip install .[test]
+          pip install pylint

       - name: pylint
         run: |
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -18,7 +18,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
-        python-version: ['3.8', '3.9', '3.10', '3.11']
+        python-version: ['3.9', '3.10', '3.11', '3.12']

     steps:
       - name: Checkout

@@ -1,5 +1,5 @@
 ---
-name: Test Python312
+name: Test Python313

 on: workflow_dispatch

@@ -11,7 +11,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
-        python-version: ['3.12']
+        python-version: ['3.13']

     steps:
       - name: Checkout
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
 # Changelog

+## Version 0.4.7 (Oct 22, 2024)
+
+Changes:
+* Add support for Pandas version 2 ([#237](https://github.com/AI-SDC/ACRO/pull/237))
+
 ## Version 0.4.6 (Jun 25, 2024)

 Changes:
7 changes: 4 additions & 3 deletions CITATION.cff
@@ -1,8 +1,8 @@
 cff-version: 1.2.0
 title: ACRO
-version: 0.4.6
-doi: 10.5281/zenodo.12535291
-date-released: 2024-06-25
+version: 0.4.7
+doi:
+date-released: 2024-10-22
 license: MIT
 repository-code: https://github.com/AI-SDC/ACRO
 languages:
@@ -13,6 +13,7 @@ keywords:
   - privacy
   - privacy tools
   - statistical disclosure control
+  - statistical software
 authors:
   - family-names: Preen
     given-names: Richard John
29 changes: 12 additions & 17 deletions README.md
@@ -8,22 +8,16 @@

 This repository holds the Python ACRO package. An R wrapper package is available: [ACRO-R](https://github.com/AI-SDC/ACRO-R).

-ACRO (Automatic Checking of Research Outputs) is an open source
-tool for automating the statistical disclosure control (SDC) of research
-outputs. ACRO assists researchers and output checkers by distinguishing between
-research output that is safe to publish, output that requires further analysis,
-and output that cannot be published because of substantial disclosure risk.
-
-It does this by providing a light-weight 'skin' that sits over well-known
-analysis tools, in a variety of languages researchers might use. This adds
-functionality to:
-
-* identify potentially disclosive outputs against a range of commonly used
-disclosure tests;
+A GUI for viewing and approving outputs is also available: [SACRO-Viewer](https://github.com/AI-SDC/SACRO-Viewer)
+
+ACRO (Automatic Checking of Research Outputs) is an open source tool for automating the [statistical disclosure control](https://en.wikipedia.org/wiki/Statistical_disclosure_control) (SDC) of research outputs. ACRO assists researchers and output checkers by distinguishing between research output that is safe to publish, output that requires further analysis, and output that cannot be published because of a substantial risk of disclosing private data.
+
+It does this by providing a lightweight 'skin' that sits over well-known analysis tools, in a variety of languages researchers might use. This adds functionality to:
+
+* identify potentially disclosive outputs against a range of commonly used disclosure tests;
 * suppress outputs where required;
 * report reasons for suppression;
-* produce simple summary documents TRE staff can use to streamline their
-workflow.
+* produce simple summary documents TRE staff can use to streamline their workflow.

 ![ACRO workflow and architecture schematic](docs/schematic.png)

@@ -37,15 +31,16 @@ If installed in this way, the example [notebooks](notebooks) and the [data](data
 $ pip install acro
 ```

-#### Notes for Python 3.12
+#### Notes for Python 3.13

-ACRO currently depends on an older version of Pandas (~1.5.0) for which no pre-compiled wheels are available within pip for Python 3.12. Therefore, in this scenario, Pandas must be built from source. This requires the installation of a C++ compiler before pip installing acro.
+ACRO currently depends on numpy version 1.x.x for which no pre-compiled wheels are available within pip for Python 3.13. Therefore, in this scenario, numpy must be built from source. This requires the installation of a C++ compiler before pip installing acro.

-For Windows, [Microsoft Visual Studio](https://visualstudio.microsoft.com/downloads/) and the [C++ build tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) will likely need to be installed first.
+For Windows, the [Microsoft Visual Studio C++ build tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) will likely need to be installed first.

 ### Examples

 See the example notebooks for:
+
 * [Python charities dataset](notebooks/test.ipynb)
 * [Python nursery dataset](notebooks/test-nursery.ipynb)
 * [R charities dataset](https://ai-sdc.github.io/ACRO/_static/test.nb.html)
3 changes: 3 additions & 0 deletions acro/acro_tables.py
@@ -1512,6 +1512,9 @@ def crosstab_with_totals( # pylint: disable=too-many-arguments,too-many-locals
         normalize=normalize,
     )

+    if table.empty:
+        raise ValueError("empty table")
+
     table, _ = delete_empty_rows_columns(table)
     masks = create_crosstab_masks(
         index_new,
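
The guard added above stops processing as soon as the cross tabulation has nothing in it. A minimal sketch of that pattern follows; the helper name `require_non_empty` and the sample data are hypothetical and are not part of ACRO, and only the `table.empty` check and the error message come from the diff.

```python
import pandas as pd


def require_non_empty(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the guard in the diff."""
    if table.empty:
        raise ValueError("empty table")
    return table


# A small, made-up cross tabulation that passes the guard.
counts = pd.crosstab(
    pd.Series(["a", "b", "a"], name="grant"),
    pd.Series([2010, 2010, 2011], name="year"),
)
require_non_empty(counts)

# An empty DataFrame trips the guard before any later step runs.
try:
    require_non_empty(pd.DataFrame())
except ValueError as err:
    message = str(err)
```

Failing early like this gives the caller a clear error instead of letting later steps operate on a table with no rows or columns.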
2 changes: 1 addition & 1 deletion acro/version.py
@@ -1,3 +1,3 @@
 """ACRO version number."""

-__version__ = "0.4.6"
+__version__ = "0.4.7"
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -12,7 +12,7 @@
 # -- Project information -----------------------------------------------------

 project = "ACRO"
-copyright = "2023, ACRO Project Team"
+copyright = "2024, ACRO Project Team"
 author = "ACRO Project Team"
 release = __version__

7 changes: 0 additions & 7 deletions requirements.txt

This file was deleted.

11 changes: 6 additions & 5 deletions setup.cfg
@@ -1,6 +1,6 @@
 [metadata]
 name = acro
-version = 0.4.6
+version = 0.4.7
 description = ACRO: Tools for the Automatic Checking of Research Outputs
 long_description = file: README.md
 long_description_content_type = text/markdown
@@ -10,16 +10,16 @@ maintainer_email = james.smith@uwe.ac.uk
 license = MIT
 license_files = LICENSE.md
 classifiers =
-    Development Status :: 3 - Alpha
+    Development Status :: 4 - Beta
     Intended Audience :: Developers
     Intended Audience :: Science/Research
     License :: OSI Approved :: MIT License
     Natural Language :: English
-    Programming Language :: Python :: 3.8
     Programming Language :: Python :: 3.9
     Programming Language :: Python :: 3.10
     Programming Language :: Python :: 3.11
     Programming Language :: Python :: 3.12
+    Programming Language :: Python :: 3.13
     Topic :: Scientific/Engineering
     Topic :: Scientific/Engineering :: Information Analysis
     Operating System :: OS Independent
@@ -29,14 +29,15 @@ keywords =
     privacy
     privacy-tools
     statistical-disclosure-control
+    statistical-software
 project_urls =
     Changelog = https://github.com/AI-SDC/ACRO/CHANGELOG.md
     Documentation = https://github.com/AI-SDC/ACRO/wiki
     Bug Tracker = https://github.com/AI-SDC/ACRO/issues
     Discussions = https://github.com/AI-SDC/ACRO/discussions

 [options]
-python_requires = >=3.8
+python_requires = >=3.9
 zip_safe = False
 include_package_data = True
 packages = find:
@@ -45,7 +46,7 @@ install_requires =
     matplotlib
     numpy<2.0.0
     openpyxl
-    pandas~=1.5.0
+    pandas>=1.5.0,<2.3
     PyYAML
     statsmodels

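
The `install_requires` change widens the accepted pandas range. As a rough, standard-library-only sketch of what the two specifiers admit: under PEP 440 the old compatible-release pin `~=1.5.0` is equivalent to `>=1.5.0, ==1.5.*`, while the new range runs up to, but not including, 2.3. The helper names below are hypothetical, and the parser only handles plain `X.Y.Z` release strings (no pre-release or post-release tags).

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a plain release string such as '2.2.3' into (2, 2, 3)."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_old_pin(version: str) -> bool:
    """Approximates pandas~=1.5.0, i.e. >=1.5.0 and <1.6."""
    return parse("1.5.0") <= parse(version) < parse("1.6")


def allowed_by_new_range(version: str) -> bool:
    """Approximates pandas>=1.5.0,<2.3 from the new install_requires."""
    return parse("1.5.0") <= parse(version) < parse("2.3")


# Compare a few illustrative release numbers against both specifiers.
for candidate in ["1.5.3", "2.0.0", "2.2.3", "2.3.0"]:
    print(candidate, allowed_by_old_pin(candidate), allowed_by_new_range(candidate))
```

Under these assumptions, 2.0.0 and 2.2.3 are rejected by the old pin and accepted by the new range, while 2.3.0 is rejected by both. For real dependency resolution, a dedicated version-handling library is the safer choice than this tuple comparison.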
7 changes: 6 additions & 1 deletion test/test_initial.py
@@ -379,7 +379,12 @@ def test_finalise_json(data, acro):
     assert orig.summary == read.summary
     assert orig.comments == read.comments
     assert orig.timestamp == read.timestamp
-    assert (orig.output[0].reset_index()).equals(read.output[0])
+    # check SDC outcome DataFrame
+    orig_df = orig.output[0].reset_index()
+    read_df = read.output[0]
+    pd.testing.assert_frame_equal(
+        orig_df, read_df, check_names=False, check_dtype=False
+    )
     # test reading JSON
     with open(os.path.normpath(f"{PATH}/results.json"), encoding="utf-8") as file:
         json_data = json.load(file)
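
The test change swaps a strict `DataFrame.equals` check for `pd.testing.assert_frame_equal` with `check_names=False` and `check_dtype=False`. The diff does not say why, but one plausible reason is that a DataFrame written to JSON and read back can hold the same values under a different dtype. The sketch below illustrates that difference with made-up frames; it does not reproduce ACRO's actual output.

```python
import pandas as pd

# Hypothetical stand-ins: same values, different column dtypes.
orig_df = pd.DataFrame({"count": [1, 2, 3]})        # int64 column
read_df = pd.DataFrame({"count": [1.0, 2.0, 3.0]})  # float64 column

# DataFrame.equals requires matching dtypes, so this is False.
strict_match = orig_df.equals(read_df)

# assert_frame_equal raises AssertionError on a mismatch; with
# check_dtype=False it compares values and ignores the dtype difference.
pd.testing.assert_frame_equal(
    orig_df, read_df, check_names=False, check_dtype=False
)
```

A side benefit of `assert_frame_equal` is that a failure reports which column and values differ, whereas `equals` only returns a boolean.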