Releases: sdv-dev/SDMetrics
v0.18.0 - 2024-12-13
Bugs Fixed
- Missing whitespace in DisclosureProtection warning - Issue #694 by @frances-h
- DisclosureProtection should be NaN if baseline score is zero - Issue #693 by @frances-h
- CategoricalCAP metric returns 0 if no overlap in known fields - Issue #692 by @frances-h
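The NaN behavior for a zero baseline can be illustrated with a minimal sketch. It assumes the score is a ratio of a CAP-based protection value to a baseline protection value, capped at 1; that formula and the helper name `disclosure_protection_score` are assumptions made for illustration, not taken from these notes or from the library's code.

```python
import math


def disclosure_protection_score(cap_protection: float, baseline_protection: float) -> float:
    """Hypothetical helper: ratio of CAP protection to baseline protection, capped at 1."""
    if baseline_protection == 0:
        # A ratio against a zero baseline is undefined, so report NaN
        # rather than raising or returning a misleading number.
        return math.nan
    return min(cap_protection / baseline_protection, 1.0)


print(disclosure_protection_score(0.4, 0.8))  # ordinary case: ratio below the cap
print(disclosure_protection_score(0.5, 0.0))  # zero baseline: NaN
```

Returning NaN instead of 0 or 1 keeps an undefined result from being read as "no protection" or "full protection".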
New Features
- Add DisclosureProtectionEstimate metric - Issue #676 by @frances-h
- Add DisclosureProtection metric - Issue #675 by @frances-h
v0.17.1 - 2024-12-04
Maintenance
- Create Prepare Release workflow - Issue #674 by @amontanez24
- Update codecov and add flag for integration tests - Issue #644 by @pvk-developer
Bugs Fixed
- InterRowMSAS ignores sequences with missing values - Issue #679 by @fealho
- Improve error handling for datetime values when apply_log = True for InterRowMSAS - Issue #672 by @fealho
- Improve warning handling for non-positive values when apply_log = True for InterRowMSAS - Issue #670 by @fealho
- StatisticMSAS raises undesirable FutureWarning - Issue #665 by @fealho
- KSComplement can be unstable for constant float values - Issue #652 by @fealho
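The interaction between apply_log = True and non-positive values can be sketched in plain Python. The helper below, `inter_row_differences`, is hypothetical and is not the InterRowMSAS implementation. It assumes that non-positive values are replaced with NaN and reported with a single warning before consecutive-row differences are taken.

```python
import math
import warnings


def inter_row_differences(sequence, apply_log=False):
    """Hypothetical helper: differences between consecutive rows of one sequence."""
    values = list(sequence)
    if apply_log:
        if any(v <= 0 for v in values):
            # Assumed behavior: warn once, then treat the offending values as missing.
            warnings.warn('Non-positive values cannot be log-transformed; treating them as NaN.')
        values = [math.log(v) if v > 0 else math.nan for v in values]
    # Pair each value with its successor and subtract.
    return [b - a for a, b in zip(values, values[1:])]


print(inter_row_differences([1, 2, 4]))
print(inter_row_differences([0, 1, math.e], apply_log=True))
```

Any difference that involves a NaN is itself NaN, so a single non-positive value affects at most two differences in its sequence.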
v0.17.0 - 2024-11-14
This release adds a number of Multi-Sequence Aggregate Similarity (MSAS) metrics!
Bugs Fixed
- Relocate timeseries metrics modules - Issue #661 by @fealho
- Fix SequenceLengthSimilarity docstrings - Issue #660 by @fealho
- When running Quality Report, ContingencySimilarity produces a RuntimeWarning (The values in the array are unorderable.) - Issue #656 by @R-Palazzo
New Features
v0.16.0 - 2024-09-25
This release improves the performance of the contingency_similarity metric. It also factors dtypes into the score of the TableStructure metric.
Internal
- Try to improve performance of contingency_similarity - Issue #622 by @amontanez24
New Features
- Add dtype comparison in TableStructure metric (used in Diagnostic report) - Issue #631 by @R-Palazzo
v0.15.1 - 2024-08-13
Bugs Fixed
- X-axis for the bar plot should be labeled Value instead of Category - Issue #620 by @R-Palazzo
- LinAlgError when plotting data that is constant - Issue #616 by @R-Palazzo
- Wrong chart title when generating a box plot for just the real data using get_column_pair_plot() - Issue #615 by @R-Palazzo
New Features
- Better error message when passing an SDV Metadata object - Issue #610 by @R-Palazzo
- Check that every property score is index-free - Issue #583 by @R-Palazzo
v0.15.0 - 2024-07-15
This release adds support for NumPy 2.0! Additionally, the visualization utilities no longer require both real and synthetic data to be provided, and they can now be used to visualize only real or only synthetic data.
Maintenance
- Switch to using ruff for Python linting and code formatting - Issue #536 by @gsheni
- Change job names in integration workflow to "integration" - Issue #577 by @rwedge
- Cap numpy to less than 2.0.0 until SDMetrics supports it - Issue #591 by @gsheni
Internal
New Features
- Allow me to visualize just the real or synthetic data - Issue #581 by @lajohn4747
- Update Referential Integrity metric to support NaNs in child column - Issue #587 by @R-Palazzo
- Add support for numpy 2.0.0 - Issue #593 by @R-Palazzo
Bugs Fixed
- ColumnPairTrends score depends on the data index - Issue #582 by @R-Palazzo
- Datetime columns set to Object pandas dtype breaks LSTMDetection - Issue #584 by @fealho
v0.14.1 - 2024-05-13
This release patches a bug in the LSTMDetection metric.
Bugs Fixed
- LSTMDetection metric crashes when there are multiple context columns - Issue #298 by @frances-h
Maintenance
- Cleanup automated PR workflows - Issue #566 by @R-Palazzo
- Only run unit and integration tests on oldest and latest python versions for macos - Issue #569 by @R-Palazzo
v0.14.0 - 2024-04-11
This release adds support for Python 3.12! It also improves the way the reports print in verbose mode.
Maintenance
- Support Python 3.12 - Issue #529 by @fealho
- Add dependency checker - Issue #547 by @lajohn4747
- Add bandit workflow - Issue #552 by @R-Palazzo
- Fix minimum version workflow when pointing to github branch - Issue #555 by @R-Palazzo
New Features
- Improve readability of the report scores when verbosity is on - Issue #538 by @lajohn4747
v0.13.1 - 2024-03-14
Maintenance
- Transition from using setup.py to pyproject.toml to specify project metadata - Issue #534 by @lajohn4747
- Remove bumpversion and use bump-my-version - Issue #535 by @R-Palazzo
- Add support for Copulas 0.10 - Issue #541 by @amontanez24
v0.13.0 - 2023-12-04
This release makes significant improvements to the Diagnostic Reports! The report now runs a diagnostic to calculate scores for three basic but important properties of your data: data validity, data structure and, in the multi table case, relationship validity. Data validity checks that the columns of your data are valid (e.g. correct range or values). Data structure makes sure the synthetic data has the correct columns. Relationship validity checks that key references are correct and that the cardinality is within the ranges seen in the real data. These changes are meant to make the DiagnosticReport a quick way for you to see if there are any major problems with your synthetic data.
Additionally, some general improvements were made and bugs were resolved. The LogisticDetection and SVCDetection metrics were fixed to only use boolean, categorical, datetime and numeric columns in their calculations. A bug that prevented visualizations from displaying in Jupyter notebooks was patched. The cardinality property in the multi table QualityReport can now handle multiple foreign keys to the same parent. Finally, a new visualization for sequential/timeseries data, get_column_line_plot, was added.
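To make the validity checks concrete, here is a minimal sketch of two of the ideas: key uniqueness and referential integrity, each scored as a fraction between 0 and 1. The function names and the exact scoring are assumptions made for illustration. They are simplified stand-ins, not the library's metric implementations.

```python
def key_uniqueness(keys):
    """Hypothetical helper: fraction of key values that do not repeat an earlier value."""
    keys = list(keys)
    if not keys:
        return float('nan')
    return len(set(keys)) / len(keys)


def referential_integrity(parent_keys, child_foreign_keys):
    """Hypothetical helper: fraction of child foreign keys that reference an existing parent key."""
    parents = set(parent_keys)
    children = list(child_foreign_keys)
    if not children:
        return float('nan')
    return sum(fk in parents for fk in children) / len(children)


# One duplicated key out of four rows.
print(key_uniqueness([1, 2, 2, 3]))
# One foreign key (4) has no matching parent.
print(referential_integrity([1, 2, 3], [1, 1, 4, 3]))
```

A score of 1.0 on both checks would mean the keys are unique and every child row points at a real parent row, which is the kind of basic correctness the paragraph above describes.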
New Features
- Detection metrics should only use statistically modeled columns (filter out the rest) - Issue #286 by @lajohn4747
- Add visualization for timeseries / sequential data - Issue #376 by @lajohn4747
- Multi table quality report should handle multi-foreign keys (to same parent) - Issue #406 by @R-Palazzo
- Add KeyUniqueness metric - Issue #460 by @R-Palazzo
- Add ReferentialIntegrity metric - Issue #461 by @R-Palazzo
- Add CategoryAdherence metric - Issue #462 by @R-Palazzo
- Add TableFormat metric - Issue #463 by @R-Palazzo
- Add CardinalityBoundaryAdherence metric - Issue #464 by @frances-h
- Add DataValidity property - Issue #467 by @R-Palazzo
- Add Structure property - Issue #468 by @R-Palazzo
- Add Relationship Validity property - Issue #469 by @R-Palazzo
- Update DiagnosticReport to calculate base correctness of synthetic data - Issue #471 by @R-Palazzo
- Update the synthetic data that's available for the multi-table demo - Issue #501 by @R-Palazzo
- Update the synthetic data that's available for the single-table demo - Issue #502 by @R-Palazzo
- Update TableFormat metric to TableStructure + fix its computation - Issue #518 by @R-Palazzo
Bugs Fixed
- Sometimes graphs don't show when using Jupyter notebook - Issue #322 by @pvk-developer
- Fix ReferentialIntegrity NaN handling - Issue #494 by @R-Palazzo
- KeyUniqueness metric should only be applied to primary and alternate keys - Issue #503 by @R-Palazzo
- Single table Structure property should not have visualization - Issue #504 by @R-Palazzo
- Multi table Structure property visualization has incorrect styling - Issue #505 by @R-Palazzo
- UserWarning: KeyError: 'relationships' in DiagnosticReport if metadata missing relationships - Issue #506 by @R-Palazzo
- Report validate method should be private - Issue #507 by @R-Palazzo
- ValueError in DiagnosticReport if synthetic data does not match metadata - Issue #508 by @R-Palazzo
- Check if QualityReport needs the synthetic data to match the metadata - Issue #509 by @R-Palazzo
- Running single table report on multi table data (or vice versa) results in confusing error - Issue #510 by @R-Palazzo
- Add metadata validation - Issue #526 by @R-Palazzo