v0.13.0 - 2023-12-04

@amontanez24 released this 04 Dec 19:06

This release makes significant improvements to the Diagnostic Report! The report now runs a diagnostic to calculate scores for three basic but important properties of your data: data validity, data structure and, in the multi-table case, relationship validity. Data validity checks that the columns of your data are valid (e.g., values fall within the correct range or set of categories). Data structure checks that the synthetic data has the correct columns. Relationship validity checks that key references are correct and that the cardinality is within the ranges seen in the real data. These changes are meant to make the DiagnosticReport a quick way to see whether your synthetic data has any major problems.
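
As a rough illustration of what the first two properties check, the sketch below scores range adherence, category adherence and column structure over plain Python dicts. It is not SDMetrics code: the helper names (`boundary_adherence`, `category_adherence`, `structure_score`), the sample data and the exact scoring formulas are assumptions made for illustration, and the library's own metrics may compute these differently.

```python
# Illustrative sketch only; helper names and formulas are assumptions,
# not the SDMetrics implementation.

def boundary_adherence(real_values, synthetic_values):
    """Fraction of synthetic values inside the real [min, max] range."""
    low, high = min(real_values), max(real_values)
    inside = [low <= value <= high for value in synthetic_values]
    return sum(inside) / len(inside)


def category_adherence(real_values, synthetic_values):
    """Fraction of synthetic values whose category appears in the real data."""
    allowed = set(real_values)
    return sum(value in allowed for value in synthetic_values) / len(synthetic_values)


def structure_score(real_columns, synthetic_columns):
    """Overlap between column names: |intersection| / |union|."""
    real, synthetic = set(real_columns), set(synthetic_columns)
    return len(real & synthetic) / len(real | synthetic)


# Hypothetical toy data: one numerical and one categorical column.
real = {"age": [21, 35, 48, 60], "plan": ["free", "pro", "free", "pro"]}
synthetic = {"age": [25, 40, 75, 59], "plan": ["free", "pro", "team", "pro"]}

age_score = boundary_adherence(real["age"], synthetic["age"])
plan_score = category_adherence(real["plan"], synthetic["plan"])
columns_score = structure_score(real.keys(), synthetic.keys())
```

In this toy data one synthetic age (75) falls outside the real range and one synthetic plan ("team") never occurs in the real data, so both validity scores drop below 1 while the structure score stays at 1.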

Additionally, some general improvements were made and bugs were resolved. The LogisticDetection and SVCDetection metrics were fixed to use only boolean, categorical, datetime and numeric columns in their calculations. A bug that prevented visualizations from displaying in Jupyter notebooks was patched. The cardinality property in the multi-table QualityReport can now handle multiple foreign keys to the same parent. Finally, a new visualization for sequential/timeseries data, get_column_line_plot, was added.
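
The column filtering described for the detection metrics can be pictured as a simple selection on column types. The sketch below is a minimal illustration, not the library's code: the metadata layout, the sdtype strings (`"numerical"`, `"id"`, `"email"` and so on) and the function name `select_modeled_columns` are all assumptions.

```python
# Illustrative sketch only; the metadata layout and sdtype names below are
# assumptions, not taken from the release notes.
MODELED_SDTYPES = {"boolean", "categorical", "datetime", "numerical"}


def select_modeled_columns(metadata):
    """Return the column names whose sdtype is one a detection model can use."""
    return [
        name
        for name, info in metadata["columns"].items()
        if info.get("sdtype") in MODELED_SDTYPES
    ]


# Hypothetical metadata for a single table.
metadata = {
    "columns": {
        "user_id": {"sdtype": "id"},
        "signup": {"sdtype": "datetime"},
        "age": {"sdtype": "numerical"},
        "email": {"sdtype": "email"},
        "active": {"sdtype": "boolean"},
    }
}

kept = select_modeled_columns(metadata)
```

Here `user_id` and `email` are dropped, so a classifier trained to tell real from synthetic rows would see only `signup`, `age` and `active`.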

New Features

  • Detection metrics should only use statistically modeled columns (filter out the rest) - Issue #286 by @lajohn4747
  • Add visualization for timeseries / sequential data - Issue #376 by @lajohn4747
  • Multi table quality report should handle multi-foreign keys (to same parent) - Issue #406 by @R-Palazzo
  • Add KeyUniqueness metric - Issue #460 by @R-Palazzo
  • Add ReferentialIntegrity metric - Issue #461 by @R-Palazzo
  • Add CategoryAdherence metric - Issue #462 by @R-Palazzo
  • Add TableFormat metric - Issue #463 by @R-Palazzo
  • Add CardinalityBoundaryAdherence metric - Issue #464 by @frances-h
  • Add DataValidity property - Issue #467 by @R-Palazzo
  • Add Structure property - Issue #468 by @R-Palazzo
  • Add Relationship Validity property - Issue #469 by @R-Palazzo
  • Update DiagnosticReport to calculate base correctness of synthetic data - Issue #471 by @R-Palazzo
  • Update the synthetic data that's available for the multi-table demo - Issue #501 by @R-Palazzo
  • Update the synthetic data that's available for the single-table demo - Issue #502 by @R-Palazzo
  • Update TableFormat metric to TableStructure + fix its computation - Issue #518 by @R-Palazzo
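
To make the relationship-oriented metrics in the list above more concrete (KeyUniqueness, ReferentialIntegrity, CardinalityBoundaryAdherence), here is a minimal pure-Python sketch of the ideas behind them. The function names, the sample keys and the scoring formulas are assumptions chosen for illustration; the actual SDMetrics classes may define and normalize their scores differently.

```python
from collections import Counter

# Illustrative sketch only; formulas are assumptions, not the SDMetrics code.


def key_uniqueness(keys):
    """Fraction of key values that are distinct (1.0 means no duplicates)."""
    return len(set(keys)) / len(keys)


def referential_integrity(parent_keys, child_foreign_keys):
    """Fraction of child foreign keys that point at an existing parent key."""
    parents = set(parent_keys)
    return sum(fk in parents for fk in child_foreign_keys) / len(child_foreign_keys)


def cardinalities(parent_keys, child_foreign_keys):
    """Number of children per parent, including parents with zero children."""
    counts = Counter(child_foreign_keys)
    return [counts[key] for key in parent_keys]


def cardinality_boundary_adherence(real_cards, synthetic_cards):
    """Fraction of synthetic cardinalities inside the real [min, max] range."""
    low, high = min(real_cards), max(real_cards)
    return sum(low <= c <= high for c in synthetic_cards) / len(synthetic_cards)


# Hypothetical parent/child tables, reduced to their key columns.
real_parents, real_child_fks = [1, 2, 3], [1, 1, 2, 3, 3, 3]
synth_parents, synth_child_fks = [10, 11, 12, 13], [10, 10, 10, 10, 11, 12, 12, 99]

uniqueness = key_uniqueness(synth_parents)
integrity = referential_integrity(synth_parents, synth_child_fks)
adherence = cardinality_boundary_adherence(
    cardinalities(real_parents, real_child_fks),
    cardinalities(synth_parents, synth_child_fks),
)
```

In this toy data the foreign key 99 has no parent, which lowers the integrity score, and two synthetic parents have child counts (4 and 0) outside the real range of 1 to 3, which lowers the cardinality score.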

Bugs Fixed

  • Sometimes graphs don't show when using Jupyter notebook - Issue #322 by @pvk-developer
  • Fix ReferentialIntegrity NaN handling - Issue #494 by @R-Palazzo
  • KeyUniqueness metric should only be applied to primary and alternate keys - Issue #503 by @R-Palazzo
  • Single table Structure property should not have visualization - Issue #504 by @R-Palazzo
  • Multi table Structure property visualization has incorrect styling - Issue #505 by @R-Palazzo
  • UserWarning: KeyError: 'relationships' in DiagnosticReport if metadata missing relationships - Issue #506 by @R-Palazzo
  • Report validate method should be private - Issue #507 by @R-Palazzo
  • ValueError in DiagnosticReport if synthetic data does not match metadata - Issue #508 by @R-Palazzo
  • Check if QualityReport needs the synthetic data to match the metadata - Issue #509 by @R-Palazzo
  • Running single table report on multi table data (or vice versa) results in confusing error - Issue #510 by @R-Palazzo
  • Add metadata validation - Issue #526 by @R-Palazzo