The rule mining tab in the monitoring and partition sections are named as "Copy verified profiling checks", and will not copy default (policy) checks.
piotrczarnas committed Sep 5, 2024
1 parent 687c322 commit 629b087
Showing 13 changed files with 69 additions and 23 deletions.
4 changes: 2 additions & 2 deletions docs/dqo-concepts/data-quality-rule-mining.md
@@ -258,8 +258,8 @@ to [table monitoring](definition-of-data-quality-checks/data-observability-monit
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Copy failed profiling checks | Copy the configuration of profiling checks that failed during the last execution. The preferred approach is to review the profiling checks, disable false-positive checks, and enable this configuration to copy the reviewed checks to the monitoring and partitioned checks for continuous monitoring. |
| Copy disabled profiling checks | Copy the configuration of disabled profiling checks. This option is effective for monitoring or partitioned checks only. By default it is disabled, leaving failed or incorrectly configured profiling checks only in the profiling section to avoid decreasing the [data quality KPI](definition-of-data-quality-kpis.md). |
-| Copy enabled profiling checks | Copy the configuration of enabled profiling checks to the monitoring or partitioned checks. This option is effective for monitoring or partitioned checks only. By default it is enabled, allowing to migrate configured profiling checks to the monitoring section to enable Data Observability of these checks. |
-| Reconfigure default checks | Reconfigure the rule thresholds of data quality checks that were activated using [data observability](data-observability.md) rule patterns (data quality policies). |
+| Copy profiling checks | Copy the configuration of enabled profiling checks to the monitoring or partitioned checks. This option is effective for monitoring or partitioned checks only. By default it is enabled, allowing to migrate configured profiling checks to the monitoring section to enable Data Observability of these checks. |
+| Tune quality policy checks | Reconfigure the rule thresholds of data quality checks that were activated using [data observability](data-observability.md) rule patterns (data quality policies). |


### Data quality checks
@@ -44,7 +44,7 @@ const initTabs = [
value: 'table-comparisons'
},
{
-label: 'Rule mining',
+label: 'Copy verified profiling checks',
value: 'rule-mining'
}
];
@@ -44,7 +44,7 @@ const initTabs = [
value: 'table-comparisons'
},
{
-label: 'Rule mining',
+label: 'Copy verified profiling checks',
value: 'rule-mining'
}
];
@@ -67,7 +67,7 @@ export default function RuleMining({
copy_failed_profiling_checks: true,
copy_disabled_profiling_checks: false,
copy_profiling_checks: true,
-propose_default_checks: true,
+reconfigure_policy_enabled_checks: true,
propose_minimum_row_count: true,
propose_column_count: true,
propose_timeliness_checks: true,
@@ -211,7 +211,8 @@ export default function RuleMining({
...configuration,
category_filter: addPrefix(configuration.category_filter ?? ''),
column_name_filter: addPrefix(configuration.column_name_filter ?? ''),
-check_name_filter: addPrefix(configuration.check_name_filter ?? '')
+check_name_filter: addPrefix(configuration.check_name_filter ?? ''),
+propose_checks_from_statistics: checkTypes === CheckTypes.PROFILING
};
setLoading(true);
switch (checkTypes) {
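The `propose_checks_from_statistics` assignment in this hunk can be looked at in isolation. The following is a minimal sketch, not the component's actual code: `MiningParameters`, `buildMiningRequest`, and the string values inside `CheckTypes` are hypothetical stand-ins, and only the flag names and the `checkTypes === CheckTypes.PROFILING` comparison are taken from the diff.

```typescript
// Hypothetical stand-in for the frontend's CheckTypes enum (values are assumed).
const CheckTypes = {
  PROFILING: 'profiling',
  MONITORING: 'monitoring',
  PARTITIONED: 'partitioned'
} as const;
type CheckType = (typeof CheckTypes)[keyof typeof CheckTypes];

// Hypothetical subset of the mining parameters; flag names follow the diff.
interface MiningParameters {
  copy_profiling_checks: boolean;
  reconfigure_policy_enabled_checks: boolean;
  propose_checks_from_statistics?: boolean;
}

// Builds the request payload: statistics are used as a source of proposals
// only when the rule miner runs for profiling checks.
function buildMiningRequest(
  configuration: MiningParameters,
  checkTypes: CheckType
): MiningParameters {
  return {
    ...configuration,
    propose_checks_from_statistics: checkTypes === CheckTypes.PROFILING
  };
}
```

Under these assumptions, a request built for monitoring or partitioned checks always carries `propose_checks_from_statistics: false`, whatever the user's configuration contains.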
@@ -99,7 +99,7 @@ export default function RuleMiningFilters({
/>
<Checkbox
className="p-2 !w-62"
-label="Copy enabled profiling checks"
+label="Copy profiling checks"
tooltipText="Copy the configuration of enabled profiling checks to the monitoring or partitioned checks. This option is effective for monitoring or partitioned checks only. By default it is enabled, allowing to migrate configured profiling checks to the monitoring section to enable Data Observability of these checks."
checked={configuration.copy_profiling_checks}
onChange={(e) =>
@@ -108,11 +108,11 @@ export default function RuleMiningFilters({
/>
<Checkbox
className="p-2 !w-62"
-label="Reconfigure default checks"
+label="Tune quality policy checks"
tooltipText="Reconfigure the rule thresholds of data quality checks that were activated using data observability rule patterns (data quality policies)."
-checked={configuration.propose_default_checks}
+checked={configuration.reconfigure_policy_enabled_checks}
onChange={(e) =>
-onChangeConfiguration({ propose_default_checks: e })
+onChangeConfiguration({ reconfigure_policy_enabled_checks: e })
}
/>
</div>
4 changes: 2 additions & 2 deletions dqops/src/main/frontend/src/shared/constants.ts
@@ -185,7 +185,7 @@ export const TABLE_LEVEL_TABS: {
value: 'table-comparisons'
},
{
-label: 'Rule mining',
+label: 'Copy verified profiling checks',
value: 'rule-mining'
}
],
@@ -203,7 +203,7 @@ export const TABLE_LEVEL_TABS: {
value: 'table-comparisons'
},
{
-label: 'Rule mining',
+label: 'Copy verified profiling checks',
value: 'rule-mining'
}
]
@@ -80,7 +80,13 @@ public class CheckMiningParametersModel implements Cloneable {
* Propose the rules for default checks that were activated using data quality check patterns (policies). The default value of this parameter is 'true'.
*/
@JsonPropertyDescription("Propose the rules for default checks that were activated using data quality check patterns (policies). The default value of this parameter is 'true'.")
-private boolean proposeDefaultChecks = true;
+private boolean reconfigurePolicyEnabledChecks = true;
+
+/**
+* Propose the configuration of data quality checks from statistics.
+*/
+@JsonPropertyDescription("Propose the configuration of data quality checks from statistics.")
+private boolean proposeChecksFromStatistics = true;

/**
* Propose the default configuration of the minimum row count for monitoring checks (full table scans). The default value of this parameter is 'true'.
@@ -112,7 +112,8 @@ public CheckMiningProposalModel proposeChecks(
}

TableProfilingResults tableProfilingResults = this.tableProfilingResultsReadService.loadTableProfilingResults(
-executionContext, connectionSpec, clonedTableSpec);
+executionContext, connectionSpec, clonedTableSpec, miningParameters.isProposeChecksFromStatistics(),
+checkType == CheckType.profiling);

AbstractRootChecksContainerSpec tableCheckRootContainer = clonedTableSpec.getTableCheckRootContainer(
checkType, checkTimeScale, false, true);
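The two new arguments passed to `loadTableProfilingResults` in this hunk map onto the `importStatistics` and `importDefaultChecks` parameters declared on the read service later in this commit. A minimal sketch of that mapping follows, written in TypeScript for illustration only; `CheckType`, `LoadFlags`, and `resolveLoadFlags` are hypothetical names that do not exist in the codebase.

```typescript
// Hypothetical stand-in for the backend's CheckType enum.
type CheckType = 'profiling' | 'monitoring' | 'partitioned';

// Hypothetical holder for the two boolean arguments of the service call.
interface LoadFlags {
  importStatistics: boolean;
  importDefaultChecks: boolean;
}

// Mirrors the call site in the diff: statistics follow the mining parameter,
// while results of default (policy) checks are imported only for profiling.
function resolveLoadFlags(
  proposeChecksFromStatistics: boolean,
  checkType: CheckType
): LoadFlags {
  return {
    importStatistics: proposeChecksFromStatistics,
    importDefaultChecks: checkType === 'profiling'
  };
}
```

Read this way, running the miner for monitoring or partitioned checks hides the results of policy-activated profiling checks from the miner, so they are not copied.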
@@ -229,7 +230,7 @@ targetCheckRootContainer, new CheckSearchFilters(), connectionSpec, tableSpec, e

AbstractCheckSpec<?, ?, ?, ?> checkSpec = checkModel.getCheckSpec();

-if (checkModel.isDefaultCheck() && checkSpec.hasAnyRulesEnabled() && !miningParameters.isProposeDefaultChecks()) {
+if (checkModel.isDefaultCheck() && checkSpec.hasAnyRulesEnabled() && !miningParameters.isReconfigurePolicyEnabledChecks()) {
listOfChecksInCategory.remove(checkModel);
continue; // skip default checks
}
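The skip condition in this hunk combines three booleans. As a rough illustration, assuming nothing beyond what the `if` statement shows, it can be written as a standalone predicate; `shouldSkipCheck` and its parameter names are hypothetical and chosen to match the Java getters.

```typescript
// Returns true when a check should be dropped from the proposal:
// it was activated by a policy (default check), it already has rules enabled,
// and the user has turned off reconfiguring policy-enabled checks.
function shouldSkipCheck(
  isDefaultCheck: boolean,
  hasAnyRulesEnabled: boolean,
  reconfigurePolicyEnabledChecks: boolean
): boolean {
  return isDefaultCheck && hasAnyRulesEnabled && !reconfigurePolicyEnabledChecks;
}
```

With the default parameter value of `true`, the predicate never fires, so policy-enabled checks remain candidates for threshold tuning unless the option is switched off.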
@@ -149,4 +149,18 @@ public void importStatistics(List<StatisticsMetricModel> statistics, ZoneId time
}
}
}
+
+/**
+* Removes the results of all profiling checks that were applied by default check patterns (policies), because we don't want to reconfigure them.
+*/
+public void removeChecksAppliedByPatterns() {
+LinkedHashMap<String, ProfilingCheckResult> copyOfProfilingChecks = new LinkedHashMap<>(this.profilingCheckResults);
+
+for (Map.Entry<String, ProfilingCheckResult> profilingCheckKeyValue : copyOfProfilingChecks.entrySet()) {
+ProfilingCheckResult profilingCheckResult = profilingCheckKeyValue.getValue();
+if (profilingCheckResult.getProfilingCheckModel() != null && profilingCheckResult.getProfilingCheckModel().isDefaultCheck()) {
+this.profilingCheckResults.remove(profilingCheckKeyValue.getKey());
+}
+}
+}
}
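The new `removeChecksAppliedByPatterns` method iterates over a copy of the map and removes matching entries from the original, which in Java avoids modifying a collection while iterating over it. The same pattern is sketched below in TypeScript for illustration only; the `ProfilingCheckModel` and `ProfilingCheckResult` interfaces are hypothetical, reduced stand-ins for the Java classes, and the function takes the map as an argument instead of reading a field.

```typescript
// Hypothetical, reduced stand-ins for the Java model classes.
interface ProfilingCheckModel {
  isDefaultCheck: boolean;
}
interface ProfilingCheckResult {
  profilingCheckModel?: ProfilingCheckModel;
}

// Removes every result whose check model is marked as a default (policy) check.
// Entries without a check model are kept, matching the null guard in the Java code.
function removeChecksAppliedByPatterns(
  results: Map<string, ProfilingCheckResult>
): void {
  // Iterate over a snapshot of the entries, as the Java version does with a map copy.
  for (const [key, result] of Array.from(results.entries())) {
    if (result.profilingCheckModel?.isDefaultCheck) {
      results.delete(key);
    }
  }
}
```

The snapshot is not strictly required for a JavaScript `Map`, which tolerates deletion during iteration; it is kept here only to mirror the structure of the Java method.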
@@ -30,9 +30,13 @@ public interface TableProfilingResultsReadService {
* @param executionContext Execution context with access to the user home.
* @param connectionSpec Connection specification.
* @param tableSpec Table specification of the table that is analyzed.
+* @param importStatistics Import statistics to be used by the rule miner. Without the statistics, the miner can only configure current checks or copy profiling checks.
+* @param importDefaultChecks Imports the results of default checks. When we disable it, the rule miner will not see their results and will not propose configuring them. It is important when configuring the monitoring and partition checks to not copy them.
* @return All loaded results for a table.
*/
TableProfilingResults loadTableProfilingResults(ExecutionContext executionContext,
ConnectionSpec connectionSpec,
-TableSpec tableSpec);
+TableSpec tableSpec,
+boolean importStatistics,
+boolean importDefaultChecks);
}
@@ -73,12 +73,16 @@ public TableProfilingResultsReadServiceImpl(
* @param executionContext Execution context with access to the user home.
* @param connectionSpec Connection specification.
* @param tableSpec Table specification of the table that is analyzed.
+* @param importStatistics Import statistics to be used by the rule miner. Without the statistics, the miner can only configure current checks or copy profiling checks.
+* @param importDefaultChecks Imports the results of default checks. When we disable it, the rule miner will not see their results and will not propose configuring them. It is important when configuring the monitoring and partition checks to not copy them.
* @return All loaded results for a table.
*/
@Override
public TableProfilingResults loadTableProfilingResults(ExecutionContext executionContext,
ConnectionSpec connectionSpec,
-TableSpec tableSpec) {
+TableSpec tableSpec,
+boolean importStatistics,
+boolean importDefaultChecks) {
UserHomeContext userHomeContext = executionContext.getUserHomeContext();
UserDomainIdentity userDomainIdentity = userHomeContext.getUserIdentity();
TableProfilingResults tableProfilingResults = this.checkResultsDataService.loadProfilingChecksResultsForTable(
@@ -96,6 +100,9 @@ public TableProfilingResults loadTableProfilingResults(ExecutionContext executio
tableProfilingResults.setMissingProfilingChecksResults(false);
}
tableAssetProfilingResults.importChecksModels(tableChecksModel);
+if (!importDefaultChecks) {
+tableAssetProfilingResults.removeChecksAppliedByPatterns();
+}

for (ColumnSpec columnSpec : tableSpec.getColumns().values()) {
AbstractRootChecksContainerSpec columnProfilingChecksContainer = columnSpec.getColumnCheckRootContainer(
@@ -110,14 +117,20 @@ public TableProfilingResults loadTableProfilingResults(ExecutionContext executio
tableProfilingResults.setMissingProfilingChecksResults(false);
}
columnAssetProfilingResultsContainer.importChecksModels(columnChecksModel);
+if (!importDefaultChecks) {
+columnAssetProfilingResultsContainer.removeChecksAppliedByPatterns();
+}
}

-StatisticsResultsForTableModel mostRecentStatisticsForTable = this.statisticsDataService.getMostRecentStatisticsForTable(connectionSpec.getConnectionName(),
-tableSpec.getPhysicalTableName(), CommonTableNormalizationService.NO_GROUPING_DATA_GROUP_NAME, true, userDomainIdentity);
-
ZoneId defaultTimeZoneId = this.defaultTimeZoneProvider.getDefaultTimeZoneId(userHomeContext);
tableProfilingResults.setTimeZoneId(defaultTimeZoneId);
-tableProfilingResults.importStatistics(mostRecentStatisticsForTable);
+
+if (importStatistics) {
+StatisticsResultsForTableModel mostRecentStatisticsForTable = this.statisticsDataService.getMostRecentStatisticsForTable(connectionSpec.getConnectionName(),
+tableSpec.getPhysicalTableName(), CommonTableNormalizationService.NO_GROUPING_DATA_GROUP_NAME, true, userDomainIdentity);
+tableProfilingResults.importStatistics(mostRecentStatisticsForTable);
+}
+
tableProfilingResults.calculateMissingNotNullCounts();

for (DictionaryWrapper dictionaryWrapper : userHomeContext.getUserHome().getDictionaries()) {
@@ -19799,10 +19799,14 @@
"type" : "boolean",
"description" : "Copy the configuration of valid profiling checks."
},
-"propose_default_checks" : {
+"reconfigure_policy_enabled_checks" : {
"type" : "boolean",
"description" : "Propose the rules for default checks that were activated using data quality check patterns (policies). The default value of this parameter is 'true'."
},
+"propose_checks_from_statistics" : {
+"type" : "boolean",
+"description" : "Propose the configuration of data quality checks from statistics."
+},
"propose_minimum_row_count" : {
"type" : "boolean",
"description" : "Propose the default configuration of the minimum row count for monitoring checks (full table scans). The default value of this parameter is 'true'."
@@ -16770,11 +16770,14 @@ definitions:
copy_profiling_checks:
type: "boolean"
description: "Copy the configuration of valid profiling checks."
-propose_default_checks:
+reconfigure_policy_enabled_checks:
type: "boolean"
description: "Propose the rules for default checks that were activated using\
\ data quality check patterns (policies). The default value of this parameter\
\ is 'true'."
+propose_checks_from_statistics:
+type: "boolean"
+description: "Propose the configuration of data quality checks from statistics."
propose_minimum_row_count:
type: "boolean"
description: "Propose the default configuration of the minimum row count for\