source-redshift-batch: Add the 'use_schema_inference' feature flag #2187

willdonnelly · 2024-12-06T18:36:10Z

Description:

This PR introduces the `/advanced/feature_flags` config setting to the Redshift batch capture and adds a helper function in source-boilerplate for parsing those flags, with default values and support for `no_` prefixes. It then uses that feature-flag infrastructure to add a new flag named `use_schema_inference`. When set, this flag causes discovered collection schemas to have `x-infer-schema: true` set, so that they use the inferred schema in conjunction with the discovered schema.
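As a rough illustration of the parsing behavior described above (defaults plus `no_` prefixes), here is a minimal sketch. The function name `parseFeatureFlags`, its signature, and the whitespace handling are assumptions made for illustration; this is not the actual helper added to source-boilerplate.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFeatureFlags is a hypothetical sketch, not the real source-boilerplate helper.
// It starts from a copy of the defaults, then applies each comma-separated flag:
// "name" sets the flag to true and "no_name" sets it to false.
func parseFeatureFlags(setting string, defaults map[string]bool) map[string]bool {
	flags := make(map[string]bool, len(defaults))
	for name, value := range defaults {
		flags[name] = value
	}
	for _, raw := range strings.Split(setting, ",") {
		name := strings.TrimSpace(raw)
		if name == "" {
			continue // tolerate an empty setting and stray commas
		}
		if strings.HasPrefix(name, "no_") {
			flags[strings.TrimPrefix(name, "no_")] = false
		} else {
			flags[name] = true
		}
	}
	return flags
}

func main() {
	defaults := map[string]bool{"use_schema_inference": false}
	fmt.Println(parseFeatureFlags("use_schema_inference", defaults))
	fmt.Println(parseFeatureFlags("no_use_schema_inference", defaults))
}
```

Copying the defaults first means an empty setting yields the default behavior, and a `no_` prefix only matters for flags whose default is on.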

This is useful because Redshift is a data warehouse where users often use very loose declared types on the tables, and then want to be able to rely on more precise as-used-in-practice type guarantees when materializing their dataset. For example, it's tolerably common to have no declared primary keys and declare all columns with a potentially-nullable type, then manually specify a collection key made of one or more of those properties, and then want to materialize that collection to a SQL database. And currently that doesn't work because the columns designated as the key are potentially nullable in the source DB, even if there are no actual nulls in practice. Turning on schema inference is the escape hatch which allows this to work right up until the moment they actually stick a null into their source data.

Workflow steps:

Put `use_schema_inference` in the "Feature Flags" section of the advanced endpoint config. If more feature flags are added in the future, they should be comma-separated.

Note that once enabled on a particular capture, simply removing the feature flag will not turn schema inference back off -- as I understand things the use of schema inference is "sticky" and disabling it would require manually fiddling with the collection schemas.
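For reference, the workflow step above corresponds to an endpoint config shaped roughly like the JSON literal in this sketch. Only the `/advanced/feature_flags` path comes from the description; the `endpointConfig` struct and `decodeFeatureFlags` function are hypothetical stand-ins, and the connector's real config type has other fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpointConfig is a hypothetical, trimmed-down stand-in for the connector config.
// Only the /advanced/feature_flags location is taken from the PR description.
type endpointConfig struct {
	Advanced struct {
		FeatureFlags string `json:"feature_flags"`
	} `json:"advanced"`
}

// decodeFeatureFlags extracts the raw comma-separated flags string from a JSON config.
func decodeFeatureFlags(raw string) (string, error) {
	var cfg endpointConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return "", err
	}
	return cfg.Advanced.FeatureFlags, nil
}

func main() {
	flags, err := decodeFeatureFlags(`{"advanced": {"feature_flags": "use_schema_inference"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(flags)
}
```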

Implements a new feature flag for `source-redshift-batch` which causes collection schemas to specify `x-infer-schema: true` and thereby request schema inference be used in addition to the full discovered schema from the connector. This can be useful in cases where the declared types of the table in the source DB permit things like nullability, but the dataset doesn't actually use that and the user needs tighter constraints on possible values for materialization purposes (in this example, knowing that the property is never null in practice).
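To make the effect on a discovered schema concrete, here is a small sketch of adding the annotation when the flag is on. The `annotateSchema` helper and the example `id` column are hypothetical; the connector's actual discovery code is not shown in this PR description and may build schemas differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// annotateSchema is a hypothetical helper: it returns a shallow copy of a
// discovered schema, adding "x-infer-schema": true when the flag is enabled.
func annotateSchema(discovered map[string]any, useSchemaInference bool) map[string]any {
	out := make(map[string]any, len(discovered)+1)
	for k, v := range discovered {
		out[k] = v
	}
	if useSchemaInference {
		out["x-infer-schema"] = true
	}
	return out
}

func main() {
	// A nullable column, as discovered from loosely declared table types.
	discovered := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"id": map[string]any{"type": []string{"integer", "null"}},
		},
	}
	bs, err := json.MarshalIndent(annotateSchema(discovered, true), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(bs))
}
```

The discovered types are left untouched; the annotation only requests that the inferred schema be applied alongside them.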

Alex-Bair

LGTM!

willdonnelly added 4 commits December 6, 2024 10:50

source-redshift-batch: Snapshot updates

6a70c98

It looks like the Redshift snapshots were a bit stale even before I made any changes, so this commit gets those updates out of the way.

source-boilerplate: ParseFeatureFlags helper function

24cdc44

source-redshift-batch: Add TestFeatureFlagUseSchemaInference

0f03a20

Just a simple test case to make sure the `use_schema_inference` feature flag does what it was intended to.

willdonnelly requested a review from a team December 6, 2024 18:36

Alex-Bair approved these changes Dec 6, 2024

willdonnelly merged commit 82315d3 into main Dec 6, 2024
52 of 53 checks passed

willdonnelly deleted the wgd/redshift-schema-inference-flag-20241206 branch December 6, 2024 20:13
