source-redshift-batch: Add the 'use_schema_inference' feature flag #2187
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description:
This PR introduces the
/advanced/feature_flags
config setting to the Redshift batch capture, adds a helper function insource-boilerplate
for parsing those flags with default values and support forno_
prefixes, and then uses that feature flags infrastructure to add a new one nameduse_schema_inference
which when set causes discovered collection schemas to havex-infer-schema: true
set so that they use the inferred schema in conjunction with the discovered schema.This is useful because Redshift is a data warehouse where users often use very loose declared types on the tables, and then want to be able to rely on more precise as-used-in-practice type guarantees when materializing their dataset. For example, it's tolerably common to have no declared primary keys and declare all columns with a potentially-nullable type, then manually specify a collection key made of one or more of those properties, and then want to materialize that collection to a SQL database. And currently that doesn't work because the columns designated as the key are potentially nullable in the source DB, even if there are no actual nulls in practice. Turning on schema inference is the escape hatch which allows this to work right up until the moment they actually stick a null into their source data.
Workflow steps:
Put
use_schema_inference
in the "Feature Flags" section of the advanced endpoint config. In the future if more feature flags than just this one exist, they should be comma-separated.Note that once enabled on a particular capture, simply removing the feature flag will not turn schema inference back on -- as I understand things the use of schema inference is "sticky" and disabling it would require manually fiddling with the collection schemas.
This change is