Added Webhook variants #1280
Closed
Bumps [h2](https://github.com/hyperium/h2) from 0.3.22 to 0.3.24.
- [Release notes](https://github.com/hyperium/h2/releases)
- [Changelog](https://github.com/hyperium/h2/blob/v0.3.24/CHANGELOG.md)
- [Commits](hyperium/h2@v0.3.22...v0.3.24)

---
updated-dependencies:
- dependency-name: h2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/go-jose/go-jose/v3](https://github.com/go-jose/go-jose) from 3.0.0 to 3.0.1.
- [Release notes](https://github.com/go-jose/go-jose/releases)
- [Changelog](https://github.com/go-jose/go-jose/blob/v3/CHANGELOG.md)
- [Commits](go-jose/go-jose@v3.0.0...v3.0.1)

---
updated-dependencies:
- dependency-name: github.com/go-jose/go-jose/v3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
…rialized fields Many systems will preserve capitalization in the names of materialized fields (for example, columns of a SQL table), but still will not allow columns whose names differ only in capitalization to coexist within the same table. To prevent errors during `Apply`, an additional option allows `Validate` to treat field names as case-insensitive when determining ambiguous columns. This is relevant for materializations to BigQuery, Databricks, MotherDuck, MySQL, and SQL Server. For SQL materializations, the option is exposed through `materialize-sql` as a new configuration of the `sql.Dialect`. I also moved `MaxColumnCharLength` (name adjusted slightly) out of the `sql.Endpoint` configuration and into the `sql.Dialect`, since it thematically fits better there alongside the case-sensitivity configuration.
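The check described above can be sketched as follows. This is a minimal illustration, not the connector's actual code; the function name and signature are hypothetical.

```python
# Hypothetical sketch of case-insensitive ambiguity detection for Validate.
def ambiguous_fields(field_names, case_sensitive):
    """Group Flow field names that would collide as materialized columns."""
    seen = {}
    for name in field_names:
        key = name if case_sensitive else name.lower()
        seen.setdefault(key, []).append(name)
    # Any group with more than one member is ambiguous for this endpoint.
    return {k: v for k, v in seen.items() if len(v) > 1}
```

With case-insensitive matching, `["userId", "userid"]` collide even though the endpoint would preserve their capitalization.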
There was a systemic logic error in how we were escaping quote characters within a quoted identifier. This fixes a number of those. Databricks and MySQL will be fixed in later commits which also change how they are sanitizing identifiers.
MySQL does not allow identifiers to have trailing spaces or characters with code points greater than U+FFFF. (ref https://dev.mysql.com/doc/refman/8.0/en/identifiers.html). These disallowed characters will be sanitized with an underscore.
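The sanitization rule above amounts to something like the following sketch (illustrative only; the connector's actual implementation may differ):

```python
# Replace characters outside the Basic Multilingual Plane (code points above
# U+FFFF) and trailing spaces with underscores, per MySQL's identifier rules.
def sanitize_mysql_identifier(name: str) -> str:
    out = "".join("_" if ord(ch) > 0xFFFF else ch for ch in name)
    stripped = out.rstrip(" ")
    # Replace each trailing space with an underscore rather than dropping it,
    # so distinct input names stay distinct.
    return stripped + "_" * (len(out) - len(stripped))
```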
Databricks has a set of characters that are strictly disallowed for column names, regardless of how their identifiers are quoted. These characters will be sanitized to underscores. Databricks also has a different, not entirely overlapping set of disallowed characters for table names. These will be sanitized as part of the resource path response, rather than returning an error from `Validate`. This will allow table names with spaces, which may be a common occurrence for Databricks materializations linked to source captures that will automatically add new bindings without user involvement in what the configured name for the tables are.
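As a rough sketch of the column-name side of this, using the character set Delta tables commonly reject (the connector's exact sets for columns and table names are not reproduced here and may differ):

```python
# Assumed disallowed set for illustration; Databricks' actual rules may vary.
COLUMN_DISALLOWED = set(" ,;{}()\n\t=")

def sanitize_databricks_column(name: str) -> str:
    """Replace strictly disallowed column-name characters with underscores."""
    return "".join("_" if ch in COLUMN_DISALLOWED else ch for ch in name)
```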
…e-defined "Metadata" fields ElasticSearch has a number of pre-defined "Metadata" fields (see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-fields.html) that exist for every index, and trying to create an index with a separate mapping with any of these names will fail with a "field is defined more than once" sort of error. All we can really do is differentiate our Flow field by appending "_flow" to the name of the field to allow the field to be created. Doing this kind of sanitization is tricky because there are numerous places it has to be included for things to work out end-to-end without a higher-level framework like `materialize-sql`. And such a general framework may actually make sense to construct for materializations in general, but I'm not tackling that just yet.
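The renaming itself is simple; the tricky part, as noted, is applying it consistently everywhere. A sketch, with a partial, assumed subset of Elasticsearch's metadata fields:

```python
# Partial list for illustration; Elasticsearch defines more metadata fields.
ES_METADATA_FIELDS = {"_id", "_index", "_source", "_size", "_routing", "_ignored", "_meta"}

def flow_field_name(field: str) -> str:
    """Append "_flow" when a Flow field would collide with a metadata field."""
    return field + "_flow" if field in ES_METADATA_FIELDS else field
```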
Adds a set of tests that ensure destination systems can create materialized fields for Flow field names with some pretty challenging characters. The `materialize-motherduck` tests were enabled and ran as part of this, since they now work. Some additional handling in the test helpers was added to support delta updates configurations, since `materialize-motherduck` must currently always be set to delta updates.
Previously we would only normalize characters in index names that would be allowed by Flow collection names, but that's not fully sufficient since the configured index name in the resource configuration can be anything. It will usually be the collection name, but not always. This extends normalization to all characters that pose problems in index names.
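The extended normalization might look roughly like this (a hedged sketch: Elasticsearch index names must be lowercase, exclude characters such as `\ / * ? " < > |` plus space, comma, and `#`, and may not begin with `-`, `_`, or `+`; the connector's actual normalization may differ):

```python
import re

def normalize_index_name(name: str) -> str:
    # Lowercase, replace disallowed characters, and trim a forbidden prefix.
    out = re.sub(r'[\\/*?"<>|,# ]', "_", name.lower())
    return out.lstrip("-_+")
```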
Previously we just had duration strings to specify polling intervals for batch captures. This commit factors out that duration handling into a separate 'schedule' helper package and defines another type of schedule: `"daily at HH:MMZ"` which consistently executes at the requested time of day (in UTC). This refactoring will be applied to other batch SQL captures in a followup commit.
Adds another sentence to the description of the polling schedule properties of the endpoint config and resource spec describing the two accepted forms of schedule, and adds a regexp pattern so the UI can validate inputs. The pattern is somewhat hairy, but I tested it manually and I'm fairly confident it's correct for most reasonable inputs while also avoiding any forbidden Go struct-tag characters.
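An illustrative validation pattern accepting either a Go-style duration (e.g. `"5m"`, `"1h30m"`) or the `"daily at HH:MMZ"` form; the regexp actually shipped in the connector may differ in its details:

```python
import re

# Alternation: "daily at HH:MMZ" with a valid 24-hour time, or one or more
# number+unit duration segments (h, m, or s).
SCHEDULE_RE = re.compile(
    r'^(daily at ([01][0-9]|2[0-3]):[0-5][0-9]Z|([0-9]+(\.[0-9]+)?(h|m|s))+)$'
)
```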
Log some basic information about the database that is being connected to, including its version and maximum object size setting, if we are able to.
Updates a test snapshot that was missed in 130ae00
Adds a `paths` parameter to the endpoint config, which accepts an array of strings. Each path provided will result in a separate discovered collection. The paths will also be used to determine the `path` and `recommended_name` of each discovered binding. This allows source-http-ingest to be used with multiple collections and still use auto-discovery. The details of the implementation were largely motivated by a desire to preserve backward compatibility with existing captures. Existing capture bindings generally don't use the `path` parameter, and they end up using the flow collection name as the URL path. So we try to preserve that behavior in the case that no `paths` are provided in the endpoint config, with the goal of not breaking any existing webhook deliveries.
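The fallback behavior can be sketched like this (names and return shape are illustrative, not the connector's actual types):

```python
def discover_bindings(config_paths, collection_name):
    """Return (url_path, recommended_name) pairs for discovered collections."""
    if not config_paths:
        # Legacy behavior: a single binding served at the collection's name,
        # preserving existing webhook delivery URLs.
        return [("/" + collection_name, collection_name)]
    return [
        (p if p.startswith("/") else "/" + p, p.strip("/").replace("/", "_"))
        for p in config_paths
    ]
```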
Previously, we were ignoring query parameters altogether. Now we include them in captured documents under `/_meta/query`.
Axum and tower-http had a few minor breaking changes. Tower-http [improved error handling of RequestDecompressionLayer](tower-rs/tower-http#368), which allowed removal of some annoying error conversion code.
Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.1.0 to 10.2.0.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@10.1.0...10.2.0)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
This changes materializations to provide more actionable messages when producing unsatisfiable constraints. SQL materializations in particular, which currently report the existing column type and the field's proposed JSON schema, will now also report the type of column needed to accommodate the proposed schema. The overall objective is to make it easier to understand how a user could migrate an existing table to be compatible with a changed collection schema.
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.4...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
…lo-world Adds labels to the docker images in order to make some connectors free to run on Estuary's platform. Goes along with estuary/flow#1403. Free connectors are: - source-google-sheets-native - materialize-google-sheets - source-hello-world
Table creation will sometimes fail for a variety of reasons, and it is hard to tell which table the failure was for or what SQL was used to attempt creating it. This adds logging to allow for troubleshooting, along with a slightly improved error message.
This may improve throughput when scanning a single large table with large documents.
Some resources always have _some_ level of activity every time they're checked, which would ordinarily prevent the resource from ever being considered "caught up" and thus prevent the connector from exiting. If the LogCursor is a date-time, the task is now additionally considered idle when the LogCursor is less than `interval` old (where `interval` is configured on the ResourceConfig), and the connector sleeps until the cursor is `interval` old before issuing a successive fetch.
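The sleep computation reduces to something like this (names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def idle_delay(cursor: datetime, interval: timedelta, now: datetime) -> timedelta:
    """How long to sleep before the cursor is at least `interval` old."""
    age = now - cursor
    # A cursor already older than `interval` means no sleep is needed.
    return max(interval - age, timedelta(0))
```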
Much like FetchChangesFn, FetchPageFn is now an AsyncGenerator which yields documents, checkpoint-able PageCursors, or completes an iteration of a resource. This refactoring is consistent with recent updates to FetchChangesFn, and is motivated by use cases where the fetched "page" could be quite large and itself composed of multiple concurrent data fetches, implemented as constituent AsyncGenerator instances. By using an AsyncGenerator, such an implementation can immediately yield from across those concurrent fetches as data arrives such that data is spilled to disk and memory pressure kept low. Then, when the scatter / gather data fetch completes, the entire "page" is checkpointed.
MySQL's handling of enum values in the context of table collation is most vexing in a way that makes our backfill logic misbehave. See, if you write `ORDER BY category` then MySQL orders the rows based on the integer index of the enum values. But if you write `WHERE category > 'CATEGORY_FOO'` then MySQL goes and casts the enum values to strings and compares those. There is probably a way to write some expression like `CAST('CATEGORY_FOO', ...)` which would produce the desired enum ordering comparison in that `WHERE` clause, but the `...` portion of that query might get pretty hairy. We could instead go `ORDER BY CAST(category AS CHAR)` but this would probably force a table sort and we'd prefer to use the natural table ordering where possible. The best solution I have been able to come up with to this problem is to use the integer representation of the enum value as the key for backfill purposes. That means that when serializing a backfill row value into a scan key we convert it to the integer index which corresponds to that string, and so when issuing the next backfill query we effectively go like `WHERE category > 2` (except as a parameterized query).
The string `""` is always accepted by a MySQL enum column and any illegal inputs are mapped to this string, which is represented by the integer 0. We already supported all of this (modulo a small bug in `ALTER TABLE` parsing which I accidentally uncovered as part of this commit), but the logic becomes a bit simpler after this refactor, which moves the `""` option to be first in the list so that its index 0 corresponds more directly to MySQL representing it as the integer zero.
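The two commits above boil down to a mapping like the following sketch, where `enum_values` lists the column's declared values with `""` first so that `""` (MySQL's representation of illegal inputs) maps to the integer 0 and declared values start at index 1 (simplified; not the connector's actual code):

```python
def enum_scan_key(enum_values, value):
    """Map an enum string to its integer index for backfill scan keys."""
    try:
        return enum_values.index(value)
    except ValueError:
        return 0  # unknown/illegal inputs map to "" (integer 0)
```

A backfill query can then compare integers, e.g. effectively `WHERE category > 2` as a parameterized query.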
Active sessions prevent the warehouse from shutting down when using the auto-delay configuration.
Description:
(Describe the high level scope of new or changed features)
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
(anything that might help someone review this PR)