
Feat/integrate connections toml #298

Merged


zanebclark

Support configuration via a connections.toml file.

Previously, credential and required-argument logic was spread across several classes and didn't consider a connections.toml file. This pull request consolidates the configuration and credential logic behind a clear order of preference across sources. The get_merged_config function pulls from the following sources, in descending order of preference:

  1. environment variables
  2. cli arguments
  3. schemachange-yaml
  4. connections.toml
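A minimal sketch of that precedence rule follows. It assumes each source has already been normalized into a flat dict with the same keys and that None means "not set". The function name merge_by_precedence and the sample values are hypothetical; this is not schemachange's get_merged_config.

```python
from typing import Any, Mapping, Optional


def merge_by_precedence(*sources: Mapping[str, Optional[Any]]) -> dict[str, Any]:
    """Merge config sources listed from highest to lowest preference.

    A key takes its value from the first source that sets it to a non-None
    value; lower-preference sources only fill in the gaps.
    """
    merged: dict[str, Any] = {}
    # Walk from lowest to highest preference so later updates win.
    for source in reversed(sources):
        merged.update({k: v for k, v in source.items() if v is not None})
    return merged


# Hypothetical sample values for each source.
env_vars = {"snowflake_role": "DEPLOYER"}
cli_args = {"snowflake_role": "ADMIN", "snowflake_warehouse": None}
yaml_config = {"snowflake_warehouse": "YAML_WH"}
connections_toml = {"snowflake_account": "myorg-myaccount", "snowflake_warehouse": "TOML_WH"}

config = merge_by_precedence(env_vars, cli_args, yaml_config, connections_toml)
print(config)
```

Passing the sources in a different order is all it takes to change the preference; for example, swapping the first two arguments puts CLI arguments ahead of environment variables.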

The check_for_required_args logic is authentication-method specific.
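An authentication-specific check can be sketched as a lookup from authenticator to required keys. The authenticator strings follow values the Snowflake Python connector uses, but the required-key lists, the default to password authentication, and the names REQUIRED_BY_AUTHENTICATOR and missing_required_args are illustrative assumptions, not schemachange's actual rules.

```python
from typing import Any, Mapping

# Hypothetical table: settings each authenticator is assumed to need.
REQUIRED_BY_AUTHENTICATOR: dict[str, tuple[str, ...]] = {
    "snowflake": ("snowflake_account", "snowflake_user", "snowflake_password"),
    "snowflake_jwt": ("snowflake_account", "snowflake_user", "snowflake_private_key_path"),
    "oauth": ("snowflake_account", "snowflake_token_path"),
    "externalbrowser": ("snowflake_account", "snowflake_user"),
}


def missing_required_args(config: Mapping[str, Any]) -> list[str]:
    """Return required keys that are absent or empty for the chosen authenticator."""
    # Assume password authentication when no authenticator is configured.
    authenticator = (config.get("snowflake_authenticator") or "snowflake").lower()
    try:
        required = REQUIRED_BY_AUTHENTICATOR[authenticator]
    except KeyError:
        raise ValueError(f"Unsupported authenticator: {authenticator!r}") from None
    return [key for key in required if not config.get(key)]


print(missing_required_args({
    "snowflake_authenticator": "snowflake_jwt",
    "snowflake_account": "myorg-myaccount",
    "snowflake_user": "deployer",
}))
```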

The following cli arguments have been added:

  1. --config-file-name
  2. --snowflake-authenticator
  3. --snowflake-private-key-path
  4. --snowflake-token-path
  5. --connections-file-path
  6. --connection-name
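For reference, the new flags could be declared with argparse roughly as below. The flag names come from the list above; the parser layout, prog name, and help strings are assumptions and do not reproduce schemachange's real CLI, which has additional subcommands and options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare only the flags added by this PR (help text is paraphrased)."""
    parser = argparse.ArgumentParser(prog="schemachange")
    parser.add_argument("--config-file-name", help="Name of the schemachange YAML config file")
    parser.add_argument("--snowflake-authenticator", help="Authenticator, e.g. snowflake or snowflake_jwt")
    parser.add_argument("--snowflake-private-key-path", help="Path to a private key file")
    parser.add_argument("--snowflake-token-path", help="Path to a token file")
    parser.add_argument("--connections-file-path", help="Path to a connections.toml file")
    parser.add_argument("--connection-name", help="Connection (TOML table) to use")
    return parser


# argparse turns "--connection-name" into the attribute "connection_name".
args = build_parser().parse_args(
    ["--connections-file-path", "./connections.toml", "--connection-name", "deploy"]
)
print(args.connections_file_path, args.connection_name, args.snowflake_authenticator)
```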

Limitations / Concerns

  • The Snowflake CLI only supports a config.toml and the Snowflake Python connector only supports a connections.toml. There's a real lack of parity across the tools here.
  • Some of the connections.toml arguments use hyphens and some use underscores.
  • I've been unable to find documentation on all available connections.toml options. Instead, I find references to specific options scattered across different tools' docs, so this PR might not support every option.
  • I think the Snowflake Python connector and the Snowflake CLI support connection-specific environment variable overrides. I didn't implement this support in this pull request.
  • The readme could use an update with the new arguments and authentication options.

@zanebclark
Author

@sfc-gh-tmathew and @sfc-gh-jhansen, flagging a few 4.0.0 candidates. I'm especially interested in this one, as it both simplifies the implementation and supports connections.toml configuration.


@afeld afeld left a comment


Wow, very impressed at the amount of work here. I'm not a schemachange maintainer, but would benefit from this. Left some suggestions.

Also, I haven't spent a lot of time in the code of the Python Connector, but it seems like schemachange is doing a lot of work around the config and connection that should be handled by the Connector. I wonder if there are ways to lean on it more heavily, allowing a lot of this schemachange code to be cut out.

Thanks, hoping this gets merged sooner or later!

schemachange/config/DeployConfig.py (outdated, resolved)
"snowflake_password": connection.get("password"),
"snowflake_private_key_path": connection.get("private-key"),
"snowflake_token_path": connection.get("token_file_path"),
}

This has a lot of overlap with DeployConfig's get_session_kwargs(), so wondering if it makes sense to centralize the logic. If that doesn't make sense, would at least suggest moving it to a standalone function.

Author

I'm not sure we can easily combine them. In one case, the keys have the snowflake prefix; in the other, the class attributes have the snowflake prefix. We could drop the snowflake prefix from the DeployConfig and have a straight mapping 🤔. I think there was a standard library name collision, or something similar, with that approach. As for making it a standalone function, is this what you're thinking:

# Map connections.toml keys onto DeployConfig-style attribute names.
def get_connection_kwargs(connection):
    return {
        "snowflake_account": connection.get("account"),
        "snowflake_user": connection.get("user"),
        "snowflake_role": connection.get("role"),
        "snowflake_warehouse": connection.get("warehouse"),
        "snowflake_database": connection.get("database"),
        "snowflake_schema": connection.get("schema"),
        "snowflake_authenticator": connection.get("authenticator"),
        "snowflake_password": connection.get("password"),
        "snowflake_private_key_path": connection.get("private-key"),
        "snowflake_token_path": connection.get("token_file_path"),
    }

What would you call it? get_connection_kwargs is the name of the function that would call this function. I don't have a problem with factoring the mapping out though.
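The "straight mapping" idea can also be expressed as data rather than a function body, which sidesteps the naming question: the table holds the pairs and one small helper applies them. This is a hypothetical sketch; CONNECTION_KEY_TO_ATTR and map_connection_keys are illustrative names, with the key pairs copied from the get_connection_kwargs draft above.

```python
from typing import Any, Mapping

# Hypothetical lookup table: connections.toml key -> DeployConfig attribute.
CONNECTION_KEY_TO_ATTR: dict[str, str] = {
    "account": "snowflake_account",
    "user": "snowflake_user",
    "role": "snowflake_role",
    "warehouse": "snowflake_warehouse",
    "database": "snowflake_database",
    "schema": "snowflake_schema",
    "authenticator": "snowflake_authenticator",
    "password": "snowflake_password",
    "private-key": "snowflake_private_key_path",
    "token_file_path": "snowflake_token_path",
}

# The reverse direction comes from the same table.
ATTR_TO_CONNECTION_KEY = {attr: key for key, attr in CONNECTION_KEY_TO_ATTR.items()}


def map_connection_keys(connection: Mapping[str, Any]) -> dict[str, Any]:
    """Rename connections.toml keys to DeployConfig attribute names."""
    # Absent keys become None, matching connection.get() in the draft.
    return {attr: connection.get(key) for key, attr in CONNECTION_KEY_TO_ATTR.items()}


mapped = map_connection_keys({"account": "myorg-myaccount", "private-key": "/secrets/rsa_key.p8"})
print(mapped["snowflake_private_key_path"])
```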

tests/config/test_DeployConfig.py (outdated, resolved)
@zanebclark
Author

Also, I haven't spent a lot of time in the code of the Python Connector, but it seems like schemachange is doing a lot of work around the config and connection that should be handled by the Connector. I wonder if there are ways to lean on it more heavily, allowing a lot of this schemachange code to be cut out.

@afeld, that's a great point. I maintained some legacy "check for required arguments" logic, but we could forego this and pass snowflake-related arguments to the Python Connector.
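Leaning on the connector could look like the sketch below: forward only the options that were actually set and let the connector apply its defaults and report missing credentials. It assumes the config keys already match snowflake.connector.connect parameter names; connect_with is a hypothetical helper, and the connect function is injected so a stub can stand in for snowflake.connector.connect here.

```python
from typing import Any, Callable, Mapping


def connect_with(config: Mapping[str, Any], connect: Callable[..., Any]) -> Any:
    """Forward only the options that were set and let the connector validate them."""
    # Dropping None values leaves defaults and error reporting to the connector.
    kwargs = {key: value for key, value in config.items() if value is not None}
    return connect(**kwargs)


# In real use, `connect` would be snowflake.connector.connect; this stub
# just echoes the keyword arguments it receives.
def fake_connect(**kwargs: Any) -> dict[str, Any]:
    return kwargs


print(connect_with({"account": "myorg-myaccount", "role": None, "warehouse": "DEPLOY_WH"}, fake_connect))
```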

@sfc-gh-twhite
Collaborator

Awesome work here!

The snowflake cli only supports a config.toml and the Snowflake Python connector only supports a connections.toml. There's a real lack of parity across the tools here.

I agree. I've heard these teams are working to bring more parity between the Python Connector and the Snowflake CLI; hopefully it improves soon.

The following cli arguments have been added:

--config-file-name

Should we introduce an additional argument for --config-file-name or lean on the SNOWFLAKE_DEFAULT_CONNECTION_NAME environment variable usage outlined at Setting a default connection? I have no strong preference here, but wanted to share this as an option.

@sfc-gh-tmathew
Collaborator

@zanebclark, can you pull from the current master branch and sync up? Then we can discuss the plan to bake this feature into an upcoming release.

We will need to update the changelog.md to communicate the change in connection properties. We will also need to bring consistency to the variables set by the merge_config method.

cc: @sfc-gh-twhite @jeremiahhansen

@zanebclark
Author

@sfc-gh-tmathew, sure thing. On the subject of improving contribution documentation, check out #296.

@zanebclark
Author

Should we introduce an additional argument for --config-file-name or lean on the SNOWFLAKE_DEFAULT_CONNECTION_NAME environment variable usage outlined at Setting a default connection?

@sfc-gh-twhite , --config-file-name refers to the schemachange config file: schemachange-config.yaml. In truth, I introduced it to make it easier to test. See the various schemachange-config.yml variants in the tests/config folder.

I've added support for the SNOWFLAKE_DEFAULT_CONNECTION_NAME env var. Great suggestion!
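A sketch of how a connection name might be resolved when SNOWFLAKE_DEFAULT_CONNECTION_NAME is honored. The ordering (explicit --connection-name first, then the environment variable, then a fallback of "default") is an assumption for illustration, and resolve_connection_name is a hypothetical helper; the environment mapping is a parameter so it can be tested without touching os.environ.

```python
import os
from typing import Mapping, Optional


def resolve_connection_name(
    cli_value: Optional[str], environ: Mapping[str, str] = os.environ
) -> str:
    """Pick the connections.toml table to use.

    Assumed order: explicit --connection-name, then the
    SNOWFLAKE_DEFAULT_CONNECTION_NAME environment variable, then "default".
    """
    if cli_value:
        return cli_value
    return environ.get("SNOWFLAKE_DEFAULT_CONNECTION_NAME") or "default"


print(resolve_connection_name(None, {"SNOWFLAKE_DEFAULT_CONNECTION_NAME": "deploy"}))
```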

@zanebclark
Author

@sfc-gh-tmathew , this is good to go. Let me know if I missed something.

@zanebclark
Author

zanebclark commented Nov 12, 2024

@sfc-gh-tmathew @sfc-gh-jhansen @sfc-gh-twhite , I took steps to do the following:

  1. Prioritize cli arguments over environment variable arguments
  2. Remove OAuth config support

Some questions around Snowflake's documentation:

  1. Does the python connector support the SNOWFLAKE_PASSWORD environment variable? https://docs.snowflake.com/en/developer-guide/snowflake-cli/connecting/configure-connections A plaintext search of the connector code suggests the answer is no.
  2. Does the python connector support the SNOWSQL_PWD environment variable? Or, is this just included as an example of fetching said environment variable and then passing it to the connector? https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-connect#label-python-key-pair-authn-rotation A plaintext search of the connector code suggests the answer is no.
  3. Can the private key passphrase be stored in the connections.toml file? It isn't shown in this example: https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-connect#connecting-using-the-connections-toml-file
  4. Does the python connector support the PRIVATE_KEY_PASSPHRASE environment variable? Snow CLI seems to: https://docs.snowflake.com/en/developer-guide/snowflake-cli/connecting/configure-connections#use-a-private-key-file-for-authentication A plaintext search of the connector code suggests the answer is no.

@zanebclark
Author

While I'm complaining, I'll point out that the connections.toml syntax for the private key is "private-key" and for the token file path is "token_file_path". At this rate, I'd fully expect the private key passphrase to require the key "privateKeyPassphrase".

@sfc-gh-twhite
Collaborator

@sfc-gh-tmathew @sfc-gh-jhansen @sfc-gh-twhite , I took steps to do the following:

  1. Prioritize cli arguments over environment variable arguments
  2. Remove OAuth config support

Thank you for working through this!

  3. Can the private key passphrase be stored in the connections.toml file? It isn't shown in this example: https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-connect#connecting-using-the-connections-toml-file

It can, via the private_key_file_pwd option. Connection definitions support the same configuration options available in the snowflake.connector.connect method.

| Parameter | Required | Description |
| --- | --- | --- |
| private_key_file_pwd | | Specifies the passphrase to decrypt the private key file for the specified user. See Using key-pair authentication and key-pair rotation. |

@zanebclark zanebclark force-pushed the feat/integrate-connections-toml branch from 0f7bc2a to 3e21f3c November 15, 2024 04:21
@zanebclark zanebclark force-pushed the feat/integrate-connections-toml branch from e0285c3 to c07f97c November 15, 2024 05:50
@zanebclark
Author

Thanks @sfc-gh-twhite. It turns out I didn't need to know what the connections.toml argument was because I'm not interested in connection arguments anymore.

@zanebclark
Author

@sfc-gh-tmathew @sfc-gh-jhansen @sfc-gh-twhite, I think this is ready for another review:

  1. I've updated the README to reference only 4.0 argument patterns while retaining support for anything that's in the master branch right now. On second thought, this is a lie: I think I removed SNOWFLAKE_PASSWORD support.
  2. I created deprecation comments for both yaml and command-line arguments.
  3. I created TODO comments for the eventual removal of support for all connection arguments
  4. I removed the new CLI, YAML, and environment variable arguments introduced by this PR.
  5. I removed the check for required arguments, as it was only checking for connection arguments
  6. I sourced SnowflakeSession variables like account and warehouse from the connection so that we don't need to pass them into the constructor.
  7. I updated the integration tests in master-pytest.yml to reference connections.toml arguments instead of YAML or command-line arguments.
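Item 6 (reading account, warehouse, and similar values from the connection instead of passing them to the constructor) can be sketched as below. It assumes the connection object exposes account, user, role, warehouse, database, and schema as attributes; the Python connector's SnowflakeConnection is believed to offer properties with these names, but verify that against your connector version. SessionContext is a hypothetical class, and a SimpleNamespace stands in for a live connection.

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace
from typing import Any, Optional


@dataclass(frozen=True)
class SessionContext:
    """Hypothetical holder for values read back from an open connection."""

    account: Optional[str]
    user: Optional[str]
    role: Optional[str]
    warehouse: Optional[str]
    database: Optional[str]
    schema: Optional[str]

    @classmethod
    def from_connection(cls, conn: Any) -> "SessionContext":
        # Read each field from the connection; missing attributes become None.
        return cls(**{f.name: getattr(conn, f.name, None) for f in fields(cls)})


stub = SimpleNamespace(
    account="MYORG-MYACCOUNT", user="DEPLOYER", role="DEPLOYER_ROLE",
    warehouse="DEPLOY_WH", database="ANALYTICS", schema="PUBLIC",
)
context = SessionContext.from_connection(stub)
print(context.warehouse)
```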

How soon do you think we could review this and get it into master? I hate to be pushy, but I'm working with a client that's more comfortable referencing the master branch in the long run.

@sfc-gh-tmathew sfc-gh-tmathew changed the base branch from master to dev November 19, 2024 11:39
@sfc-gh-tmathew sfc-gh-tmathew merged commit e878e17 into Snowflake-Labs:dev Nov 19, 2024
2 checks passed
Labels
None yet
Projects
None yet

4 participants