Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-694457: Fix leaking HTTP_PROXY and HTTPS_PROXY environment variables #1398

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ozturkberkay
Copy link

@ozturkberkay ozturkberkay commented Jan 15, 2023

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-694457: Connector is leaking HTTP(S)_PROXY environment variables #1329

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am modifying authorization mechanisms
    • I am adding new credentials
    • I am modifying OCSP code
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Connector is overwriting the HTTP_PROXY and HTTPS_PROXY environment variables. I think this is a big problem, the client should not overwrite any environment variable, this has caused a bunch of very hard to debug problems for me at work. Since snowflake uses the vendored requests library to send the query requests, why not just pass in the proxies to that instead of changing the environment variables? If there is a strong reason for changing the variables, I think we should do something like this:

@contextlib.contextmanager
def set_proxy_env_vars(proxy: str) -> Generator[Any, Any, None]:
    proxy_env_vars = {"HTTP_PROXY": None, "HTTPS_PROXY": None}
    for env in proxy_env_vars:
        proxy_env_vars[env] = os.environ.get(env)
        os.environ[env] = proxy
    yield
    for env, val in proxy_env_vars.items():
        os.environ[env] = val

This will change the environment variables only for the scope of the contextmanager and revert the variables back to their original values.

@github-actions
Copy link

github-actions bot commented Jan 15, 2023

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@ozturkberkay
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

@ozturkberkay ozturkberkay marked this pull request as ready for review January 27, 2023 20:04
@ozturkberkay ozturkberkay changed the title Fix leaking HTTP_PROXY and HTTPS_PROXY environment variables SNOW-694457: Fix leaking HTTP_PROXY and HTTPS_PROXY environment variables Jan 27, 2023
@ozturkberkay
Copy link
Author

@sfc-gh-sfan I was wondering if this can prioritized?

Copy link
Contributor

@sfc-gh-sfan sfc-gh-sfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me, but I could also see potential dependency on the env var. @sfc-gh-mkeller: What do you think?

@sfc-gh-dszmolka
Copy link
Contributor

hi @sfc-gh-sfan @sfc-gh-mkeller could we please look into this PR ? apparently it's still valid and would help. thank you !

Copy link
Collaborator

@sfc-gh-mkeller sfc-gh-mkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to apologize, I dropped the ball on this PR. In exchange I rebased the branch onto main. The code change looks good to me

@sfc-gh-dszmolka do you have access to test this as a last sanity check before merging?

I'd especially like to know that this works with the SOCKS5 change introduced in #1398. So that we'd be sure that my rebase was good!

Comment on lines 25 to 29
auth = (
f"{proxy_user or ''}:{proxy_password or ''}@"
if proxy_user or proxy_password
else ""
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing proxy_user or '' and proxy_password or '' I'd personally prefer to set the default values of the function to those. That'd end up changing this code to:

        auth = (
            f"{proxy_user}:{proxy_password}@"
            if proxy_user or proxy_password
            else ""
        )

I did get caught out by all those ORs, but it doesn't actually change what the code does.
This is more of my nitpick

@sfc-gh-mkeller
Copy link
Collaborator

@sfc-gh-aling could you please take this and get it merged once we can sanity check correctness?
I no longer feel confident merging code into this project, I haven't been operating on it for a while 😓

I did add everything that is new since the PR was opened and I know what they are

@sfc-gh-dszmolka
Copy link
Contributor

sfc-gh-dszmolka commented Aug 21, 2024

@sfc-gh-mkeller @sfc-gh-aling i'm trying to sanity-test this manually , hope this is something you had in mind.
Using the example from original issue submission #1329

test script:

# cat PythonConnector1398.py 
import snowflake.connector
import os

print(f'Starting test. Proxy envvars: HTTP_PROXY={os.environ.get("HTTP_PROXY")}, HTTPS_PROXY={os.environ.get("HTTPS_PROXY")}')

proxies_to_test = ["http://my.pro.xy", "https://s.my.pro.xy", "socks5://localhost"]

for proxy_to_test in proxies_to_test:
    snowflake_config = {
      "account": os.environ.get("SFACCOUNT"),
      "user": os.environ.get("SFUSER"),
      "password": os.environ.get("SFPASS"),
      "proxy_host": proxy_to_test,
      "proxy_port": "8080",
      "no_retry": True,
      }

    print(f'=> Testing with {proxy_to_test}.')
    print(f'Before attempting to connect to Snowflake - HTTP_PROXY={os.environ.get("HTTP_PROXY")}, HTTPS_PROXY={os.environ.get("HTTPS_PROXY")}')
    try:
      # no such proxies so this will fail
      conn = snowflake.connector.connect(**snowflake_config)
      conn.close()
    except:
      print(f"Could not connect to Snowflake.")      

    print(f'After attempting to connect to Snowflake - HTTP_PROXY={os.environ.get("HTTP_PROXY")}, HTTPS_PROXY={os.environ.get("HTTPS_PROXY")}\n')

latest release PythonConnector 3.12.1 which doesn't have this PR

# pip install snowflake-connector-python==3.12.1
# python PythonConnector1398.py 
Starting test. Proxy envvars: HTTP_PROXY=None, HTTPS_PROXY=None
=> Testing with http://my.pro.xy.
Before attempting to connect to Snowflake - HTTP_PROXY=None, HTTPS_PROXY=None
Could not connect to Snowflake.
After attempting to connect to Snowflake - HTTP_PROXY=http://my.pro.xy:8080, HTTPS_PROXY=http://my.pro.xy:8080

=> Testing with https://s.my.pro.xy.
Before attempting to connect to Snowflake - HTTP_PROXY=http://my.pro.xy:8080, HTTPS_PROXY=http://my.pro.xy:8080
Could not connect to Snowflake.
After attempting to connect to Snowflake - HTTP_PROXY=http://s.my.pro.xy:8080, HTTPS_PROXY=http://s.my.pro.xy:8080

=> Testing with socks5://localhost.
Before attempting to connect to Snowflake - HTTP_PROXY=http://s.my.pro.xy:8080, HTTPS_PROXY=http://s.my.pro.xy:8080
Could not connect to Snowflake.
After attempting to connect to Snowflake - HTTP_PROXY=http://socks5://localhost:8080, HTTPS_PROXY=http://socks5://localhost:8080

(also observe how the socks5:// scheme is somehow detected as ... nothing, but part of the proxy hostname? then added to the detected scheme http://)

this PR

# <spun up a brand new container python:3.11-bookworm to avoid messing with env>
# pip install git+https://github.com/snowflakedb/snowflake-connector-python.git@refs/pull/1398/head
root@ce6aa87f78ab:/test# pip install git+https://github.com/snowflakedb/snowflake-connector-python.git@refs/pull/1398/head
Collecting git+https://github.com/snowflakedb/snowflake-connector-python.git@refs/pull/1398/head
..
  Running command git checkout -q d71ab887593f489b2aaf7b8431f06535e60a97d8
  Resolved https://github.com/snowflakedb/snowflake-connector-python.git to commit d71ab887593f489b2aaf7b8431f06535e60a97d8
Successfully built snowflake-connector-python
# python PythonConnector1398.py 
Starting test. Proxy envvars: HTTP_PROXY=None, HTTPS_PROXY=None
=> Testing with http://my.pro.xy.
Before attempting to connect to Snowflake - HTTP_PROXY=None, HTTPS_PROXY=None
Could not connect to Snowflake.
After attempting to connect to Snowflake - HTTP_PROXY=None, HTTPS_PROXY=None

=> Testing with https://s.my.pro.xy.
Before attempting to connect to Snowflake - HTTP_PROXY=None, HTTPS_PROXY=None
Could not connect to Snowflake.
After attempting to connect to Snowflake - HTTP_PROXY=None, HTTPS_PROXY=None

=> Testing with socks5://localhost.
Before attempting to connect to Snowflake - HTTP_PROXY=None, HTTPS_PROXY=None
Could not connect to Snowflake.
After attempting to connect to Snowflake - HTTP_PROXY=None, HTTPS_PROXY=None

Looks good at the first glance. Do you need me to test anything else, or did i miss something perhaps?

I would also like to add that even with current-version 3.12.1, subsequent runs of the test script shows proxies as None, so it doesn't seem to be actually changed OS-wide, but rather inside the process. Which might be still bad enough of course.

edit: adding an existing proxy (listening on http) to the test shows that the connector installed from the PR correctly passes the request to the proxy without changing the envvar value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SNOW-694457: Connector is leaking HTTP(S)_PROXY environment variables
4 participants