Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1764129: Result set iteration fails with 403 error 4 hours after query completes #1941

Closed
ehclark opened this issue Oct 25, 2024 · 7 comments
Assignees
Labels
invalid status-triage_done Initial triage done, will be further handled by the driver team

Comments

@ehclark
Copy link

ehclark commented Oct 25, 2024

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of JDBC driver are you using? 3.14.4
  2. What operating system and processor architecture are you using? Linux, amd64
  3. What version of Java are you using? 11
  4. What did you do? We have a Java program that extracts data from Snowflake. It runs a query then downloads, transforms and outputs result to local files. After running successfully for about 4 hours after query completion, the process failed with the following errors:
Oct 24, 2024 1:17:03 AM net.snowflake.client.jdbc.RestRequest execute
SEVERE: Error response: HTTP Response code: 403, request: GET https://host.blob.core.windows.net/results/01b7e306-0d07-ac45-0001-f61e257298ba_0%2Fmain%2Fdata_0_3_214?sv=2021-04-10&spr=https&se=2024-10-24T01%3A16%3A51Z&sr=b&sp=r&sig=****&rsce=gzip HTTP/1.1
Oct 24, 2024 1:17:03 AM net.snowflake.client.jdbc.DefaultResultStreamProvider getInputStream
SEVERE: Error fetching chunk from: https://host.blob.core.windows.net/results/01b7e306-0d07-ac45-0001-f61e257298ba_0%2Fmain%2Fdata_0_3_214?sv=2021-04-10&spr=https&se=2024-10-24T01%3A16%3A51Z&sr=b&sp=r&sig=****&rsce=gzip
Oct 24, 2024 1:17:03 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
SEVERE: Response status line reason: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
Oct 24, 2024 1:17:03 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
SEVERE: Response content: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:7abdcf6d-501e-00b5-09b2-25ccec000000
Time:2024-10-24T01:17:03.3836985Z</Message><AuthenticationErrorDetail>Signed expiry time [Thu, 24 Oct 2024 01:16:51 GMT] must be after signed start time [Thu, 24 Oct 2024 01:17:03 GMT]</AuthenticationErrorDetail></Error>

The signed expiry time mentioned in the error log is exactly 4 hours after the query completed. It would appear that the authentication tokens used to download the chunks from object storage are generated when the query completes and valid for up to 4 hours.

  1. What did you expect to see? The program should run to completion. The result set for a query should be available to iterate as long as the session is still open and valid.

  2. Can you set logging to DEBUG and collect the logs?

    https://community.snowflake.com/s/article/How-to-generate-log-file-on-Snowflake-connectors

    Before sharing any information, please be sure to review the log and remove any sensitive
    information.

@ehclark ehclark added the bug label Oct 25, 2024
@github-actions github-actions bot changed the title Result set iteration fails with 403 error 4 hours after query completes SNOW-1764129: Result set iteration fails with 403 error 4 hours after query completes Oct 25, 2024
@sfc-gh-wfateem
Copy link
Collaborator

@ehclark I don't believe the issue you're running into is related to the JDBC driver. We'll need sensitive information from you to debug this, so the best avenue here would be for you to open a ticket with Snowflake Support.

To explain very briefly, there's a limit imposed by the backend for the lifetime of the results for a query. If the client takes a longer time to process the results then you'll hit a 403 error, and there's nothing the JDBC driver could do about it. By default, that expiry time is 6 hours, and it's 4 hours in case you're using the Spark connector.

When you open a support ticket you'll need to provide the JDBC debugging logs and account information.

@sfc-gh-wfateem sfc-gh-wfateem self-assigned this Oct 29, 2024
@sfc-gh-wfateem sfc-gh-wfateem added invalid status-triage_done Initial triage done, will be further handled by the driver team and removed bug labels Oct 29, 2024
@ehclark
Copy link
Author

ehclark commented Oct 29, 2024

@sfc-gh-wfateem According to the Snowflake documentation on RESULT_SCAN() function, results are available for 24 hours after query completion. I think the problem is that in the JDBC driver, a manifest of result chunks for the query is downloaded. That manifest includes pre-signed URLs that apparently have a TTL of 4 hours. So it seems that the JDBC driver should detect when those pre-signed URLs are stale, but the results of the query are still available and fetch a new manifest.

@sfc-gh-wfateem
Copy link
Collaborator

@ehclark Those are slightly two different things. When you run a query, you can possibly hit the cache for the next 24 hours. This means that Snowflake won't have to run your query again and will simply point you back to the cached results. However, you can't download any results, whether that's cached or new data, until pre-signed URLs are generated. The pre-signed URL is going to be valid for a finite amount of time from the point when the query finishes running, i.e., the time the URLs are generated.

I can look into trying to reproduce this by attempting to parse a large amount of data, hoping that I go beyond 4 hours, but that's going to take some time.

@sfc-gh-wfateem sfc-gh-wfateem added status-triage Issue is under initial triage and removed invalid status-triage_done Initial triage done, will be further handled by the driver team labels Oct 30, 2024
@sfc-gh-wfateem
Copy link
Collaborator

@ehclark I had a job run overnight and it behaved exactly how I would expect it to.

Here I received the response to my query indicating that it finished running around that time, so we can also expect pre-signed URLs to be generated around the same time:

24/10/30 21:47:10 main DEBUG RestRequest: [requestId=5c8990a7-45e3-4f86-af72-9c3f921d8d23] HTTP response code for request POST https://my-account.snowflakecomputing.com:443/queries/v1/query-request?requestId=5c8990a7-45e3-4f86-af72-9c3f921d8d23 HTTP/1.1: 200

Here are the times it started downloading the first chunk and then the last chunk before the failure:

24/10/30 21:47:13 result-chunk-downloader-1 DEBUG SnowflakeChunkDownloader: Thread 32 start downloading chunk#0

24/10/31 03:46:45 result-chunk-downloader-1 DEBUG SnowflakeChunkDownloader: Thread 32 finish downloading chunk#1085

It went on trying to download chunk# 1086:

24/10/31 03:47:17 result-chunk-downloader-1 DEBUG SnowflakeChunkDownloader: Downloading chunk#1086, url: https://host.s3.amazonaws.com/f6a1-s-v2sc2256/results/01b80c86-0611-f907-006e-32830062f0fa_0/main/data_6_3_18?xxx, Thread 32

That failed:

24/10/31 03:47:18 result-chunk-downloader-1 ERROR DefaultResultStreamProvider: Error fetching chunk from: https://host.s3.amazonaws.com/f6a1-s-v2sc2256/results/01b80c86-0611-f907-006e-32830062f0fa_0/main/data_6_3_18?xxx

The reason for the failure:

24/10/31 03:47:18 result-chunk-downloader-1 ERROR DefaultResultStreamProvider: Response content: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Request has expired</Message><X-Amz-Expires>21599</X-Amz-Expires><Expires>2024-10-31T08:47:10Z</Expires><ServerTime>2024-10-31T08:47:18Z</ServerTime><RequestId>xxx</RequestId><HostId>yyy</HostId></Error>

The timestamp provided by S3 is in GMT, the timestamps shown in my logs are in US CDT (-5 UTC). So you can see that this failure occurred at the six-hour mark, which is the current configuration for my particular account dictating how long pre-signed URLs are valid for.

Your issue looks pretty different, and without having the opportunity to get more details from you including the debugging logs, then it's very hard to say what's happening in your scenario.

Just FYI, I used v3.19.1 in my test, so perhaps that's the only other thing you can test to see if it helps. Otherwise, I recommend you open a support ticket with Snowflake so we can figure this out for you.

@ehclark
Copy link
Author

ehclark commented Oct 31, 2024

@sfc-gh-wfateem Thanks for the follow up. I think the error you are seeing is the same as what I am seeing, the differences have to do with Azure vs AWS. You referred to:

current configuration for my particular account dictating how long pre-signed URLs are valid for

Is that a Snowflake parameter? I did not find a parameter in the documentation that would appear to control that, so maybe it is a private parameter?

@sfc-gh-wfateem
Copy link
Collaborator

@ehclark ah yes! I overlooked the fact that you're connecting to an Azure blob storage server.

Yes, you wouldn't have control over that parameter. I'm just unsure if this is the same issue or not because the default is 6 hours. Users typically ask to increase that, I haven't seen a scenario where someone asked to reduce that time. Either way, we can definitely help you with this if you can open a support ticket with us.

@sfc-gh-wfateem sfc-gh-wfateem added invalid status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Oct 31, 2024
@sfc-gh-wfateem
Copy link
Collaborator

@ehclark If you don't have any other questions, I'll go ahead and close this issue for now given that this looks more like an issue with the environment rather than the JDBC driver. As I mentioned earlier, please do reach out to Snowflake support to help you look into this. If you find new information that's relevant to this issue and you suspect that there's something in the JDBC driver that needs to be looked at, then please feel free to open the issue again.

@sfc-gh-wfateem sfc-gh-wfateem closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invalid status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

2 participants