Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blob/gcsblob: Ensure driver sets Content-Type auto-detection properly #3381

Merged

Conversation

stanhu
Copy link
Contributor

@stanhu stanhu commented Feb 23, 2024

#3371 made it possible to disable Go Cloud's Content-Type auto-detection via the
DisableContentTypeDetection option. However, the Google Cloud Storage (GCS) driver still performed auto-detection even if this option were enabled, so it was previously not possible to keep Content-Type blank. This commit adds the
DisableContentTypeDetection option to the the driver and passes along the value to make it possible to keep Content-Type blank in the GCS object metadata. This enables the use of the Response-Content-Type override in a signed URL.

This commit needs cloud.google.com/go/storage v1.39 (googleapis/google-cloud-go#9431).

@stanhu stanhu force-pushed the sh-gcsblob-disable-content-type-detection branch 3 times, most recently from fff9db4 to 53a0705 Compare February 23, 2024 03:26
@stanhu
Copy link
Contributor Author

stanhu commented Feb 23, 2024

@vangent I think the GCS tests need to be re-recorded to handle the cloud.google.com/go/storage version bump first, and then perhaps again to handle the latest changes. If I substitute my bucket, then I see different recorded replays. Is there a way for the average person to this? UPDATE: I see that CONTRIBUTING.md already mentions this, so I have to let someone else do the upgrade and regenerate the replay files.

I'm not sure why the Ubuntu test is failing with this, but macOS seems to want github.com/mitchellh/go-homedir:

Ensuring that there are no dependencies not listed in ./internal/testing/alldeps...
--- ./internal/testing/alldeps	2024-02-23 03:26:16.879296954 +0000
+++ -	2024-02-23 03:41:52.978626415 +0000
@@ -100,7 +100,6 @@
 github.com/klauspost/compress
 github.com/kylelemons/godebug
 github.com/lib/pq
-github.com/mitchellh/go-homedir
 github.com/mitchellh/mapstructure
 github.com/montanaflynn/stats
 github.com/nats-io/nats.go
FAIL: dependencies changed; run: internal/testing/listdeps.sh > internal/testing/alldeps

I've regenerated the list on Linux and checked that in.

@stanhu
Copy link
Contributor Author

stanhu commented Feb 23, 2024

Here's a diff between ContentType_is_left_empty_if_not_provided_and_DisableContentTypeDetection_is_true.replay:

 % diff new/ContentType_is_left_empty_if_not_provided_and_DisableContentTypeDetection_is_true.replay newpatch/ContentType_is_left_empty_if_not_provided_and_DisableContentTypeDetection_is_true.replay
2c2
<   "Initial": "AQAAAA7dahh3OKHswP4g",
---
>   "Initial": "AQAAAA7dahj1FFqpsP4g",
40c40
<       "ID": "cbec847fefac6811",
---
>       "ID": "d807be34f7fd3105",
71c71
<             "695"
---
>             "654"
77c77
<             "Fri, 23 Feb 2024 04:39:20 GMT"
---
>             "Fri, 23 Feb 2024 04:41:25 GMT"
80c80
<             "CNHm0aXSwIQDEAE="
---
>             "CLaBtuHSwIQDEAE="
96c96
<             "ABPtcPrmZkac9Ivz6JM0oa_MRPdert1V6QDwMmYv14dyiVgjakwetkodrvUPogTCrU1XKdPApw"
---
>             "ABPtcPpfV9FCLjT5Tf88GqdHuPQKdMKrAALRoYnHeBB7zxAMvPhaPNaqr4UUfOQNgtIY6KHDAeCSSFOBPQ"
99c99
<         "Body": "eyJraW5kIjoic3RvcmFnZSNvYmplY3QiLCJpZCI6InN0YW5odS10ZXN0L2Jsb2ItZm9yLXJlYWRpbmcvMTcwODY2MzE2MDIwNjE2MSIsInNlbGZMaW5rIjoiaHR0cHM6Ly93d3cuZ29vZ2xlYXBpcy5jb20vc3RvcmFnZS92MS9iL3N0YW5odS10ZXN0L28vYmxvYi1mb3ItcmVhZGluZyIsIm1lZGlhTGluayI6Imh0dHBzOi8vc3RvcmFnZS5nb29nbGVhcGlzLmNvbS9kb3dubG9hZC9zdG9yYWdlL3YxL2Ivc3Rhbmh1LXRlc3Qvby9ibG9iLWZvci1yZWFkaW5nP2dlbmVyYXRpb249MTcwODY2MzE2MDIwNjE2MSZhbHQ9bWVkaWEiLCJuYW1lIjoiYmxvYi1mb3ItcmVhZGluZyIsImJ1Y2tldCI6InN0YW5odS10ZXN0IiwiZ2VuZXJhdGlvbiI6IjE3MDg2NjMxNjAyMDYxNjEiLCJtZXRhZ2VuZXJhdGlvbiI6IjEiLCJjb250ZW50VHlwZSI6InRleHQvaHRtbDsgY2hhcnNldD11dGYtOCIsInN0b3JhZ2VDbGFzcyI6IlNUQU5EQVJEIiwic2l6ZSI6IjYzNTQiLCJtZDVIYXNoIjoiRDJwOG4vUFQxM0M0OS9vWndxNFowQT09IiwiY3JjMzJjIjoidGR3cmdRPT0iLCJldGFnIjoiQ05IbTBhWFN3SVFERUFFPSIsInRpbWVDcmVhdGVkIjoiMjAyNC0wMi0yM1QwNDozOToyMC4yNTZaIiwidXBkYXRlZCI6IjIwMjQtMDItMjNUMDQ6Mzk6MjAuMjU2WiIsInRpbWVTdG9yYWdlQ2xhc3NVcGRhdGVkIjoiMjAyNC0wMi0yM1QwNDozOToyMC4yNTZaIn0="
---
>         "Body": "eyJraW5kIjoic3RvcmFnZSNvYmplY3QiLCJpZCI6InN0YW5odS10ZXN0L2Jsb2ItZm9yLXJlYWRpbmcvMTcwODY2MzI4NTU3OTk1OCIsInNlbGZMaW5rIjoiaHR0cHM6Ly93d3cuZ29vZ2xlYXBpcy5jb20vc3RvcmFnZS92MS9iL3N0YW5odS10ZXN0L28vYmxvYi1mb3ItcmVhZGluZyIsIm1lZGlhTGluayI6Imh0dHBzOi8vc3RvcmFnZS5nb29nbGVhcGlzLmNvbS9kb3dubG9hZC9zdG9yYWdlL3YxL2Ivc3Rhbmh1LXRlc3Qvby9ibG9iLWZvci1yZWFkaW5nP2dlbmVyYXRpb249MTcwODY2MzI4NTU3OTk1OCZhbHQ9bWVkaWEiLCJuYW1lIjoiYmxvYi1mb3ItcmVhZGluZyIsImJ1Y2tldCI6InN0YW5odS10ZXN0IiwiZ2VuZXJhdGlvbiI6IjE3MDg2NjMyODU1Nzk5NTgiLCJtZXRhZ2VuZXJhdGlvbiI6IjEiLCJzdG9yYWdlQ2xhc3MiOiJTVEFOREFSRCIsInNpemUiOiI2MzU0IiwibWQ1SGFzaCI6IkQycDhuL1BUMTNDNDkvb1p3cTRaMEE9PSIsImNyYzMyYyI6InRkd3JnUT09IiwiZXRhZyI6IkNMYUJ0dUhTd0lRREVBRT0iLCJ0aW1lQ3JlYXRlZCI6IjIwMjQtMDItMjNUMDQ6NDE6MjUuNjE1WiIsInVwZGF0ZWQiOiIyMDI0LTAyLTIzVDA0OjQxOjI1LjYxNVoiLCJ0aW1lU3RvcmFnZUNsYXNzVXBkYXRlZCI6IjIwMjQtMDItMjNUMDQ6NDE6MjUuNjE1WiJ9"
103c103
<       "ID": "cde3e48f8ec60e44",
---
>       "ID": "d6df0cec23facc54",
139c139
<             "text/html; charset=utf-8"
---
>             "application/octet-stream"
142c142
<             "Fri, 23 Feb 2024 04:39:20 GMT"
---
>             "Fri, 23 Feb 2024 04:41:25 GMT"
151c151
<             "Fri, 23 Feb 2024 04:39:20 GMT"
---
>             "Fri, 23 Feb 2024 04:41:25 GMT"
157c157
<             "1708663160206161"
---
>             "1708663285579958"
176c176
<             "ABPtcPp9XU3hLfwVwoC1vJdjJn7zRorTQU2HOwTeCBzWVYLPRNjPJr345BBGYYhwF2L1NETKgQ"
---
>             "ABPtcPrchhnqFBhXE0m-ouetvWbS7oum-Yb6fgfFr5SNHrAj7ZJHCXNWTSg10IEFyO-FwHihi_wxYRRDfw"
183c183
<       "ID": "d020bd5d188ff050",
---
>       "ID": "93892b86d06d0e26",
222c222
<             "Fri, 23 Feb 2024 04:39:20 GMT"
---
>             "Fri, 23 Feb 2024 04:41:26 GMT"
238c238
<             "ABPtcPr4C4qeJPyF1Hs355ngmuBi86yohwEJAjwNpot2D32-YoWdjJw1MFzIPpH05tJd7MOZRg"
---
>             "ABPtcPo4ozm6fchsDbEgNQzPavrkXLk314QaszAfZssPn9pGFsQAOM8Rrwm1Zyn39NKjIOX6xtIagqF3qg"

The two key differences are in Body and the response Content-Type. It's interesting that application/octet-stream is now being returned instead of text/html; charset=utf-8, but the Sadly we can't really verify this comment still applies.

Body (before)

{
   "kind":"storage#object",
   "id":"stanhu-test/blob-for-reading/1708663160206161",
   "selfLink":"https://www.googleapis.com/storage/v1/b/stanhu-test/o/blob-for-reading",
   "mediaLink":"https://storage.googleapis.com/download/storage/v1/b/stanhu-test/o/blob-for-reading?generation=1708663160206161&alt=media",
   "name":"blob-for-reading",
   "bucket":"stanhu-test",
   "generation":"1708663160206161",
   "metageneration":"1",
   "contentType":"text/html; charset=utf-8",
   "storageClass":"STANDARD",
   "size":"6354",
   "md5Hash":"D2p8n/PT13C49/oZwq4Z0A==",
   "crc32c":"tdwrgQ==",
   "etag":"CNHm0aXSwIQDEAE=",
   "timeCreated":"2024-02-23T04:39:20.256Z",
   "updated":"2024-02-23T04:39:20.256Z",
   "timeStorageClassUpdated":"2024-02-23T04:39:20.256Z"
}

Body (after)

{
  "kind": "storage#object",
  "id": "stanhu-test/blob-for-reading/1708663285579958",
  "selfLink": "https://www.googleapis.com/storage/v1/b/stanhu-test/o/blob-for-reading",
  "mediaLink": "https://storage.googleapis.com/download/storage/v1/b/stanhu-test/o/blob-for-reading?generation=1708663285579958&alt=media",
  "name": "blob-for-reading",
  "bucket": "stanhu-test",
  "generation": "1708663285579958",
  "metageneration": "1",
  "storageClass": "STANDARD",
  "size": "6354",
  "md5Hash": "D2p8n/PT13C49/oZwq4Z0A==",
  "crc32c": "tdwrgQ==",
  "etag": "CLaBtuHSwIQDEAE=",
  "timeCreated": "2024-02-23T04:41:25.615Z",
  "updated": "2024-02-23T04:41:25.615Z",
  "timeStorageClassUpdated": "2024-02-23T04:41:25.615Z"
}

You can see contentType has been omitted in the second case.

@stanhu stanhu force-pushed the sh-gcsblob-disable-content-type-detection branch from 53a0705 to 74b5319 Compare February 23, 2024 05:07
@stanhu
Copy link
Contributor Author

stanhu commented Feb 23, 2024

I also see that googleapis/google-cloud-go#9211 introduced the includeFoldersAsPrefixes=false query string in the URLs, so this is the main reason why the GCS suite is not working with the updated version.

stanhu added a commit to stanhu/go-cloud that referenced this pull request Feb 23, 2024
This is needed in preparation for google#3381.

googleapis/google-cloud-go#9211 introduced the
`includeFoldersAsPrefixes=false` query string in the URLs.
stanhu added a commit to stanhu/go-cloud that referenced this pull request Feb 23, 2024
This is needed in preparation for google#3381.

googleapis/google-cloud-go#9211 introduced the
`includeFoldersAsPrefixes=false` query string in the URLs.
@vangent
Copy link
Contributor

vangent commented Feb 25, 2024

As noted on the other PR, let's wait until there is a released version with the change you need, then ping me here and I'll update the dependency version.

Copy link
Contributor

@vangent vangent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've submitted a PR to update dependencies, so you should be able to go ahead with this now.

blob/blob.go Show resolved Hide resolved
blob/driver/driver.go Outdated Show resolved Hide resolved
stanhu added a commit to stanhu/go-cloud that referenced this pull request Feb 29, 2024
@stanhu stanhu force-pushed the sh-gcsblob-disable-content-type-detection branch from 74b5319 to 2132176 Compare February 29, 2024 22:32
Copy link

codecov bot commented Feb 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.15%. Comparing base (6ca965e) to head (71bf492).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3381   +/-   ##
=======================================
  Coverage   73.15%   73.15%           
=======================================
  Files         113      113           
  Lines       14846    14848    +2     
=======================================
+ Hits        10860    10862    +2     
  Misses       3213     3213           
  Partials      773      773           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stanhu
Copy link
Contributor Author

stanhu commented Feb 29, 2024

@vangent Thanks for your help! I think this is ready to go.

@stanhu stanhu requested a review from vangent February 29, 2024 23:05
Copy link
Contributor

@vangent vangent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the PR to drop all of the go.mod and go.sum changes? IIUC those aren't necessary anymore.

@stanhu
Copy link
Contributor Author

stanhu commented Mar 1, 2024

@vangent
Copy link
Contributor

vangent commented Mar 1, 2024

Hmm, I thought you said that the change you needed was released, and then I updated dependencies to the latest. Did something else change, or ...? I'd rather have the dependency updates separate from code changes.

@stanhu
Copy link
Contributor Author

stanhu commented Mar 1, 2024

@vangent Sorry, I should have clarified. I proposed updating to cloud.google.com/go/storage v1.38.0 first in #3382 because all the replay files needed to be regenerated for v1.38.0.

v1.39.0 was just released yesterday with the change this pull request needs with the ForceEmptyContentType flag. No replay files need to be regenerated.

@vangent
Copy link
Contributor

vangent commented Mar 1, 2024

Ah, OK. I will submit a separate PR to update that dependency, after which you should be able to rebase this one and have it just be the code change.

Do Azure/AWS not have any way to disable any content-type detection they are doing?

@stanhu
Copy link
Contributor Author

stanhu commented Mar 1, 2024

Ah, OK. I will submit a separate PR to update that dependency, after which you should be able to rebase this one and have it just be the code change.

Thanks.

Do Azure/AWS not have any way to disable any content-type detection they are doing?

Azure appears to set the Content-Type to application/octet-stream if ContentType is left empty. I'm not sure if the client or server is doing that.

I'm going to test with AWS, but a cursory scan did not show any Content-Type detection in the Go client.

google#3371 made it possible to
disable Go Cloud's Content-Type auto-detection via the
`DisableContentTypeDetection` option. However, the Google Cloud
Storage (GCS) driver still performed auto-detection even if this
option were enabled, so it was previously not possible to keep
`Content-Type` blank. This commit adds the
`DisableContentTypeDetection` option to the the driver and passes
along the value to make it possible to keep Content-Type blank in the
GCS object metadata. This enables the use of the
`Response-Content-Type` override in a signed URL.

This commit needs cloud.google.com/go/storage v1.39
(googleapis/google-cloud-go#9431).
@stanhu stanhu force-pushed the sh-gcsblob-disable-content-type-detection branch from 2132176 to 71bf492 Compare March 1, 2024 19:17
@stanhu
Copy link
Contributor Author

stanhu commented Mar 1, 2024

I confirmed that AWS leaves the Content-Type blank. I applied this diff to examine the HTTP traffic:

diff --git a/aws/aws.go b/aws/aws.go
index 55dc3864..478d11fc 100644
--- a/aws/aws.go
+++ b/aws/aws.go
@@ -83,6 +83,7 @@ func (co ConfigOverrider) ClientConfig(serviceName string, cfgs ...*aws.Config)
 //   - s3ForcePathStyle: A value of "true" forces the request to use path-style addressing; sets aws.Config.S3ForcePathStyle.
 func ConfigFromURLParams(q url.Values) (*aws.Config, error) {
 	var cfg aws.Config
+	cfg.LogLevel = aws.LogLevel(aws.LogDebug)
 	for param, values := range q {
 		value := values[0]
 		switch param {
diff --git a/samples/gocdk-blob/main.go b/samples/gocdk-blob/main.go
index 2d0449e9..b92978e4 100644
--- a/samples/gocdk-blob/main.go
+++ b/samples/gocdk-blob/main.go
@@ -192,7 +192,8 @@ func (*uploadCmd) Execute(ctx context.Context, f *flag.FlagSet, _ ...interface{}
 	defer bucket.Close()
 
 	// Open a *blob.Writer for the blob at blobKey.
-	writer, err := bucket.NewWriter(ctx, blobKey, nil)
+	opts := blob.WriterOptions{DisableContentTypeDetection: true}
+	writer, err := bucket.NewWriter(ctx, blobKey, &opts)
 	if err != nil {
 		log.Printf("Failed to write %q: %v\n", blobKey, err)
 		return subcommands.ExitFailure

Then I used gocdk-blob to upload a file:

% cat main.go | ./gocdk-blob upload "s3://redacted?region=us-west-2" main.go
2024/03/01 12:16:51 DEBUG: Request s3/PutObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /main.go HTTP/1.1
Host: redacted.s3.us-west-2.amazonaws.com
User-Agent: aws-sdk-go/1.50.29 (go1.21.6; darwin; arm64) S3Manager
Content-Length: 5797
Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240301/us-west-2/s3/aws4_request, SignedHeaders=content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=37e7099dfb6c02fa2555d9d508513536a91d8faf04cd437a433b0394ac3e3643
Content-Md5: l6jCdv1TY7ep5kuifTwxVA==
Content-Type:
X-Amz-Content-Sha256: 8700113c483b76e7175caedbd5fc2d6bb2b08f8914c4c520b0efa893120d3c4a
X-Amz-Date: 20240301T201651Z
Accept-Encoding: gzip


-----------------------------------------------------
2024/03/01 12:16:51 DEBUG: Response s3/PutObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Content-Length: 0
Date: Fri, 01 Mar 2024 20:16:52 GMT
Etag: "97a8c276fd5363b7a9e64ba27d3c3154"
Server: AmazonS3
X-Amz-Id-2: n95qpqGNvGT+GWYs/5aTI+o7jx0I1QyT9nXHpdjtxOFh1QKfDkximTvyY67hCK//X2tbB+wyoIA=
X-Amz-Request-Id: EC0NM5CJ05CH09D6
X-Amz-Server-Side-Encryption: AES256


-----------------------------------------------------

The download confirmed a blank Content-Type:

% ./gocdk-blob download "s3://redacted?region=us-west-2" main.go
2024/03/01 12:17:45 DEBUG: Request s3/GetObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /main.go HTTP/1.1
Host: redacted.s3.us-west-2.amazonaws.com
User-Agent: aws-sdk-go/1.50.29 (go1.21.6; darwin; arm64)
Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240301/us-west-2/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=f2dfc90a65eaa66a5d63d220198cc4e0ab6c92f088d1f0ddd09ce32ab08e49fa
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240301T201745Z
Accept-Encoding: gzip


-----------------------------------------------------
2024/03/01 12:17:45 DEBUG: Response s3/GetObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Content-Length: 5797
Accept-Ranges: bytes
Content-Type:
Date: Fri, 01 Mar 2024 20:17:46 GMT
Etag: "97a8c276fd5363b7a9e64ba27d3c3154"
Last-Modified: Fri, 01 Mar 2024 20:16:52 GMT
Server: AmazonS3
X-Amz-Id-2: rIHHlRV+xYFlIxDe0/7FlwEP5b4eJF5e7vPsZIc8Hj4fvDGumNBva45GqBQaIYaGHHo2kNXqHRM=
X-Amz-Request-Id: 2Z3KQM9GET3JDAMS
X-Amz-Server-Side-Encryption: AES256

@vangent vangent merged commit 7750aa6 into google:master Mar 1, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants