
Use S3 to generate checksum #4873

Merged
5 commits merged into BlueBrain:master from s3-checksum-t2 on Apr 19, 2024

Conversation

@shinyhappydan (Contributor) commented Apr 17, 2024

Fixes #4860

.build(),
AsyncRequestBody.fromByteBuffer(bs)
).map { response =>
response.eTag().filter(_ != '"')

Contributor:
Why the filter? Did you see it return that value?

Contributor Author:

Oh, yes it returns the MD5 quoted 😱
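
Side note for readers: S3 returns the ETag wrapped in literal double quotes. A minimal sketch, not from the PR and using a made-up digest value, showing the two quote-stripping approaches discussed here (a later commit in this PR switched to stripPrefix/stripSuffix, as seen further down):

// Minimal sketch, not from the PR: S3 wraps the ETag value in literal double quotes.
// filter(_ != '"') drops every quote character; the stripPrefix/stripSuffix variant
// used in a later commit removes only the surrounding pair.
val quotedETag: String = "\"d41d8cd98f00b204e9800998ecf8427e\"" // example quoted MD5 hex digest

val viaFilter: String = quotedETag.filter(_ != '"')
val viaStrip: String  = quotedETag.stripPrefix("\"").stripSuffix("\"")

assert(viaFilter == viaStrip) // both yield d41d8cd98f00b204e9800998ecf8427e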

@@ -28,7 +28,7 @@ trait S3StorageClient {

  def getFileAttributes(bucket: String, key: String): IO[GetObjectAttributesResponse]

- def underlyingClient: S3[IO]
+ def underlyingClient: S3AsyncClientOp[IO]

Contributor:

This change was a step towards removing the client from the interface; maybe uploadFile could go behind it?

Contributor Author:

Could do, but I was a bit reluctant to return a pipe, or do something which consumes the stream. I think for now this is fine, but it's certainly something that we could revisit.
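
For illustration only, a hypothetical sketch of the shape this suggestion could take if revisited. The trait name and getFileAttributes come from the diff above; the uploadFile signature and the UploadResult type are invented here, not the PR's API:

import cats.effect.IO
import fs2.Stream
import software.amazon.awssdk.services.s3.model.GetObjectAttributesResponse

// Hypothetical result type, invented for this sketch.
final case class UploadResult(checksum: String, fileSize: Long)

trait S3StorageClient {
  def getFileAttributes(bucket: String, key: String): IO[GetObjectAttributesResponse]

  // Hypothetical: consumes the byte stream internally instead of exposing a Pipe,
  // so callers never need underlyingClient for uploads. DigestAlgorithm is the
  // project's own type, as used elsewhere in this diff.
  def uploadFile(bucket: String, key: String, data: Stream[IO, Byte], algorithm: DigestAlgorithm): IO[UploadResult]
}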

@shinyhappydan force-pushed the s3-checksum-t2 branch 2 times, most recently from 8561c91 to f02eddf on April 18, 2024 08:33
case "MD5" => response.eTag().stripPrefix("\"").stripSuffix("\"")
case "SHA-256" => Hex.encodeHexString(Base64.getDecoder.decode(response.checksumSHA256()))
case "SHA-1" => Hex.encodeHexString(Base64.getDecoder.decode(response.checksumSHA1()))
case _ => throw new IllegalArgumentException(s"Unsupported algorithm for S3: ${algorithm.value}")

Contributor Author:

It's not ideal that we have to do this, but the algorithm is validated upon config loading, so this code path should never be hit.
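
As context for the conversion above, a minimal standalone sketch (assuming commons-codec is on the classpath, as the Hex usage in the diff suggests) of why the Base64 decode plus hex encode is needed: S3 reports SHA checksums Base64-encoded, while the digest is kept as hex:

import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Base64
import org.apache.commons.codec.binary.Hex

// Compute a SHA-256 locally, Base64-encode it the way S3 reports it via
// checksumSHA256(), then decode and hex-encode it as the code above does.
val digestBytes: Array[Byte] =
  MessageDigest.getInstance("SHA-256").digest("example".getBytes(StandardCharsets.UTF_8))
val asS3Reports: String = Base64.getEncoder.encodeToString(digestBytes)
val asHexDigest: String = Hex.encodeHexString(Base64.getDecoder.decode(asS3Reports))

assert(asHexDigest == Hex.encodeHexString(digestBytes)) // round-trips to the same hex digest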

Comment on lines +163 to 171
implicit class PutObjectRequestOps(request: PutObjectRequest.Builder) {
def deltaDigest(algorithm: DigestAlgorithm): PutObjectRequest.Builder =
algorithm.value match {
case "MD5" => request
case "SHA-256" => request.checksumAlgorithm(ChecksumAlgorithm.SHA256)
case "SHA-1" => request.checksumAlgorithm(ChecksumAlgorithm.SHA1)
case _ => throw new IllegalArgumentException(s"Unsupported algorithm for S3: ${algorithm.value}")
}
}

Contributor Author:

It's unusual to use an implicit class here, but given that the request builder is mutable, it makes sense to do this so we avoid passing the builder between methods.
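
For readers unfamiliar with the pattern, an illustrative usage of the deltaDigest extension above (the bucket and key values are made up; with MD5 the builder is returned unchanged so only the ETag is produced, while SHA-256/SHA-1 request the corresponding ChecksumAlgorithm):

import software.amazon.awssdk.services.s3.model.PutObjectRequest

// Assumes the PutObjectRequestOps implicit class above is in scope.
val request: PutObjectRequest =
  PutObjectRequest
    .builder()
    .bucket("my-bucket")        // made-up bucket
    .key("some/path/file.bin")  // made-up key
    .deltaDigest(algorithm)     // DigestAlgorithm from the surrounding code
    .build()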

source <- fileOps.fetch(bucket, attr.path)
} yield consume(source)

assertIO(result, content)
}
}

test("Use MD5 to calculate a checksum") {

Contributor Author:

To be clear: the previous test checks SHA-256; this is just an additional test because the code path is different.

@shinyhappydan merged commit 82238c6 into BlueBrain:master on Apr 19, 2024
4 checks passed
@shinyhappydan deleted the s3-checksum-t2 branch on April 19, 2024 11:03
algorithm: DigestAlgorithm
): IO[(String, Long)] = {
for {
fileSizeAcc <- Ref.of[IO, Long](0L)

Contributor:
It should be possible to get the file size from S3, no?
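
Since the trait already exposes getFileAttributes returning a GetObjectAttributesResponse, a hedged sketch of that suggestion (assuming the underlying call requests ObjectAttributes.OBJECT_SIZE so that objectSize() is populated):

import cats.effect.IO

// Sketch only: ask S3 for the size after the upload instead of accumulating it in a Ref.
def fileSizeFromS3(client: S3StorageClient, bucket: String, key: String): IO[Long] =
  client.getFileAttributes(bucket, key).map(_.objectSize().longValue())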

Successfully merging this pull request may close these issues.

Only allow S3 supported digest algorithms in S3 storages