
Allow different GCP credentials for reading and writing in the same Spark Session #1249

Open · sid-habu opened this issue Sep 12, 2024 · 1 comment


@sid-habu

The S3 and ADLS connectors allow setting credentials per bucket (or per container) in the same Spark session. However, I am unable to find a similar configuration for GCS.
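For reference, this is the kind of per-bucket override the S3A connector supports; a rough sketch with hypothetical bucket names and environment variables:

```scala
import org.apache.spark.sql.SparkSession

// S3A resolves fs.s3a.bucket.<bucket>.* properties per bucket, so two buckets
// in one session can use different credentials. Bucket names and env vars here
// are hypothetical.
val spark = SparkSession.builder()
  .appName("per-bucket-credentials")
  // Credentials used only for s3a://source-bucket/...
  .config("spark.hadoop.fs.s3a.bucket.source-bucket.access.key", sys.env("SRC_ACCESS_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.source-bucket.secret.key", sys.env("SRC_SECRET_KEY"))
  // Credentials used only for s3a://dest-bucket/...
  .config("spark.hadoop.fs.s3a.bucket.dest-bucket.access.key", sys.env("DST_ACCESS_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.dest-bucket.secret.key", sys.env("DST_SECRET_KEY"))
  .getOrCreate()
```

The GCS connector's fs.gs.auth.* properties have no equivalent per-bucket form, which is what this issue is asking about.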

My use case is to read from a GCS bucket in one project and write to a GCS bucket in another project with a different set of credentials. Overriding the Hadoop configuration just before writing causes failures due to race conditions in the token caching.
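For illustration, this is the kind of mid-session mutation that runs into the caching problem; bucket names and the keyfile path are hypothetical:

```scala
// Naive approach: mutate the session-wide Hadoop configuration before the write.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.gs.auth.type", "SERVICE_ACCOUNT_JSON_KEYFILE")
hadoopConf.set("fs.gs.auth.service.account.json.keyfile", "/secrets/writer-sa.json")

// Racy: Hadoop caches FileSystem instances, so the write may reuse a GCS
// FileSystem (and its cached token) created earlier with the reader credentials,
// while concurrent tasks see a half-updated configuration.
df.write.parquet("gs://dest-bucket/output")
```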

Is there a workaround to achieve this in Spark for GCS?

@sid-habu (Author)

We resolved this by extending GoogleHadoopFileSystem and overriding its initialize method to read our custom auth properties and set the fs.gs.auth.* properties for the URI representing the GCS bucket.
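A minimal sketch of that approach, assuming a custom property of the form custom.gs.auth.keyfile.&lt;bucket&gt; (the property name and keyfile paths are illustrative, not from the original comment):

```scala
import java.net.URI

import com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
import org.apache.hadoop.conf.Configuration

// Hypothetical subclass: picks a service-account keyfile per bucket.
// Hadoop caches FileSystem instances per (scheme, authority), so initialize
// runs once per bucket and each instance keeps its own credentials.
class PerBucketGoogleHadoopFileSystem extends GoogleHadoopFileSystem {

  override def initialize(path: URI, conf: Configuration): Unit = {
    // For gs://<bucket>/..., the URI authority is the bucket name.
    val bucket = path.getAuthority
    val keyfile = conf.get(s"custom.gs.auth.keyfile.$bucket")

    if (keyfile == null) {
      // No per-bucket override: fall back to the session-wide auth settings.
      super.initialize(path, conf)
    } else {
      // Copy the configuration so this bucket's auth settings don't leak
      // into the shared Configuration used by other FileSystem instances.
      val bucketConf = new Configuration(conf)
      bucketConf.set("fs.gs.auth.type", "SERVICE_ACCOUNT_JSON_KEYFILE")
      bucketConf.set("fs.gs.auth.service.account.json.keyfile", keyfile)
      super.initialize(path, bucketConf)
    }
  }
}
```

To wire it in, the custom class would be registered in place of the stock connector via the standard fs.&lt;scheme&gt;.impl mechanism, e.g. spark.hadoop.fs.gs.impl=PerBucketGoogleHadoopFileSystem (fully qualified), alongside one custom.gs.auth.keyfile.&lt;bucket&gt; entry per bucket.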
