Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemConfiguration #1233

Open
lvijnck opened this issue Aug 1, 2024 · 1 comment

Comments

@lvijnck
Copy link

lvijnck commented Aug 1, 2024

I'm unable to setup the connector through maven according to the docs:

The following works, unfortunately it's slow as it re-downloads the jar every time.

spark.jars: https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar
spark.hadoop.fs.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS

spark.hadoop.google.cloud.auth.service.account.enable: true
spark.hadoop.google.cloud.auth.service.account.json.keyfile: conf/local/service-account.json

Hence why we would like to maven, but we're unable to get that working

spark.jars.packages: com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.19
spark.hadoop.fs.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS

spark.hadoop.google.cloud.auth.service.account.enable: true
spark.hadoop.google.cloud.auth.service.account.json.keyfile: conf/local/service-account.json

This gives the following error:

: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3467)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
        at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:724)
        at scala.collection.immutable.List.map(List.scala:293)
        at org.apache.spark.sql.execution.datasources.DataSource$.checkAndGlobPathIfNecessary(DataSource.scala:722)
        at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:551)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:404)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
        ... 28 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemConfiguration
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.<init>(GoogleHadoopFileSystemBase.java:250)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.<init>(GoogleHadoopFileSystem.java:58)
        ... 33 more
@lvijnck lvijnck changed the title Errors importing the maven provided package Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemConfiguration Aug 1, 2024
@lvijnck
Copy link
Author

lvijnck commented Aug 1, 2024

Shaded JAR from here works:

https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.0.0/

not sure how to get it from Maven though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant