Modify spark image to include azure and gcs dependency jars
anusudarsan authored and ebyhr committed May 24, 2024
1 parent 31ec4ad commit a4f6a1f
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions testing/spark3-iceberg/Dockerfile
@@ -35,6 +35,12 @@ WORKDIR ${SPARK_HOME}/jars
# install AWS SDK so we can access S3; the version must match the hadoop-* jars which are part of SPARK distribution
RUN wget -nv "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar"
RUN wget -nv "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.319/aws-java-sdk-bundle-1.12.319.jar"
# install Azure SDK so we can access azure file system; the version must match the hadoop-* jars which are part of SPARK distribution
RUN wget -nv https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure/3.3.4/hadoop-azure-3.3.4.jar
RUN wget -nv https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure-datalake/3.3.6/hadoop-azure-datalake-3.3.6.jar
RUN wget -nv https://repo1.maven.org/maven2/com/microsoft/azure/azure-storage/8.6.6/azure-storage-8.6.6.jar
# install Google Hadoop connector so we can access gcs
RUN wget -nv https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar

# install Iceberg
RUN wget -nv "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-${ICEBERG_JAR_VERSION}/${ICEBERG_VERSION}/iceberg-spark-runtime-${ICEBERG_JAR_VERSION}-${ICEBERG_VERSION}.jar"
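
The comments in this hunk say the connector versions must match the hadoop-* jars shipped with the Spark distribution. A minimal sketch of how that could be checked is below. The function names `hadoop_jar_versions` and `check_hadoop_versions` are hypothetical and not part of this commit, and the check assumes jar files follow the `hadoop-<module>-<version>.jar` naming seen in the URLs above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the commit: list the distinct versions
# of hadoop-* jars from a list of file names and flag any mismatch.

# Reads jar file names on stdin (one per line) and prints the distinct
# versions of files named hadoop-<module>-<version>.jar, sorted.
hadoop_jar_versions() {
    sed -n 's/^hadoop-[a-z-]*-\([0-9][0-9.]*\)\.jar$/\1/p' | sort -u
}

# Reads jar file names on stdin; prints "ok: <version>" and returns 0 when
# at most one hadoop-* version is present, otherwise prints
# "mismatch: <versions>" and returns 1.
check_hadoop_versions() {
    versions=$(hadoop_jar_versions)
    count=$(printf '%s\n' "$versions" | grep -c . || true)
    if [ "$count" -gt 1 ]; then
        echo "mismatch: $(echo $versions)"
        return 1
    fi
    echo "ok: $versions"
}

# Possible usage inside the image (assumes SPARK_HOME is set):
#   ls "${SPARK_HOME}/jars" | check_hadoop_versions
```

Run against only the three hadoop-* file names downloaded in this hunk, the check would print `mismatch: 3.3.4 3.3.6`, because hadoop-azure-datalake is fetched at 3.3.6 while hadoop-aws and hadoop-azure are at 3.3.4. The commit does not say whether that difference is intentional.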