diff --git a/README.md b/README.md
index 5cf0679..fbbf9c3 100644
--- a/README.md
+++ b/README.md
@@ -1,32 +1,30 @@
 ## Spark on Kubernetes benchmark utility
 
-This repository is used to benchmark Spark performance on Kubernetes.
+This repository provides a general tool to benchmark Spark performance. If you want to use the [prebuilt Docker image](https://github.com/aws-samples/emr-on-eks-benchmark/pkgs/container/emr-on-eks-benchmark) based on a prebuilt OSS spark_3.1.2_hadoop_3.3.1, you can skip the [build section](#Build-benchmark-utility-docker-image) and jump to [Run Benchmark](#Run-Benchmark) directly. If you want to build your own, follow the steps in the [build section](#Build-benchmark-utility-docker-image).
 
 ## Prerequisite
-- eksctl is installed
+- eksctl is installed (>= 0.143.0)
 ```bash
 curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
 sudo mv -v /tmp/eksctl /usr/local/bin
 eksctl version
 ```
-- Update AWS CLI to the latest (requires aws cli version >= 2.1.14) on macOS. Check out the [link](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) for Linux or Windows
+- Update AWS CLI to the latest (requires AWS CLI version >= 2.11.23) on macOS. Check out the [link](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) for Linux or Windows
 ```bash
 curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
 sudo installer -pkg ./AWSCLIV2.pkg -target /
 aws --version
 rm AWSCLIV2.pkg
 ```
-- Install kubectl on macOS, check out the [link](https://kubernetes.io/docs/tasks/tools/) for Linux or Windows.
+- Install kubectl (>= 1.26.4). The commands below follow the [Linux install guide](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/); check out the [link](https://kubernetes.io/docs/tasks/tools/) for macOS or Windows.
 ```bash
-curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/darwin/amd64/kubectl"
-chmod +x ./kubectl
-sudo mv ./kubectl /usr/local/bin/kubectl && export PATH=/usr/local/bin:$PATH
-sudo chown root: /usr/local/bin/kubectl
+curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
+sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
 kubectl version --short --client
 ```
-- Helm CLI
+- Helm CLI (>= 3.2.1)
 ```bash
 curl -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
 helm version --short
 ```
@@ -64,7 +62,7 @@ aws ecr create-repository --repository-name spark --image-scanning-configuration
 docker build -t $ECR_URL/spark:3.1.2_hadoop_3.3.1 -f docker/hadoop-aws-3.3.1/Dockerfile --build-arg HADOOP_VERSION=3.3.1 --build-arg SPARK_VERSION=3.1.2 .
 docker push $ECR_URL/spark:3.1.2_hadoop_3.3.1
 
-# Build benchmark utility based on the Spark 
+# Build benchmark utility based on the Spark image above
 docker build -t $ECR_URL/eks-spark-benchmark:3.1.2 -f docker/benchmark-util/Dockerfile --build-arg SPARK_BASE_IMAGE=$ECR_URL/spark:3.1.2_hadoop_3.3.1 .
 ```
@@ -116,20 +114,31 @@ bash examples/emr6.5-benchmark.sh
 ```
 ### Benchmark for EMR on EC2
 Few notes for the set up:
-1. Use the same instance type c5d.9xlarge as in the EKS cluster.
-2. If choosing an EBS-backed instance, check the [default instance storage setting](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html) by EMR on EC2, and attach the same number of EBS volumes to your EKS cluster before running EKS related benchmarks.
+1. Use the same instance type c5d.9xlarge as in the EKS cluster.
+2. If choosing an EBS-backed instance, check the [default instance storage setting](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html) by EMR on EC2, and attach the same number of EBS volumes to your EKS cluster before running EKS-related benchmarks, as sketched below.
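+
+For illustration only, the extra volumes can be created and attached with the AWS CLI. This is a minimal sketch, assuming hypothetical placeholder values for `$AZ`, `$INSTANCE_ID`, the 128 GiB size, and the device name (none of these come from this repo):
+```bash
+# sketch only: create one extra gp3 volume in the node's AZ and attach it
+VOL_ID=$(aws ec2 create-volume --availability-zone $AZ --size 128 --volume-type gp3 --query VolumeId --output text)
+aws ec2 attach-volume --volume-id $VOL_ID --instance-id $INSTANCE_ID --device /dev/xvdb
+```
+Repeat per node and per volume to mirror the EMR on EC2 default; in practice you would bake this into the nodegroup's launch template instead of attaching volumes by hand.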
 
-The benchmark utility app was compiled to a jar file during an [automated GitHub workflow](https://github.com/aws-samples/emr-on-eks-benchmark/actions/workflows/relase-package.yaml) process. The quickest way to get the jar is from a running Kubernetes container.
+The benchmark utility app was compiled to a jar file during an [automated GitHub workflow](https://github.com/aws-samples/emr-on-eks-benchmark/actions/workflows/relase-package.yaml) process. If you already have a running Kubernetes container, the quickest way to get the jar is to use the `kubectl cp` command as shown below:
 ```bash
 # Download the jar and ignore the warning message
 kubectl cp oss/oss-spark-tpcds-exec-1:/opt/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar eks-spark-benchmark-assembly-1.0.jar
+```
+
+However, if you are running a benchmark just for EMR on EC2, you probably don't have a running container. To copy the jar file from a Docker container instead, you need two terminals. In the first terminal, spin up a container based on the image you built:
+```bash
+docker run --name spark-benchmark -it $ECR_URL/eks-spark-benchmark:3.1.2 bash
+# you are now inside the container; locate the jar file
+hadoop@9ca5b2afe778: ls -alh /opt/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar
+```
+Keep the container running, then go to the second terminal and copy the jar file from the container to your local directory:
+```bash
+docker cp spark-benchmark:/opt/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar .
 
 # Upload to s3
 S3BUCKET=
 aws s3 cp eks-spark-benchmark-assembly-1.0.jar s3://$S3BUCKET
 ```
 
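+Alternatively, a single-terminal sketch using `docker create` (the container is created but never started, so no second terminal is needed; the name `jar-extract` is an arbitrary placeholder):
+```bash
+# create a stopped container from the image, copy the jar out, then clean up
+docker create --name jar-extract $ECR_URL/eks-spark-benchmark:3.1.2
+docker cp jar-extract:/opt/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar .
+docker rm jar-extract
+```
+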
-Submit the benchmark job via EMR Step on the AWS console. Make sure the EMR on EC2 cluster can access the `$S3BUCKET`:
+Submit the benchmark job via EMR Step on the AWS console. Make sure the EMR on EC2 cluster can access the `$S3BUCKET`:
 ```bash
 # Step type: Spark Application
 # JAR location: s3://$S3BUCKET/eks-spark-benchmark-assembly-1.0.jar
diff --git a/docker/benchmark-util/.gitignore b/docker/benchmark-util/.gitignore
new file mode 100644
index 0000000..1bcb62a
--- /dev/null
+++ b/docker/benchmark-util/.gitignore
@@ -0,0 +1,21 @@
+*.DS_Store
+*.class
+*.log
+*.pyc
+sbt/*.jar
+.idea
+.idea_modules
+*.iml
+
+# sbt specific
+build/*.jar
+.cache/
+.history/
+.lib/
+dist/*
+target/
+lib_managed/
+src_managed/
+project/boot/
+project/plugins/project/
+performance/
diff --git a/docker/benchmark-util/Dockerfile b/docker/benchmark-util/Dockerfile
index 1aca45f..8dbbda2 100644
--- a/docker/benchmark-util/Dockerfile
+++ b/docker/benchmark-util/Dockerfile
@@ -13,6 +13,7 @@ RUN yum update -y && \
     make OS=LINUX
 
+
 FROM mozilla/sbt:8u292_1.5.4 as sbtenv
 
 # Build the Databricks SQL perf library from the local Spark version
@@ -35,5 +36,4 @@ COPY --from=tpc-toolkit /tmp/tpcds-kit/tools /opt/tpcds-kit/tools
 COPY --from=sbtenv /tmp/emr-on-eks-benchmark/benchmark/target/scala-2.12/*jar ${SPARK_HOME}/examples/jars/
 
 # # Use hadoop user and group
-USER hadoop:hadoop
-
+USER hadoop:hadoop
\ No newline at end of file
diff --git a/docker/emr-jdk11/Dockerfile b/docker/emr-jdk11/Dockerfile
new file mode 100644
index 0000000..7dcfa96
--- /dev/null
+++ b/docker/emr-jdk11/Dockerfile
@@ -0,0 +1,12 @@
+FROM 021732063925.dkr.ecr.us-west-2.amazonaws.com/eks-spark-benchmark:emr6.10_jdk8
+USER root
+ENV JAVA_HOME /etc/alternatives/jre
+
+RUN rpm -qa | grep corretto | xargs yum -y remove \
+# to keep hadoop-lzo dependency
+&& rpm -e --nodeps java-1.8.0-openjdk-headless \
+&& amazon-linux-extras install java-openjdk11 \
+&& yum clean all
+RUN alternatives --set java /usr/lib/jvm/$(ls /usr/lib/jvm | grep java-11 | cut -f 3)/bin/java
+# # Use hadoop user and group
+USER hadoop:hadoop
\ No newline at end of file
diff --git a/docker/emr-jdk11/Dockerfile_corretto b/docker/emr-jdk11/Dockerfile_corretto
new file mode 100644
index 0000000..ed11aaa
--- /dev/null
+++ b/docker/emr-jdk11/Dockerfile_corretto
@@ -0,0 +1,19 @@
+FROM 021732063925.dkr.ecr.us-west-2.amazonaws.com/eks-spark-benchmark:emr6.10_jdk8
+USER root
+
+# RUN amazon-linux-extras enable nginx1 \
+# && rpm --import https://yum.corretto.aws/corretto.key \
+# && curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
+RUN yum update -y \
+&& amazon-linux-extras disable corretto8 \
+# && rpm -qa | grep -E "openjdk|corretto" | xargs yum -y remove \
+&& rpm -qa | grep corretto | xargs yum -y remove \
+# to keep hadoop-lzo dependency
+&& rpm -e --nodeps java-1.8.0-openjdk-headless \
+&& yum install -y java-11-amazon-corretto \
+&& yum clean all
+
+
+# RUN alternatives --set java /usr/lib/jvm/$(ls /usr/lib/jvm | grep corretto | cut -f 3)/bin/java
+# # Use hadoop user and group
+USER hadoop:hadoop
\ No newline at end of file
diff --git a/examples/emr6.10-benchmark_c5.sh b/examples/emr6.10-benchmark_c5.sh
new file mode 100755
index 0000000..f18a0d2
--- /dev/null
+++ b/examples/emr6.10-benchmark_c5.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: Copyright 2021 Amazon.com, Inc. or its affiliates.
+# SPDX-License-Identifier: MIT-0
+
+# cross account test
+# "spark.hadoop.fs.s3.bucket.emr-eks-demo-720560070661-us-east-1.customAWSCredentialsProvider": "com.amazonaws.emr.AssumeRoleAWSCredentialsProvider",
+# "spark.kubernetes.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN": "arn:aws:iam::720560070661:role/EMRContainers-JobExecutionRole",
+# "spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN": "arn:aws:iam::720560070661:role/EMRContainers-JobExecutionRole"
+
+# export EMRCLUSTER_NAME=emr-on-eks-rss
+# export AWS_REGION=us-east-1
+export ACCOUNTID=$(aws sts get-caller-identity --query Account --output text)
+export VIRTUAL_CLUSTER_ID=$(aws emr-containers list-virtual-clusters --query "virtualClusters[?name == '$EMRCLUSTER_NAME' && state == 'RUNNING'].id" --output text)
+export EMR_ROLE_ARN=arn:aws:iam::$ACCOUNTID:role/$EMRCLUSTER_NAME-execution-role
+export S3BUCKET=$EMRCLUSTER_NAME-$ACCOUNTID-$AWS_REGION
+export ECR_URL="$ACCOUNTID.dkr.ecr.$AWS_REGION.amazonaws.com"
+
+aws emr-containers start-job-run \
+  --virtual-cluster-id $VIRTUAL_CLUSTER_ID \
+  --name emr610-JDK8 \
+  --execution-role-arn $EMR_ROLE_ARN \
+  --release-label emr-6.10.0-latest \
+  --retry-policy-configuration '{"maxAttempts": 5}' \
+  --job-driver '{
+  "sparkSubmitJobDriver": {
+      "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
+      "entryPointArguments":["s3://'$S3BUCKET'/BLOG_TPCDS-TEST-3T-partitioned","s3://'$S3BUCKET'/JDK_EMRONEKS_TPCDS-TEST-3T-RESULT","/opt/tpcds-kit/tools","parquet","3000","1","false","q1-v2.4,q10-v2.4,q11-v2.4,q12-v2.4,q13-v2.4,q14a-v2.4,q14b-v2.4,q15-v2.4,q16-v2.4,q17-v2.4,q18-v2.4,q19-v2.4,q2-v2.4,q20-v2.4,q21-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q26-v2.4,q27-v2.4,q28-v2.4,q29-v2.4,q3-v2.4,q30-v2.4,q31-v2.4,q32-v2.4,q33-v2.4,q34-v2.4,q35-v2.4,q36-v2.4,q37-v2.4,q38-v2.4,q39a-v2.4,q39b-v2.4,q4-v2.4,q40-v2.4,q41-v2.4,q42-v2.4,q43-v2.4,q44-v2.4,q45-v2.4,q46-v2.4,q47-v2.4,q48-v2.4,q49-v2.4,q5-v2.4,q50-v2.4,q51-v2.4,q52-v2.4,q53-v2.4,q54-v2.4,q55-v2.4,q56-v2.4,q57-v2.4,q58-v2.4,q59-v2.4,q6-v2.4,q60-v2.4,q61-v2.4,q62-v2.4,q63-v2.4,q64-v2.4,q65-v2.4,q66-v2.4,q67-v2.4,q68-v2.4,q69-v2.4,q7-v2.4,q70-v2.4,q71-v2.4,q72-v2.4,q73-v2.4,q74-v2.4,q75-v2.4,q76-v2.4,q77-v2.4,q78-v2.4,q79-v2.4,q8-v2.4,q80-v2.4,q81-v2.4,q82-v2.4,q83-v2.4,q84-v2.4,q85-v2.4,q86-v2.4,q87-v2.4,q88-v2.4,q89-v2.4,q9-v2.4,q90-v2.4,q91-v2.4,q92-v2.4,q93-v2.4,q94-v2.4,q95-v2.4,q96-v2.4,q97-v2.4,q98-v2.4,q99-v2.4,ss_max-v2.4","true"],
+      "sparkSubmitParameters": "--class com.amazonaws.eks.tpcds.BenchmarkSQL --conf spark.driver.cores=4 --conf spark.driver.memory=5g --conf spark.executor.cores=4 --conf spark.executor.memory=6g --conf spark.executor.instances=47"}}' \
+  --configuration-overrides '{
+    "applicationConfiguration": [
+      {
+        "classification": "spark-defaults",
+        "properties": {
+          "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.10_jdk8",
+          "spark.kubernetes.driver.podTemplateFile": "s3://'$S3BUCKET'/app_code/pod-template/driver-pod-template.yaml",
+          "spark.kubernetes.executor.podTemplateFile": "s3://'$S3BUCKET'/app_code/pod-template/executor-pod-template.yaml",
+          "spark.kubernetes.driver.limit.cores": "4.1",
+          "spark.kubernetes.executor.limit.cores": "4.3",
+          "spark.driver.memoryOverhead": "1000",
+          "spark.executor.memoryOverhead": "2G",
+          "spark.network.timeout": "2000s",
+          "spark.executor.heartbeatInterval": "300s",
+          "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": "c59d"
+      }},
+      {
+        "classification": "spark-log4j",
+        "properties": {
"rootLogger.level" : "WARN" + } + } + ], + "monitoringConfiguration": { + "s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}' diff --git a/examples/emr6.6-benchmark_c5.sh b/examples/emr6.6-benchmark_c5.sh deleted file mode 100755 index 5e3fcef..0000000 --- a/examples/emr6.6-benchmark_c5.sh +++ /dev/null @@ -1,53 +0,0 @@ -#!/bin/bash -# SPDX-FileCopyrightText: Copyright 2021 Amazon.com, Inc. or its affiliates. -# SPDX-License-Identifier: MIT-0 - -# export EMRCLUSTER_NAME=emr-on-eks-nvme -# export AWS_REGION=us-east-1 -export ACCOUNTID=$(aws sts get-caller-identity --query Account --output text) -export VIRTUAL_CLUSTER_ID=$(aws emr-containers list-virtual-clusters --query "virtualClusters[?name == '$EMRCLUSTER_NAME' && state == 'RUNNING'].id" --output text) -export EMR_ROLE_ARN=arn:aws:iam::$ACCOUNTID:role/$EMRCLUSTER_NAME-execution-role -export S3BUCKET=$EMRCLUSTER_NAME-$ACCOUNTID-$AWS_REGION -export ECR_URL="$ACCOUNTID.dkr.ecr.$AWS_REGION.amazonaws.com" - -aws emr-containers start-job-run \ - --virtual-cluster-id $VIRTUAL_CLUSTER_ID \ - --name em66-c5-4xl \ - --execution-role-arn $EMR_ROLE_ARN \ - --release-label emr-6.6.0-latest \ - --job-driver '{ - "sparkSubmitJobDriver": { - "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar", - "entryPointArguments":["s3://'$S3BUCKET'/BLOG_TPCDS-TEST-3T-partitioned","s3://'$S3BUCKET'/EMRONEKS_TPCDS-TEST-3T-RESULT","/opt/tpcds-kit/tools","parquet","3000","3","false","q1-v2.4,q10-v2.4,q11-v2.4,q12-v2.4,q13-v2.4,q14a-v2.4,q14b-v2.4,q15-v2.4,q16-v2.4,q17-v2.4,q18-v2.4,q19-v2.4,q2-v2.4,q20-v2.4,q21-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q26-v2.4,q27-v2.4,q28-v2.4,q29-v2.4,q3-v2.4,q30-v2.4,q31-v2.4,q32-v2.4,q33-v2.4,q34-v2.4,q35-v2.4,q36-v2.4,q37-v2.4,q38-v2.4,q39a-v2.4,q39b-v2.4,q4-v2.4,q40-v2.4,q41-v2.4,q42-v2.4,q43-v2.4,q44-v2.4,q45-v2.4,q46-v2.4,q47-v2.4,q48-v2.4,q49-v2.4,q5-v2.4,q50-v2.4,q51-v2.4,q52-v2.4,q53-v2.4,q54-v2.4,q55-v2.4,q56-v2.4,q57-v2.4,q58-v2.4,q59-v2.4,q6-v2.4,q60-v2.4,q61-v2.4,q62-v2.4,q63-v2.4,q64-v2.4,q65-v2.4,q66-v2.4,q67-v2.4,q68-v2.4,q69-v2.4,q7-v2.4,q70-v2.4,q71-v2.4,q72-v2.4,q73-v2.4,q74-v2.4,q75-v2.4,q76-v2.4,q77-v2.4,q78-v2.4,q79-v2.4,q8-v2.4,q80-v2.4,q81-v2.4,q82-v2.4,q83-v2.4,q84-v2.4,q85-v2.4,q86-v2.4,q87-v2.4,q88-v2.4,q89-v2.4,q9-v2.4,q90-v2.4,q91-v2.4,q92-v2.4,q93-v2.4,q94-v2.4,q95-v2.4,q96-v2.4,q97-v2.4,q98-v2.4,q99-v2.4,ss_max-v2.4","true"], - "sparkSubmitParameters": "--class com.amazonaws.eks.tpcds.BenchmarkSQL --conf spark.driver.cores=2 --conf spark.driver.memory=3g --conf spark.executor.cores=4 --conf spark.executor.memory=6g --conf spark.executor.instances=47"}}' \ - --configuration-overrides '{ - "applicationConfiguration": [ - { - "classification": "spark-defaults", - "properties": { - "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6", - "spark.kubernetes.driver.podTemplateFile": "s3://'$S3BUCKET'/app_code/pod-template/driver-pod-template.yaml", - "spark.kubernetes.executor.podTemplateFile": "s3://'$S3BUCKET'/app_code/pod-template/executor-pod-template.yaml", - - "spark.executor.memoryOverhead": "2G", - "spark.network.timeout": "2000s", - "spark.executor.heartbeatInterval": "300s", - "spark.kubernetes.executor.podNamePrefix": "emr-eks-tpcds-c54", - "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": "C5_4", - - "spark.ui.prometheus.enabled":"true", - "spark.executor.processTreeMetrics.enabled":"true", - "spark.kubernetes.driver.annotation.prometheus.io/scrape":"true", - 
"spark.kubernetes.driver.annotation.prometheus.io/path":"/metrics/executors/prometheus/", - "spark.kubernetes.driver.annotation.prometheus.io/port":"4040", - "spark.kubernetes.driver.service.annotation.prometheus.io/scrape":"true", - "spark.kubernetes.driver.service.annotation.prometheus.io/path":"/metrics/driver/prometheus/", - "spark.kubernetes.driver.service.annotation.prometheus.io/port":"4040", - "spark.metrics.conf.*.sink.prometheusServlet.class":"org.apache.spark.metrics.sink.PrometheusServlet", - "spark.metrics.conf.*.sink.prometheusServlet.path":"/metrics/driver/prometheus/", - "spark.metrics.conf.master.sink.prometheusServlet.path":"/metrics/master/prometheus/", - "spark.metrics.conf.applications.sink.prometheusServlet.path":"/metrics/applications/prometheus/" - }} - ], - "monitoringConfiguration": { - "s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}' diff --git a/provision.sh b/provision.sh index ecf9bd3..ef4e02e 100755 --- a/provision.sh +++ b/provision.sh @@ -9,7 +9,7 @@ export OSS_SPARK_SVCACCT_NAME=oss export OSS_NAMESPACE=oss export EMR_NAMESPACE=emr -export EKS_VERSION=1.21 +export EKS_VERSION=1.26 export EMRCLUSTER_NAME=emr-on-$EKSCLUSTER_NAME export ROLE_NAME=${EMRCLUSTER_NAME}-execution-role export ACCOUNTID=$(aws sts get-caller-identity --query Account --output text) @@ -193,7 +193,7 @@ autoDiscovery: clusterName: $EKSCLUSTER_NAME awsRegion: $AWS_REGION image: - tag: v1.21.1 + tag: v1.26.3 nodeSelector: app: sparktest podAnnotations: @@ -213,7 +213,8 @@ helm install nodescaler autoscaler/cluster-autoscaler --namespace kube-system -- # Install Spark-Operator for the OSS Spark test helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator -helm install -n $OSS_NAMESPACE spark-operator spark-operator/spark-operator --version 1.1.6 \ +helm repo update +helm install -n $OSS_NAMESPACE spark-operator spark-operator/spark-operator --version 1.1.27 \ --set serviceAccounts.spark.create=false --set metrics.enable=false --set webhook.enable=true --set webhook.port=443 --debug echo "============================================================================="