YARN build fixes #892

Merged
merged 3 commits into from
Sep 7, 2013

Conversation

@jey
Contributor

jey commented Sep 4, 2013

This PR updates the YARN build docs and fixes the build under Maven with Hadoop 0.23.x

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/830/

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/831/

@mateiz
Member

mateiz commented Sep 4, 2013

Jey, the goal of my change was for the "assembly" project to include all the code you'd want to run on Spark. This is why I removed an assembly from "yarn". Notice that assemblyProj depends on YARN if active.

I think the problem is just that the running on YARN doc is wrong. Fix that to use the assembly.jar in assembly/.

@mateiz
Member

mateiz commented Sep 4, 2013

Also, examples should depend on maybeYarn in case you want to run them on YARN.

@jey
Contributor Author

jey commented Sep 4, 2013

Applications do not need to depend on spark-yarn. They only need to specify spark-core as their dependency.

@mateiz
Member

mateiz commented Sep 4, 2013

Ah, okay, that makes sense. But for the assembly definitely just use the one in assembly/ instead. Part of the reason I did this is that I wanted to build fewer assemblies (only examples and assemblyProj).

@jey
Contributor Author

jey commented Sep 4, 2013

Makes sense. I'm updating my patch accordingly

@jey
Contributor Author

jey commented Sep 4, 2013

How do we want YARN users to build/run their apps? How about if we say that they must always create a single assembly that contains spark-yarn and their own code, and modify spark.deploy.yarn.Client to combine the SPARK_JAR envvar and the --jar argument?

Then an example invocation would look like this:

./spark-class org.apache.spark.deploy.yarn.Client \
    --jar examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-SNAPSHOT.jar \
    --class org.apache.spark.examples.SparkPi \
    --args yarn-standalone \
    --num-workers 3 \
    --master-memory 4g \
    --worker-memory 2g \
    --worker-cores 1

@mateiz
Member

mateiz commented Sep 5, 2013

I think we should ask some of the YARN people about that. One advantage of separating the Spark JAR is that it might be possible to put it in a standard path on HDFS and then YARN will cache it locally on each worker node, avoiding a long download. Then the user's JAR can be compiled with spark listed as "provided" and will contain only the user's own classes and other dependencies that aren't in the YARN JAR.

@mridulm, @tgravescs any thoughts on this?

@fxc123

fxc123 commented Sep 5, 2013

@mateiz I have tried using the spark-assembly jar, but it seems org.apache.spark.deploy.yarn.Client is not built into it?

@mateiz
Member

mateiz commented Sep 5, 2013

@fxc123 did you build with the environment variable SPARK_YARN set to "true"? You need to do

SPARK_HADOOP_VERSION=whatever SPARK_YARN=true sbt/sbt assembly

It might also help to run sbt clean first.
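
One quick way to check whether a built assembly actually contains the YARN client is to list the jar's entries and look for the class. The following is a minimal sketch, not part of this PR: `class_entry` and `jar_has_class` are hypothetical helper names, and it assumes `unzip` is installed (`jar tf` would serve the same purpose).

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted class name to its entry path in a jar.
class_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed if the jar ($1) lists the class ($2).
# Assumes `unzip` is available on the PATH.
jar_has_class() {
    unzip -l "$1" | grep -q -F "$(class_entry "$2")"
}

# Example call (the jar file name depends on the Spark and Hadoop versions built):
# jar_has_class assembly/target/scala-2.9.3/spark-assembly-0.8.0-SNAPSHOT-hadoop0.23.7.jar \
#     org.apache.spark.deploy.yarn.Client && echo "YARN client present"
```

If the check fails after a build with SPARK_YARN=true, a stale assembly from an earlier build is one possible cause, which is where a clean rebuild may help.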

@tgravescs

I do not like the idea of people having to include their own code with the Spark assembly jar. That makes it impossible to deploy just a Spark jar that multiple people can share and that, as Matei said, can be cached on the Hadoop nodes so it does not have to be downloaded every time.

It would be nice to decide what we are doing with the assemblies, though. The last time I used mvn and tried to use the core assembly, it didn't have the YarnClientImpl that was needed to run on YARN. So I used the yarn jar. But I don't think it included the repl stuff.

@mateiz
Member

mateiz commented Sep 5, 2013

@tgravescs the assembly built into the assembly/ directory should now include YARN, the REPL, and all the user libraries in Spark. Try that one out.

@jey
Contributor Author

jey commented Sep 5, 2013

Does repl (i.e. spark-shell) work on YARN? How should I run it?

@mateiz
Member

mateiz commented Sep 5, 2013

There's a patch for it (#868), so although we probably won't merge that patch in 0.8.0, let's include it. We will merge it in 0.8.1.

@mateiz
Member

mateiz commented Sep 5, 2013

(More generally, I also wanted to build as few assembly JARs as possible because it's slow. One "runtime environment" one and one "examples" one is the least we could do.)

@tgravescs

@mateiz, I just tried using the assembly (assembly/target/scala-2.9.3/spark-assembly-0.8.0-SNAPSHOT-hadoop0.23.7.jar) with run-example (built using mvn -Phadoop2-yarn package), but it fails to run because it's missing org/apache/hadoop/yarn/client/YarnClientImpl. This class exists in the YARN jar, spark-yarn-0.8.0-SNAPSHOT-shaded.jar.

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/848/

@jey
Contributor Author

jey commented Sep 6, 2013

This PR has been significantly updated; it now updates the YARN build docs and fixes the build under Maven with Hadoop 0.23.x.

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/849/

mateiz added a commit that referenced this pull request Sep 7, 2013
@mateiz mateiz merged commit afe46ba into mesos:master Sep 7, 2013
@mateiz
Member

mateiz commented Sep 7, 2013

Looks good, thanks!

@jey jey deleted the fix-yarn-assembly branch September 7, 2013 17:13