Make Tachyon WriteType configurable. #327

Open: wants to merge 2 commits into master

@GraceH commented Apr 22, 2014

Tachyon supports several kinds of WriteType, such as CACHE_THROUGH, MUST_CACHE, and TRY_CACHE, but Shark currently only supports CACHE_THROUGH. This PR makes the write type of TachyonOffheapTableWriter configurable, so that end users can choose a different type with "set shark.tachyon.writetype=xxx" and avoid some of the write-through overhead.
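A minimal sketch of how a string setting such as `shark.tachyon.writetype` might be turned into a write type. It assumes a hypothetical `WriteTypeSketch` type that mirrors only the three names mentioned above; Tachyon's real `WriteType` enum is not used here and may define more values. Unset values fall back to the CACHE_THROUGH default from this PR.

```scala
// Hypothetical stand-in for Tachyon's WriteType; only the values named in this PR are mirrored.
sealed trait WriteTypeSketch {
  def name: String
  // True for types that also persist data synchronously to the under filesystem.
  def writesThrough: Boolean
}

object WriteTypeSketch {
  case object CacheThrough extends WriteTypeSketch { val name = "CACHE_THROUGH"; val writesThrough = true }
  case object MustCache extends WriteTypeSketch { val name = "MUST_CACHE"; val writesThrough = false }
  case object TryCache extends WriteTypeSketch { val name = "TRY_CACHE"; val writesThrough = false }

  val all: Seq[WriteTypeSketch] = Seq(CacheThrough, MustCache, TryCache)
  val Default: WriteTypeSketch = CacheThrough

  // Parses a user-supplied value case-insensitively; None means "not recognized".
  def parse(raw: String): Option[WriteTypeSketch] = {
    val key = raw.trim.toUpperCase
    all.find(_.name == key)
  }

  // Unset -> default; unrecognized -> error instead of a silent fallback.
  def resolve(setting: Option[String]): WriteTypeSketch = setting match {
    case None => Default
    case Some(raw) =>
      parse(raw).getOrElse(
        throw new IllegalArgumentException(s"Unknown shark.tachyon.writetype: $raw"))
  }
}
```

Rejecting an unknown value, rather than quietly using the default, is a design choice in this sketch: a typo in the setting then surfaces immediately instead of silently changing how durable the cached data is.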

@AmplabJenkins

Can one of the admins verify this patch?

@haoyuan (Member) commented Apr 22, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/Shark-Pull-Request-Builder/12191/

@aarondav (Contributor)

Writing to Tachyon with a WriteType that does not include "THROUGH" could be very unstable. In Shark, for instance, cached tables are persisted in Spark with MEMORY_AND_DISK, so that data evicted from memory is stored on disk. Tachyon, however, does not support these semantics: if you cache a table with TRY_CACHE and part of it is later evicted, that part is gone forever and future Shark queries will just throw exceptions.
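The failure mode described above can be illustrated with a toy model. This is not Tachyon's or Spark's actual code, and `ToyBlockStore` is a hypothetical name: a fixed-capacity in-memory block store that evicts its oldest block when full, and either spills the evicted block to a "disk" map (analogous to MEMORY_AND_DISK, or a write type that also writes through) or simply drops it (analogous to TRY_CACHE with no under-filesystem copy). FIFO eviction is an assumption made for simplicity.

```scala
import scala.collection.mutable

// Toy block store: FIFO eviction, optional spill of evicted blocks to "disk".
final class ToyBlockStore(capacity: Int, keepEvictedOnDisk: Boolean) {
  require(capacity > 0, "capacity must be positive")

  // LinkedHashMap preserves insertion order, so `head` is the oldest block.
  private val memory = mutable.LinkedHashMap.empty[String, Array[Byte]]
  private val disk = mutable.HashMap.empty[String, Array[Byte]]

  def put(id: String, data: Array[Byte]): Unit = {
    // Re-putting an existing block should not trigger an eviction.
    memory.remove(id)
    if (memory.size >= capacity) {
      val (oldId, oldData) = memory.head
      memory.remove(oldId)
      // Without a durable copy, the evicted block is simply lost.
      if (keepEvictedOnDisk) disk(oldId) = oldData
    }
    memory(id) = data
  }

  // Returns the block if it is still in memory or was spilled to disk.
  def get(id: String): Option[Array[Byte]] = memory.get(id).orElse(disk.get(id))
}
```

After inserting blocks "p0", "p1", "p2" into stores of capacity 2, `get("p0")` still succeeds for the spilling store but returns `None` for the dropping one, which is the toy analogue of a later Shark query failing on a table partially evicted from Tachyon.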

@GraceH (Author) commented Apr 23, 2014

Yes, you are right. Stability is a big concern here, which is why we keep CACHE_THROUGH as the default write type. At the same time, this change lets users make the trade-off between performance and stability themselves. Just as Spark offers both MEMORY_ONLY and MEMORY_AND_DISK, I think it makes sense to offer different configuration options for more flexible usage. We can also work on improving stability inside Tachyon itself.

@@ -56,6 +56,8 @@ object SharkConfVars {

// Number of mappers to force for table scan jobs
val NUM_MAPPERS = new ConfVar("shark.map.tasks", -1)

val TACHYON_WRITER_WRITETYPE = new ConfVar("shark.tachyon.writetype", "CACHE_THROUGH")
Contributor

Could you add a comment here that explains what this option does and warns against the dangers of setting a WriteType without THROUGH?
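One way the requested comment could read is sketched below. Shark's real `ConfVar` class is not reproduced; a minimal hypothetical stand-in, `ConfVarSketch`, takes its place so the snippet compiles on its own, and `SharkConfVarsSketch` is likewise a made-up name. The comment wording is only a suggestion, not the text that was merged.

```scala
// Hypothetical stand-in for Shark's ConfVar: a setting name plus its default value.
final case class ConfVarSketch(varname: String, defaultVal: String)

object SharkConfVarsSketch {
  // Tachyon WriteType used by TachyonOffheapTableWriter when writing cached tables,
  // e.g. CACHE_THROUGH (default), MUST_CACHE or TRY_CACHE.
  //
  // WARNING: only write types that also write THROUGH to the under filesystem
  // (such as the default, CACHE_THROUGH) keep a durable copy of the data. With a
  // type such as TRY_CACHE, any part of a table evicted from Tachyon memory is
  // lost for good, and later Shark queries on that table will throw exceptions.
  val TACHYON_WRITER_WRITETYPE = ConfVarSketch("shark.tachyon.writetype", "CACHE_THROUGH")
}
```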

@aarondav (Contributor)

Alright, sounds fine to me, as long as the use case you had in mind can tolerate data randomly falling out of Tachyon. This change looks good to me, though it'd be great if you could just add the comment I mentioned to warn people about this problem.

@GraceH (Author) commented Jun 24, 2014

@aarondav Is there anything else blocking it?

@aarondav (Contributor)

Uh, looks good to me but I'm not a committer on this project :) @rxin, care to merge this?

4 participants