Make Tachyon WriteType configurable. #327
base: master
Can one of the admins verify this patch?
Jenkins, test this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
Writing to Tachyon with a WriteType that does not include "THROUGH" can be very unstable. In Shark, for instance, cached tables are persisted in Spark with MEMORY_AND_DISK, so that when data is evicted from memory it is stored on disk. Tachyon does not support this sort of semantics, so if you cache a table with TRY_CACHE and later have to evict part of it, that data is gone forever and future Shark queries will just throw exceptions.
Yes, you are right. Stability is a big concern here, which is why we leave CACHE_THROUGH as the default write type. Meanwhile, we can let users make the trade-off between performance and stability themselves. Just as Spark offers MEMORY_ONLY and MEMORY_AND_DISK, I think it makes sense to offer different configuration options for more flexible usage. We can also improve stability inside Tachyon itself.
@@ -56,6 +56,8 @@ object SharkConfVars {

  // Number of mappers to force for table scan jobs
  val NUM_MAPPERS = new ConfVar("shark.map.tasks", -1)

val TACHYON_WRITER_WRITETYPE = new ConfVar("shark.tachyon.writetype", "CACHE_THROUGH") |
Could you add a comment here to both explain what this option is and to warn against the dangers of setting the WriteType without THROUGH?
Alright, sounds fine to me, as long as the use case you have in mind can tolerate data randomly falling out of Tachyon. This change looks good to me, though it would be great if you could add the comment I mentioned to warn people about this problem.
@aarondav Is there anything else blocking it?
Uh, looks good to me, but I'm not a committer on this project :) @rxin, care to merge this?
Tachyon supports several WriteTypes, such as CACHE_THROUGH, MUST_CACHE, and TRY_CACHE, but Shark currently only supports CACHE_THROUGH. This PR makes the write type of TachyonOffheapTableWriter configurable, so that end users can choose a different type with "set shark.tachyon.writetype=xxx" and avoid some of the write-through overhead.
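To illustrate the durability trade-off discussed in this thread, here is a minimal Scala sketch. It assumes nothing about Shark's actual ConfVar API or Tachyon's client classes: the hypothetical WriteTypeSketch type mirrors only the three write types named in this PR, resolves a raw shark.tachyon.writetype value, and reports whether a type also writes through to the under filesystem (the "THROUGH" property the reviewer warns about).

```scala
// Hypothetical sketch: mirrors only the write types named in this PR.
// It is NOT Tachyon's own WriteType class and does not use Shark's ConfVar.
sealed abstract class WriteTypeSketch(val name: String) {
  // Types whose name contains "THROUGH" also write to the under filesystem,
  // so data evicted from Tachyon memory can still be read back.
  def persistsToUnderFs: Boolean = name.contains("THROUGH")
}

object WriteTypeSketch {
  case object CacheThrough extends WriteTypeSketch("CACHE_THROUGH")
  case object MustCache extends WriteTypeSketch("MUST_CACHE")
  case object TryCache extends WriteTypeSketch("TRY_CACHE")

  val all: Seq[WriteTypeSketch] = Seq(CacheThrough, MustCache, TryCache)

  // Resolves a raw shark.tachyon.writetype value. A missing value falls back
  // to the CACHE_THROUGH default; an unrecognized value is rejected.
  def fromConf(raw: Option[String]): WriteTypeSketch = raw match {
    case None => CacheThrough
    case Some(v) =>
      val normalized = v.trim.toUpperCase
      all.find(_.name == normalized).getOrElse(
        throw new IllegalArgumentException(s"Unknown Tachyon write type: $v"))
  }
}
```

For example, WriteTypeSketch.fromConf(Some("try_cache")) resolves to TryCache, whose persistsToUnderFs is false, which is exactly the case where evicted data is lost. Rejecting unknown values instead of silently defaulting is a design choice in this sketch, so that a typo cannot quietly change a table's durability.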