-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compression codec change #754
Conversation
…ession Conflicts: project/SparkBuild.scala
Reynold, thanks very much for this. Sorry, I've been working on deploying our first release on production, haven't got time to handle this. So thanks for the help, appreciate it! |
…nt with the documentation.
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/374/ |
Hey Reynold, one question: is there a native Snappy library this depends on? If so, where will it look for it? |
Yes. See http://xerial.org/snappy-java/ Portable across various operating systems; Snappy-java contains native libraries built for Window/Mac/Linux (32/64-bit). At runtime, snappy-java loads one of these libraries according to your machine environment (It looks system properties, os.name and os.arch). |
Yes, the jar published in Maven repository already contains native libraries for Win/Mac/Linux(32/64). While for users of other OS/CPU architecture, they may need to build the native libraries from source code as introduced in http://xerial.org/snappy-java/. |
Ah okay, that's great. |
*/ | ||
trait CompressionCodec { | ||
|
||
def compressionOutputStream(s: OutputStream): OutputStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind, rename these to compressedOutputStream and compressedInputStream -- just seems like a better naming convention for streams.
Hey Reynold, FYI, there are two other places where we should use the codec instead of LZF:
Basically do a search through the codebase for LZF and make sure we're replacing it. |
…sedOutputStream and compressedInputStream.
BTW apart from this the patch looks good to me. |
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/378/ |
changed according to feedback. ptal. |
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/384/ |
@@ -49,6 +50,11 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time) | |||
} | |||
} | |||
|
|||
private[streaming] | |||
object Checkpoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using singleton objects is kind of ugly for testing purposes; could we just create a new codec on each checkpoint? It doesn't seem like that big a deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then we introduce the possibility that the writer could be using one codec, and the reader uses another one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't really get it. The reader is going to be a different instance of the program anyway, after a failure. It will have to be configured with the same codec class either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a header to the output that specifies the codec used? Seems
brittle to require the configuration to be correct for simply reading the
checkpoint. (Not that it's an unreasonable assumption in >90% of cases, but
robustness is nice.)
On Wed, Jul 31, 2013 at 8:52 AM, Matei Zaharia notifications@github.comwrote:
In streaming/src/main/scala/spark/streaming/Checkpoint.scala:
@@ -49,6 +50,11 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time)
}
}+private[streaming]
+object Checkpoint {Hmm, I don't really get it. The reader is going to be a different instance
of the program anyway, after a failure. It will have to be configured with
the same codec class either way.—
Reply to this email directly or view it on GitHubhttps://github.com//pull/754/files#r5507181
.
Ok I removed the singleton and squashed the last two commits. Jenkins, test this please. |
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/400/ |
…addendum Author: witgo <witgo@qq.com> Closes mesos#754 from witgo/commons-lang and squashes the following commits: 3ebab31 [witgo] merge master f3b8fa2 [witgo] merge master 2083fae [witgo] repeat definition 5599cdb [witgo] multiple version of sbt dependency c1b66a1 [witgo] fix different versions of commons-lang dependency
This is based on and subsumes @lyogavin's pull request #685.