Spark Debugger
From a user's point of view, debugging a general distributed program can be tedious and confusing. Many distributed programs are nondeterministic: their outcome depends on the interleaving of computation and message passing across multiple machines. And because the program runs on a cluster of hundreds or thousands of machines, it is hard to understand the program state and pinpoint the location of problems.
To tame this nondeterminism, a distributed debugger has to log a large amount of information, imposing a serious performance penalty on the application being debugged.
But the Spark programming model lets us provide replay debugging at almost zero overhead. A Spark program is a series of RDDs and deterministic transformations, so when debugging it we don't have to debug everything at once; instead, we can debug each transformation individually. Broadly, the debugger lets us do the following two things (a sketch of the underlying idea follows the list):
- Recompute and inspect intermediate RDDs after the program has finished.
- Re-run a particular task in a single-threaded debugger to find exactly what went wrong.
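To make this concrete, here is a minimal sketch in plain Spark (Scala), not the debugger's own interface: because each RDD is defined by deterministic transformations over its parents, any intermediate RDD can be recomputed and inspected after the fact, and a single task can be replayed on just its partition. The object name `LineageReplaySketch`, the sample data, and the chosen partition index are illustrative assumptions.

```scala
// Minimal sketch (plain Spark, not the debugger's interface): deterministic
// lineage lets us recompute intermediate RDDs and replay individual tasks.
import org.apache.spark.{SparkConf, SparkContext}

object LineageReplaySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lineage-replay-sketch").setMaster("local[*]"))

    // A Spark program: a series of deterministic transformations.
    val lines  = sc.parallelize(Seq("3", "17", "oops", "42"), numSlices = 4)
    val parsed = lines.map(s => scala.util.Try(s.toInt).toOption) // intermediate RDD
    val valid  = parsed.flatMap(o => o.toSeq)
    println(s"sum = ${valid.sum()}") // action: runs the whole pipeline

    // (1) Recompute and inspect an intermediate RDD after the run: the
    // lineage is deterministic, so this replays exactly the same values.
    parsed.collect().foreach(println)

    // (2) Re-run one task in isolation: a task is a deterministic function
    // of a single partition, so we can replay just that partition (here
    // partition 2, a hypothetical "suspect" task) under a local debugger.
    val suspectPartition = 2
    val replayed = sc.runJob(
      parsed,
      (it: Iterator[Option[Int]]) => it.toList,
      Seq(suspectPartition))
    println(s"task output = ${replayed.head}")

    sc.stop()
  }
}
```

The actual debugger drives such replays for you; the sketch only illustrates why determinism makes them nearly free, since no extra state has to be logged to reproduce an RDD or a task.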
At least in theory, debugging a Spark program is now as easy as debugging a single-threaded one.