Best Practices

- Avoid GroupByKey
- Don't copy all elements of a large RDD to the driver
- Gracefully Dealing with Bad Input Data