- Load the data (in this case from the local file system, not Hadoop)
- Extract features (here we just scale all metrics to the [0, 1] range)
- Train the classifier (we train a logistic regression classifier)
- Load more data
- Add better signal processing methods
- Improve the feature extraction process
- Add many other classifiers, such as those listed here: https://spark.apache.org/docs/latest/ml-classification-regression.html
- Present the metrics of a model better (accuracy, ROC …)
- Manage classifiers, i.e. make it possible to load them from files
- Design easy application management with arguments such as the input file location, which signal processing methods to use, which classifier to use, where to save results, what metrics to track…
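
The three steps the application currently performs (load the data, scale every metric to [0, 1], train a logistic regression classifier) can be illustrated with a small sketch. This is not the application's code and it deliberately avoids Spark: it is plain Python using only the standard library, and the function names, the CSV layout (feature columns followed by a 0/1 label) and the toy data are all assumptions made for illustration.

```python
import csv
import io
import math


def load_rows(text):
    """Parse CSV text where each row is: feature columns, then a 0/1 label."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        *xs, y = row
        features.append([float(x) for x in xs])
        labels.append(int(y))
    return features, labels


def min_max_scale(features):
    """Scale every column to the [0, 1] range; constant columns map to 0.0."""
    columns = list(zip(*features))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in features
    ]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias with full-batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xs, y in zip(features, labels):
            err = sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias) - y
            for j, x in enumerate(xs):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, xs):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias) >= 0.5 else 0


# Hypothetical toy data: two metrics on very different scales, then the label.
TOY_CSV = """1.0,200,0
2.0,180,0
1.5,220,0
8.0,900,1
9.0,950,1
7.5,870,1
"""

features, labels = load_rows(TOY_CSV)                  # step 1: load
scaled = min_max_scale(features)                       # step 2: scale to [0, 1]
weights, bias = train_logistic_regression(scaled, labels)  # step 3: train
predictions = [predict(weights, bias, xs) for xs in scaled]
print(predictions)
```

Scaling first matters here because the second metric is roughly a hundred times larger than the first; without it, a single learning rate would suit one column and not the other.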
Probably the easiest way to get started with this application is to clone or check it out from GitHub. It will run even without Apache Spark or Hadoop installed. With IntelliJ IDEA it is really easy to set up; I don't know about Eclipse.