Simulating Credit Card Fraud Detection in Real Time using Machine Learning with Highly Imbalanced Data
This repository covers the following:
- How over-sampling helps when working with highly imbalanced datasets.
- How Random Forest, among the range of machine learning algorithms evaluated, predicted fraud with an F1-score of 81 percent.
- Integrating big data tools such as Kafka, Spark and DBFS (Databricks File System) with Random Forest to simulate a real-time credit card fraud detection system.
- Testing this system under high data volume and velocity, and measuring its performance.
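The first two points can be illustrated with a small, self-contained sketch. It is not the repository's own pipeline. It assumes scikit-learn and NumPy, uses `make_classification` to generate a synthetic dataset with roughly 1% positive ("fraud") labels in place of real transactions, and uses a hypothetical helper `random_oversample` for plain random over-sampling. The repository may instead use SMOTE or Spark MLlib. The sketch shows three things: over-sampling is applied only to the training split, a Random Forest is trained on the balanced data, and the model is scored with the fraud-class F1-score rather than with accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def random_oversample(X, y, minority_label=1, seed=0):
    """Hypothetical helper: duplicate random minority rows until both classes have equal size."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    # Draw minority rows with replacement to close the gap with the majority class.
    extra = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx), replace=True)
    keep = np.concatenate([majority_idx, minority_idx, extra])
    rng.shuffle(keep)
    return X[keep], y[keep]


# Synthetic stand-in for transaction data (assumption): about 1% of rows are fraud (label 1).
X, y = make_classification(
    n_samples=10_000, n_features=20, n_informative=8,
    weights=[0.99, 0.01], random_state=42,
)

# Split first, then over-sample only the training data,
# so the test set keeps the real class imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
X_bal, y_bal = random_oversample(X_train, y_train)

model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_bal, y_bal)

# The fraud-class F1-score combines precision and recall. Accuracy would be close
# to 99% even for a model that never predicts fraud.
f1 = f1_score(y_test, model.predict(X_test), pos_label=1)
print(f"Fraud-class F1: {f1:.2f}")
```

Over-sampling only the training split matters. Over-sampling before the split would place duplicated fraud rows in both the training and test sets, which would inflate the reported F1-score. The score this synthetic example prints is unrelated to the 81 percent the repository reports on real data.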