This is our source code for Project 2 in the course FYS-STK4155 Applied Data Analysis and Machine Learning at the University of Oslo.
The project applies classification with logistic regression (LR) and a multilayer perceptron (MLP) to credit card data from the UCI Machine Learning Repository. Additionally, we perform regression analysis on the two-dimensional Franke function and compare the results with a prior analysis in which we used ordinary least squares.
The aim of this project is to get a deeper understanding of the two different methods and to apply them to a real data set. To achieve this, we have made our own implementation of LR and MLP in Python.
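As a rough illustration of the kind of model involved, the following is a minimal sketch of logistic regression trained with batch gradient descent in NumPy. It is not the implementation in src/main.py: the function names, the learning rate, the number of epochs and the toy data are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping real values to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.1, n_epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean cross-entropy.

    Hypothetical helper for illustration; hyperparameter defaults are assumptions.
    """
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(n_epochs):
        probabilities = sigmoid(X @ weights + bias)
        # Gradient of the cross-entropy with respect to the linear output.
        error = probabilities - y
        weights -= learning_rate * (X.T @ error) / n_samples
        bias -= learning_rate * error.mean()
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return class labels (0 or 1) by thresholding the predicted probability."""
    return (sigmoid(X @ weights + bias) >= threshold).astype(int)


# Toy, linearly separable data (not the credit card data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

weights, bias = fit_logistic_regression(X, y)
accuracy = np.mean(predict(X, weights, bias) == y)
print(f"Training accuracy on toy data: {accuracy:.2f}")
```

An MLP replaces the single linear map with several layers and obtains the gradients by backpropagation, but the training loop has the same overall shape.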
Before running any scripts, install the dependencies with Pipenv by running pipenv install.
To generate/read data, train the models, print/plot the results and build the report, run main_script.sh.
- src/main.py: Main script containing all regression methods.
- src/test_main.py: Unit tests for src/main.py.
- src/read_credit_data.py: Reads, preprocesses and exports the credit card data as .npz files.
- src/generate_franke.py: Generates, preprocesses and exports the Franke function data as .npz files.
- src/train_*.py: Train and export the models.
- src/plot_*.py: Plotting scripts for the logistic regression model, the MLP classification model and the MLP regression model.
- src/data: Folder where all the data from src/generate_franke.py and src/read_credit_data.py are saved.
- src/models: Folder where the models for each data set are saved.
- src/cv_results: Folder where the optimized hyperparameters found by cross-validation in src/train_*.py are saved.
- doc/report_2.tex, doc/references.bib: .tex file for the project report and its .bib references.
- doc/report_2.pdf: The built report as a .pdf.
- doc/figures: Folder where all the figures from src/plot_*.py are saved.
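To illustrate the data format, here is a minimal sketch of evaluating the Franke function on random points and round-tripping the result through a .npz file. It is only an example under stated assumptions: the array keys X and z, the file name franke_example.npz, the sample size and the use of a temporary directory are hypothetical, and the preprocessing done in src/generate_franke.py is not reproduced.

```python
import os
import tempfile

import numpy as np


def franke(x, y):
    """Evaluate the two-dimensional Franke function at (x, y)."""
    term1 = 0.75 * np.exp(-((9 * x - 2) ** 2) / 4 - ((9 * y - 2) ** 2) / 4)
    term2 = 0.75 * np.exp(-((9 * x + 1) ** 2) / 49 - (9 * y + 1) / 10)
    term3 = 0.5 * np.exp(-((9 * x - 7) ** 2) / 4 - ((9 * y - 3) ** 2) / 4)
    term4 = -0.2 * np.exp(-((9 * x - 4) ** 2) - (9 * y - 7) ** 2)
    return term1 + term2 + term3 + term4


# Sample points uniformly on the unit square (sample size is an assumption).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = rng.uniform(0, 1, size=100)
X = np.column_stack((x, y))  # inputs, shape (100, 2)
z = franke(x, y)  # targets, shape (100,)

# Write to a temporary directory so the example leaves src/data untouched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "franke_example.npz")
    np.savez(path, X=X, z=z)  # hypothetical keys "X" and "z"
    with np.load(path) as data:
        X_loaded, z_loaded = data["X"], data["z"]

print(X_loaded.shape, z_loaded.shape)
```

A single .npz file can hold several named arrays, so inputs and targets stay together and can be loaded by key.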
We have used Black to format the Python code.