GitHub - Mashghdoust/Evaluating-ML-Models-Using-Data-Generated-by-GAN: In this project we have preproccesed a dataset and generated synthetic data using GAN. Then we have compared the accuracy of different ML models on the generated data.

Mashghdoust / Evaluating-ML-Models-Using-Data-Generated-by-GAN Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

In this project we have preproccesed a dataset and generated synthetic data using GAN. Then we have compared the accuracy of different ML models on the generated data.

0 stars 0 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
ADM_FinalReport.pdf		ADM_FinalReport.pdf
Evaluating Different ML models.ipynb		Evaluating Different ML models.ipynb
GAN.ipynb		GAN.ipynb
README.txt		README.txt
Syn_data.csv		Syn_data.csv
Table.xls		Table.xls
airline_test_BlanksRemoved.xlsx		airline_test_BlanksRemoved.xlsx
airline_train_BlanksRemoved.xlsx		airline_train_BlanksRemoved.xlsx
test.csv		test.csv
train.csv		train.csv

Repository files navigation

The models were trained on a AMD Ryzen 7 1700 Eight-Core Processor. The codes are written in python using Jupyter Notebook.
==================================================================================
Data
*****
Raw Data downloaded from kaggle (https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction):
train.xlsx
test.xlsx

Some simple preprocessings were made manually in excel:
airline_train_BlanksRemoved.xlsx 
airline_test_BlanksRemoved.xlsx  

Syn_data.xlsx is the data generated by our code (explained in the next part)
==================================================================================
The project contains 2 main codes.
1- GAN: Produce the GAN network and the syntethic data.
2- ML: Testing different data (syntethic, real, combined) on different ML models.
-----------------------------------------------------------------------------------
GAN
*****
a) Run Libraries section

b) Run Importing Data section: change the dirrectory to r'YOUR-ADRESS/FILE-NAME.xlsx'. 
	df is the train data: airline_train_BlanksRemoved.xlsx
	in the last line we have computed the norm of the data, we will use this matrix norm in the "Evaluating Different ML models" part
	you can see the matrix norms (we use it in the next code)

b) Run Functions section: you can change the NN parameters to see the differences they can make.

d) Run Training & Producing fake data for discriminator: you can change the number of fake data produced to train the discriminator by the variable n. and finally you can see the accuracy.
	Note: by runing the whole section once you have a discriminator model. For a valid evaluation on the fake data, produce a new set of fake data by running the second subsection again (ignore the 1st and the 3rd subsections) and run the last subsection to see the accuracy on fake and real data.

e)Run Training GAN section:
	1: a summary of the model that have not been trained yet
	2: training the gan
	3:a summary of the trained model 

f)Run Generating Data section: 
	1:by changing the the value 10001 to n, you generate n synthetic data.
	2: it exports the synthetic data to your computer (where you run the code)
-----------------------------------------------------------------------------------
Evaluating Different ML models 
*****
a) Run Libraries section

b) Run Functions section: define a global variable A manually as the norm of your train data features (obtained in GAN part)

c) Run Importing Data section: change the 3 dirrectories to r'YOUR-ADRESS/FILE-NAME.xlsx'. 
	df1 is the test data: airline_test_BlanksRemoved.xlsx
	df2 is the original train data: airline_train_BlanksRemoved.xlsx
	df3 is the synthetic train data: Syn_data_good.csv

d) Run Training and Evaluating section: You can see the different models used in subsections:
	SVM : RBF,Linear,sigmoid: each model were trained and tested on combined, syntethic in each subsection and the accuracy is reported.

	KNN : k=3,k=5,k=7: each model were trained and tested on combined, syntethic in each subsection and the accuracy is reported. you can simply change the value n_neighbors in order to change k.


	Neural Networks: model1,model2: each model were trained and tested on combined, syntethic in each subsection and the accuracy is reported. You can change the S_NN1 layers for different NNs to see what differences it can make.
===================================================================================
Table.xls is the model accuracies obtained by our evaluation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

Mashghdoust/Evaluating-ML-Models-Using-Data-Generated-by-GAN

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages