Protection of data semantics when training neural networks in clouds

Data-Semantics-Protector

This repository contains the code for a project that enables training artificial neural networks in the cloud without exposing the semantics of the training data.

The idea behind this project is as follows:

There is a real dataset on which a neural network (NN) must be trained. Suppose this dataset contains two classes of images, "0" and "1". The task is to obtain a neural network trained to solve a problem on this data, while the owner of the remote computing equipment must not learn which image classes the training uses.
First, train a GAN locally that transforms each hidden class of images into another class that can be disclosed to the owner of the remote computing equipment. In this implementation, the GAN translates "0"->"2" and "1"->"3": "0" and "1" are the classes to hide, and "2" and "3" are the classes that can be revealed.
Next, convert all data in the original dataset with the trained GAN to obtain a "false" dataset of "2" and "3" images.
Next, train the model (in this project, a classifier) in a cloud computing service (for example, Google Colab) on the "false" dataset. This model answers the question "does this image belong to "2" or "3"?"
Finally, build a pipeline of NNs that chains the locally trained GAN ("0"->"2", "1"->"3") with the remotely trained classifier of "2" and "3" to classify the INITIAL classes "0" and "1". The idea is shown in the figure below:

Note: such a pipeline makes sense only when the neural network trained in the cloud is more complex than the one trained on the local machine. That is not the case in this project, because training the generator is much harder than training the image classifier; these networks were chosen for demonstration purposes only, to keep the explanation of the idea simple.
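
The steps above can be sketched in a few lines of Python. This is a minimal sketch, not the repository's actual code: `build_false_dataset`, `classify_hidden`, and the toy stand-ins `toy_translate` and `toy_classify_public` are hypothetical names, and a plain list of floats stands in for an image. In practice the two callables would wrap the locally trained GAN generator and the cloud-trained classifier.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Image = List[float]  # stand-in for an image; a real project would use tensors

# Hidden class -> disclosed class, as described above ("0"->"2", "1"->"3").
HIDDEN_TO_PUBLIC: Dict[int, int] = {0: 2, 1: 3}
PUBLIC_TO_HIDDEN: Dict[int, int] = {pub: hid for hid, pub in HIDDEN_TO_PUBLIC.items()}


def build_false_dataset(
    real_data: Iterable[Tuple[Image, int]],
    translate: Callable[[Image], Image],
) -> List[Tuple[Image, int]]:
    """Run locally: translate every image and relabel it with its public class."""
    return [(translate(img), HIDDEN_TO_PUBLIC[label]) for img, label in real_data]


def classify_hidden(
    image: Image,
    translate: Callable[[Image], Image],
    classify_public: Callable[[Image], int],
) -> int:
    """Chain the local generator with the cloud-trained classifier.

    The classifier only ever answers 2 or 3; the answer is mapped back
    to the hidden classes 0 and 1 on the local machine.
    """
    public_label = classify_public(translate(image))
    return PUBLIC_TO_HIDDEN[public_label]


# Toy stand-ins (assumptions for illustration only, not trained models).
def toy_translate(img: Image) -> Image:
    return [pixel + 2.0 for pixel in img]


def toy_classify_public(img: Image) -> int:
    return 2 if sum(img) / len(img) < 2.5 else 3


if __name__ == "__main__":
    real = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
    print(build_false_dataset(real, toy_translate))
    print(classify_hidden([1.0, 1.0], toy_translate, toy_classify_public))
```

Only the output of `build_false_dataset` would need to leave the local machine; the generator and the mapping back to the hidden labels stay local.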

💻 Getting Started

In Google Colab, run the Python source code contained in the directories in the order given by their numbering. Some steps require creating directories with the appropriate names first (only in a few places; see the related code). The training data is in the Datasets directory.
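
The run order implied by the directory numbering can be made explicit. The following is a small sketch under the assumption that the numeric prefix of each directory name is the step number; the directory names are copied from the repository listing, and the helper `stage_order` is hypothetical.

```python
import re
from typing import Iterable, List

# Directory names as they appear in the repository listing (unordered here).
DIRECTORIES = [
    "Datasets",
    "resources",
    "6_Checking_the_ready_system",
    "3_preparing_dataset_from_false_files",
    "1_Generator_training",
    "5_Assessment_of_classifier_on_real_2_and_3",
    "2_Creating_false_image_set",
    "4_training_a_classifier_on_a_dataset_of_false_data",
]


def stage_order(names: Iterable[str]) -> List[str]:
    """Return only the numbered directories, sorted by their numeric prefix."""
    numbered = []
    for name in names:
        match = re.match(r"(\d+)_", name)
        if match:  # skip unnumbered entries such as "Datasets"
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


if __name__ == "__main__":
    for step, name in enumerate(stage_order(DIRECTORIES), start=1):
        print(f"step {step}: {name}")
```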

📑 Licence

Data-Semantics-Protector is CC BY-NC-SA 3.0 licensed.

Folders and files
1_Generator_training
2_Creating_false_image_set
3_preparing_dataset_from_false_files
4_training_a_classifier_on_a_dataset_of_false_data
5_Assessment_of_classifier_on_real_2_and_3
6_Checking_the_ready_system
Datasets
resources
LICENSE
README.md

SergeyIvanovDevelop/Data-Semantics-Protector
