Code for the paper "Accuracy First: Selecting a Differential Privacy
Level for Accuracy-Constrained ERM"
by Ligett, Neel, Roth, Waggoner, Wu
https://arxiv.org/abs/1705.10829
---------------------------------------------------------------
-- Disclaimer
This code is used for simulations of the performance of
differentially private algorithms, but should not be used in practice
to protect actual sensitive data! The theorems are proved for true
randomness and real numbers, while the code uses Python's internal
random generators and floating-point numbers.
---------------------------------------------------------------
-- Requirements
1. python3 with the numpy, matplotlib, and scikit-learn libraries.
2. (optional) Linux, bash, and the GNU parallel utility
available from most repositories.
These are not mandatory. They are only needed by a bash script
that runs a whole batch of experiments at once, in parallel.
---------------------------------------------------------------
-- Usage
Navigate into the code/ directory to run the code.
You will need a dataset file in plain text.
Each row of the file is a data point.
It should contain d+1 space-separated numbers
(for some d) where the first d are "x" and the last is "y".
It is assumed that the L1-norm of each x is at most 1,
and each |y| <= 1.
For logistic regression, each y should be plus or minus 1.
See data/ directories for downloading the datasets used in the
paper and processing them into this format.
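
As a quick sanity check of the format described above, a dataset can be
loaded with numpy and tested against the stated constraints. This is only a
sketch: the function name check_dataset, the tolerance, and the file name
mydata.txt are assumptions for illustration and are not part of this
repository.

```python
import numpy as np

def check_dataset(data, logistic=False, tol=1e-9):
    """Return a list of problems found in rows of the form [x_1 .. x_d, y]."""
    data = np.atleast_2d(np.asarray(data, dtype=float))
    x, y = data[:, :-1], data[:, -1]
    problems = []

    # Each x should have L1-norm at most 1.
    l1_norms = np.abs(x).sum(axis=1)
    too_big = int(np.sum(l1_norms > 1 + tol))
    if too_big:
        problems.append("%d rows have L1-norm of x above 1" % too_big)

    # Each label should satisfy |y| <= 1.
    bad_y = int(np.sum(np.abs(y) > 1 + tol))
    if bad_y:
        problems.append("%d rows have |y| above 1" % bad_y)

    # For logistic regression, labels should be exactly -1 or +1.
    if logistic and not np.all(np.isin(y, [-1.0, 1.0])):
        problems.append("logistic regression needs every y to be -1 or +1")

    return problems
```

For a space-separated file, something like
check_dataset(np.loadtxt("mydata.txt"), logistic=True) returns an empty list
when every row satisfies the constraints.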
You can run a single experiment at a time and print the output,
or run a set of experiments and save the outputs into folders.
---------------------------------------------------------------
-- To run a single experiment for a given data set and parameters:
$ python3 run_ridge.py [args]
OR
$ python3 run_logist.py [args]
Run them with no arguments for help on the args.
---------------------------------------------------------------
-- To run a set of experiments:
1. You should have a dataset file and also a file with a list of
the alpha parameters to try, called alphalist.txt.
E.g., you can edit 'gen_alphalist.py' to your liking and then run
$ python3 gen_alphalist.py > alphalist.txt
2. Edit the file 'run_sims.sh' to set all the parameters to your
liking. Also edit the top of the file 'prep_simulations.py'
to set the variable 'run_file_name'.
It should be "run_many_logist.py" if you want logistic regression
or "run_many_ridge.py" if you want ridge regression.
3. Execute the following (full explanation of what it does below):
$ ./run_sims.sh
4. Execute the following to read the results, print some output about
them, and produce some plots.
$ python3 collect_results_ridge.py sims-results/
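
For step 1, a list of alpha values could be generated along the following
lines. This is a hypothetical sketch, not the contents of gen_alphalist.py:
the function name make_alphas, the log-spaced grid, the range of values, and
the one-value-per-line output are all assumptions. Check gen_alphalist.py for
the format the scripts actually expect.

```python
import numpy as np

def make_alphas(low, high, count):
    """Return `count` alpha values spaced evenly on a log scale from low to high."""
    return np.geomspace(low, high, num=count)

if __name__ == "__main__":
    # Illustrative range only; print one value per line (assumed format).
    for alpha in make_alphas(0.01, 0.2, 10):
        print(alpha)
```

Redirecting the printed output to alphalist.txt mirrors the shell command
shown in step 1.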
----------------
About run_sims.sh:
This will create a folder sims-results/ and run a batch of simulations,
writing the results into that folder, along with an 'about.py'
file that records all of the parameters used.
It does the following:
a. Runs python3 prep_simulations.py [args]
which creates the folder sims-results/
and writes about.py into it.
Also writes a list of commands to a temporary file.
b. Invokes GNU parallel to run the commands in parallel.
WARNING: for large datasets, may use up all your RAM and
crash your computer! Use --max-procs to limit the number
of commands to run simultaneously.
Each command that is run has the form shown in step c.
c. python3 run_many_ridge.py [args]
Runs num_trials experiments for the given parameters,
writing the outputs into sims-results/param-i/ for the
given i.
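
If GNU parallel is not available, the idea behind step b (run a list of
commands with a cap on how many run at once) can be approximated in Python.
This is a minimal sketch and not what run_sims.sh does: the function name
run_commands and the default limit are assumptions, and the commands would
need to be read from whatever file step a writes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_commands(commands, max_procs=2):
    """Run shell command strings, at most max_procs at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(cmd):
        # Each worker thread blocks on one child process at a time,
        # so the pool size caps the number of simultaneous processes.
        return subprocess.run(cmd, shell=True).returncode

    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(run_one, commands))
```

Keeping max_procs small plays the same role as --max-procs in the warning
above: it limits how many experiments hold a dataset in RAM at once.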