more info: http://sharnett.github.com/netflix/
TO RUN C++ VERSION:
make a new directory called "cpp" in the Netflix data folder (needs ~1.5 GB of space)
edit the file "data_folder.txt" to point to the Netflix data folder
make all
./create_binaries
./probe_bs
./funk -i
uses alglib (source already included) http://www.alglib.net/
logs to a sqlite3 database, this is optional
on Linux, needs BLAS
--------------------------------------------------------------------------------
a ton of files, what are they?
* alglib
Code for L-BFGS and non-linear conjugate gradient
* best_avgs
* train_and_dump_avgs
Solves a least-squares problem to find an offset for each movie and user. The
base prediction for movie i and user j is then
p_ij = \mu + \mu_i + \mu_j
where \mu is the overall average rating, \mu_i is the offset for movie i,
and \mu_j is the offset for user j. It cross-validates against the probe
set to choose the regularization that best prevents over-fitting. This is
explained a bit more in the "Winners' paper" link.
best_avgs finds the optimal regularization parameter. train_and_dump_avgs
solves with this parameter and writes the result to disk.
* best_lambda
Finds best regularization parameter for a given number of features.
* create_binaries
Reads in the Netflix Prize data from the text files and converts it to a more
convenient binary format for fast loading.
* find_average
Finds the overall average rating, sum(ratings)/len(ratings)
* fmin
Some freely available code for minimizing a 1D function over an interval
without using derivatives. It's the same algorithm as MATLAB's fminbnd. This
gets used when tuning meta-parameters during cross-validation.
* funk
Main program
* load
Helper functions to read various data from disk.
* optimizers
Code for L-BFGS, conjugate gradient (CG), stochastic gradient descent (SGD),
and gradient descent (GD)
* predictor
Class that predicts ratings. Maintains a user matrix and a movie matrix, whose
product is the predictions matrix.
* probe_bs
Removes ratings in the probe set from the data to get a pure training set.
The probe set is then used for cross-validation.
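
To make the baseline model from best_avgs / train_and_dump_avgs concrete, here is a minimal sketch of fitting p_ij = \mu + \mu_i + \mu_j. It is not the code in this repository: the names Rating, Baseline and fit_baseline are made up for illustration, a single ridge penalty lambda on every offset is an assumption, and the alternating updates stand in for whatever solver the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One observed rating: movie index i, user index j, and the rating value.
struct Rating {
    std::size_t movie;
    std::size_t user;
    double value;
};

// Offsets for the baseline model p_ij = mu + mu_i + mu_j.
struct Baseline {
    double mu = 0.0;            // overall average rating
    std::vector<double> movie;  // mu_i, one offset per movie
    std::vector<double> user;   // mu_j, one offset per user

    double predict(std::size_t i, std::size_t j) const {
        return mu + movie[i] + user[j];
    }
};

// Hypothetical fit: alternate closed-form ridge updates for the movie and
// user offsets. Each update minimizes
//   sum (r_ij - mu - mu_i - mu_j)^2 + lambda * (sum mu_i^2 + sum mu_j^2)
// over one group of offsets while the other group is held fixed.
Baseline fit_baseline(const std::vector<Rating>& ratings,
                      std::size_t n_movies, std::size_t n_users,
                      double lambda, int sweeps = 20) {
    assert(!ratings.empty());
    assert(lambda > 0.0);  // keeps every denominator below positive

    Baseline b;
    b.movie.assign(n_movies, 0.0);
    b.user.assign(n_users, 0.0);

    double total = 0.0;
    for (const Rating& r : ratings) total += r.value;
    b.mu = total / static_cast<double>(ratings.size());

    std::vector<double> sum, count;
    for (int s = 0; s < sweeps; ++s) {
        // Movie offsets, user offsets held fixed.
        sum.assign(n_movies, 0.0);
        count.assign(n_movies, 0.0);
        for (const Rating& r : ratings) {
            sum[r.movie] += r.value - b.mu - b.user[r.user];
            count[r.movie] += 1.0;
        }
        for (std::size_t i = 0; i < n_movies; ++i)
            b.movie[i] = sum[i] / (count[i] + lambda);

        // User offsets, movie offsets held fixed.
        sum.assign(n_users, 0.0);
        count.assign(n_users, 0.0);
        for (const Rating& r : ratings) {
            sum[r.user] += r.value - b.mu - b.movie[r.movie];
            count[r.user] += 1.0;
        }
        for (std::size_t j = 0; j < n_users; ++j)
            b.user[j] = sum[j] / (count[j] + lambda);
    }
    return b;
}
```

A larger lambda shrinks the offsets of movies and users with few ratings toward zero, which is the over-fitting control that the probe set is used to tune.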
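
The fmin entry describes derivative-free 1D minimization over an interval. MATLAB's fminbnd combines golden-section search with parabolic interpolation; the sketch below shows only the golden-section part as a simplified stand-in, so it is not the fmin code shipped here. The name golden_section_min and the tol parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Minimize a unimodal function f on [a, b] using only function values.
// Simplified stand-in: plain golden-section search, no parabolic steps.
double golden_section_min(const std::function<double(double)>& f,
                          double a, double b, double tol = 1e-8) {
    assert(a < b);
    const double inv_phi = (std::sqrt(5.0) - 1.0) / 2.0;  // about 0.618

    // Two interior points, placed so one can be reused after each shrink.
    double c = b - inv_phi * (b - a);
    double d = a + inv_phi * (b - a);
    double fc = f(c);
    double fd = f(d);

    while (b - a > tol) {
        if (fc < fd) {
            // Minimum lies in [a, d]: drop the right end, reuse c as new d.
            b = d;
            d = c;
            fd = fc;
            c = b - inv_phi * (b - a);
            fc = f(c);
        } else {
            // Minimum lies in [c, b]: drop the left end, reuse d as new c.
            a = c;
            c = d;
            fc = fd;
            d = a + inv_phi * (b - a);
            fd = f(d);
        }
    }
    return 0.5 * (a + b);
}
```

In a cross-validation setting, f would be something like "probe-set error as a function of the regularization parameter", evaluated by retraining for each candidate value.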
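
The predictor entry says predictions come from the product of a user matrix and a movie matrix. A rough sketch of that idea follows, assuming k features per user and per movie stored row by row. The class name FactorPredictor and its methods are invented for this example and do not mirror the real predictor class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical low-rank predictor: U is n_users x k, M is n_movies x k,
// and the prediction for (user u, movie m) is entry (u, m) of U * M^T.
class FactorPredictor {
public:
    FactorPredictor(std::size_t n_users, std::size_t n_movies,
                    std::size_t k, double init)
        : k_(k), users_(n_users * k, init), movies_(n_movies * k, init) {}

    // Mutable access to a single feature, e.g. for a training loop.
    double& user_feature(std::size_t u, std::size_t f) {
        return users_[u * k_ + f];
    }
    double& movie_feature(std::size_t m, std::size_t f) {
        return movies_[m * k_ + f];
    }

    // Dot product of user row u and movie row m. The full predictions
    // matrix is never materialized; entries are computed on demand.
    double predict(std::size_t u, std::size_t m) const {
        assert(u * k_ + k_ <= users_.size());
        assert(m * k_ + k_ <= movies_.size());
        double dot = 0.0;
        for (std::size_t f = 0; f < k_; ++f)
            dot += users_[u * k_ + f] * movies_[m * k_ + f];
        return dot;
    }

private:
    std::size_t k_;
    std::vector<double> users_;   // row-major, n_users * k entries
    std::vector<double> movies_;  // row-major, n_movies * k entries
};
```

Computing entries on demand matters at Netflix Prize scale, where the full users-by-movies matrix would be far too large to store densely.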
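
The probe_bs step (pull probe ratings out of the data so the rest is a pure training set) can be sketched like this. It assumes ratings are identified by a (movie, user) pair and held in memory; RatingRow, Split and split_probe are hypothetical names, and the real program works on the binary files instead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

struct RatingRow {
    std::size_t movie;
    std::size_t user;
    double value;
};

using Key = std::pair<std::size_t, std::size_t>;  // (movie, user)

struct Split {
    std::vector<RatingRow> train;  // everything not in the probe set
    std::vector<RatingRow> probe;  // held out for cross-validation
};

// Route each rating to probe or train depending on whether its
// (movie, user) key appears in probe_keys.
Split split_probe(const std::vector<RatingRow>& all,
                  const std::set<Key>& probe_keys) {
    Split out;
    for (const RatingRow& r : all) {
        if (probe_keys.count(Key(r.movie, r.user)) > 0)
            out.probe.push_back(r);
        else
            out.train.push_back(r);
    }
    // Every rating ends up in exactly one of the two sets.
    assert(out.train.size() + out.probe.size() == all.size());
    return out;
}
```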