Test for Collaborative Filtering
Mateus de Assis Silva edited this page May 31, 2020
·
3 revisions
As an example, suppose the following data:
- There are 6 products (R1,R2,R3,R4,R5 and R6);
- There are 3 "training set" users (Bob, Chris and Diana);
- The query instance is Alice, whose rating for R5 we want to predict.
User/Product | R1 | R2 | R3 | R4 | R5 | R6 |
---|---|---|---|---|---|---|
Alice | 2 | - | 4 | 4 | - | 5 |
Bob | 1 | 5 | 4 | - | 3 | 4 |
Chris | 5 | 2 | - | 2 | 1 | - |
Diana | 3 | - | 2 | 2 | - | 4 |
But the data does not come in this table format. Instead, imagine a log file where each row is: movieID, customerID, rating. Products R1 to R6 correspond to movieIDs 1 to 6. For simplicity, suppose Alice's ID is 1, Bob's is 2, Chris's is 3 and Diana's is 4.
Our training data, then, comes in the following format. Notice that since Alice is the query instance, she does not appear in the training data.
movieID | customerID | rating |
---|---|---|
1 | 2 | 1 |
2 | 2 | 5 |
3 | 2 | 4 |
5 | 2 | 3 |
6 | 2 | 4 |
1 | 3 | 5 |
2 | 3 | 2 |
4 | 3 | 2 |
5 | 3 | 1 |
1 | 4 | 3 |
3 | 4 | 2 |
4 | 4 | 2 |
6 | 4 | 4 |
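To check that the log rows above really encode the rating table, they can be pivoted back into a user-by-product matrix. The following is a minimal sketch, not part of the original page: the helper name `logs_to_matrix` and the choice of `NaN` for missing ratings are assumptions made for illustration.

```python
import numpy as np

def logs_to_matrix(logs, n_items):
    """Pivot [movieID, customerID, rating] rows into a user-by-item table.

    Hypothetical helper: missing ratings are stored as NaN.
    """
    logs = np.asarray(logs)
    user_ids = np.unique(logs[:, 1])  # sorted customer IDs
    row_of = {int(uid): r for r, uid in enumerate(user_ids)}
    table = np.full((len(user_ids), n_items), np.nan)
    for movie_id, customer_id, rating in logs:
        # movieIDs start at 1, so column 0 holds movie 1 (R1)
        table[row_of[int(customer_id)], int(movie_id) - 1] = rating
    return user_ids, table

training_logs = [[1, 2, 1], [2, 2, 5], [3, 2, 4], [5, 2, 3], [6, 2, 4],
                 [1, 3, 5], [2, 3, 2], [4, 3, 2], [5, 3, 1],
                 [1, 4, 3], [3, 4, 2], [4, 4, 2], [6, 4, 4]]

user_ids, table = logs_to_matrix(training_logs, n_items=6)
print(user_ids)
print(table)
```

Each row of `table` corresponds to one customer ID in `user_ids` (Bob, Chris, Diana), and each `NaN` marks a `-` entry in the original table.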
The query instance (Alice's logs) is:
movieID | customerID | rating |
---|---|---|
1 | 1 | 2 |
3 | 1 | 4 |
4 | 1 | 4 |
6 | 1 | 5 |
What we want to predict is Alice's rating for movie 5. We can estimate this value using `mtxslv_collab_filter()`. Let's define the training and testing data:
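import numpy as np

# Training logs: one row per rating, as [movieID, customerID, rating]
training_set_example = np.array([[1,2,1],[2,2,5],[3,2,4],[5,2,3],
                                 [6,2,4],[1,3,5],[2,3,2],[4,3,2],
                                 [5,3,1],[1,4,3],[3,4,2],[4,4,2],
                                 [6,4,4]])
# Query instance: Alice's logs (customerID 1)
testing_instance = np.array([[1,1,2],[3,1,4],[4,1,4],[6,1,5]])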
Now apply the function!
estimativa = mtxslv_collab_filter(testing_instance,5,training_set_example)
The manually calculated value is 4.336075363. Test it yourself: `estimativa` is 4.336096188777122. The two values agree to four decimal places.
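For readers who want to retrace a calculation of this kind by hand, here is a minimal sketch of one common memory-based formulation. It is not the source of `mtxslv_collab_filter()`; it assumes Pearson-style weights computed over co-rated movies, with each user's ratings centered on the mean of all the ratings that user gave. The function name `predict_rating` is hypothetical.

```python
import numpy as np

def predict_rating(query_logs, target_movie, training_logs):
    """Sketch of a mean-centered, correlation-weighted prediction.

    Assumption: deviations are taken from each user's mean over ALL of
    their ratings, and weights are normalized by their absolute values.
    """
    query = {int(m): float(r) for m, _, r in np.asarray(query_logs)}
    query_mean = np.mean(list(query.values()))
    training_logs = np.asarray(training_logs)

    weighted_sum, weight_total = 0.0, 0.0
    for user_id in np.unique(training_logs[:, 1]):
        rows = training_logs[training_logs[:, 1] == user_id]
        ratings = {int(m): float(r) for m, _, r in rows}
        if target_movie not in ratings:
            continue  # this user cannot vote on the target movie
        user_mean = np.mean(list(ratings.values()))

        # Correlation over the movies both users have rated
        shared = [m for m in query if m in ratings]
        a = np.array([query[m] - query_mean for m in shared])
        b = np.array([ratings[m] - user_mean for m in shared])
        denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        if denom == 0:
            continue  # no overlap or no variation: weight undefined
        weight = np.sum(a * b) / denom

        weighted_sum += weight * (ratings[target_movie] - user_mean)
        weight_total += abs(weight)

    if weight_total == 0:
        return float(query_mean)  # fall back to the query user's mean
    return float(query_mean + weighted_sum / weight_total)

training_logs = [[1, 2, 1], [2, 2, 5], [3, 2, 4], [5, 2, 3], [6, 2, 4],
                 [1, 3, 5], [2, 3, 2], [4, 3, 2], [5, 3, 1],
                 [1, 4, 3], [3, 4, 2], [4, 4, 2], [6, 4, 4]]
alice_logs = [[1, 1, 2], [3, 1, 4], [4, 1, 4], [6, 1, 5]]

prediction = predict_rating(alice_logs, 5, training_logs)
print(prediction)
```

With the example data, only Bob and Chris contribute, because Diana has not rated movie 5. Bob's weight is about 0.925 and Chris's about -0.998, so Alice's mean of 3.75 is shifted upward by roughly 0.586, giving a prediction close to 4.336.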