
Test for Collaborative Filtering

Mateus de Assis Silva edited this page May 31, 2020 · 3 revisions

As an example, suppose the following data:

  • There are 6 products (R1, R2, R3, R4, R5 and R6);
  • There are 3 "training set" users (Bob, Chris and Diana);
  • The query instance is Alice's ratings; the goal is to predict her rating on R5.

User/Product  R1  R2  R3  R4  R5  R6
Alice          2   -   4   4   -   5
Bob            1   5   4   -   3   4
Chris          5   2   -   2   1   -
Diana          3   -   2   2   -   4

(A dash means the user has not rated that product.)
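To make the table concrete, here is one way to hold it in memory. This is a sketch, and the NaN-for-missing layout is our own choice rather than something this page prescribes. Per-user mean ratings, which mean-centred collaborative filtering relies on, then come straight from `np.nanmean`:

```python
import numpy as np

nan = np.nan
# Rows: Alice, Bob, Chris, Diana; columns: R1..R6.
# np.nan marks a product the user has not rated ("-" in the table).
ratings = np.array([
    [2,   nan, 4,   4,   nan, 5],    # Alice (query user)
    [1,   5,   4,   nan, 3,   4],    # Bob
    [5,   2,   nan, 2,   1,   nan],  # Chris
    [3,   nan, 2,   2,   nan, 4],    # Diana
])

# Each user's mean over the products they actually rated (NaNs ignored)
user_means = np.nanmean(ratings, axis=1)
# Alice 3.75, Bob 3.4, Chris 2.5, Diana 2.75
print(user_means)
```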

But the data does not come in this table format. Instead, imagine a log file in which each row is: movieID, customerID, rating (product Rk has movieID k). For simplicity, suppose Alice's ID is 1, Bob's is 2, Chris' is 3 and Diana's is 4.

Our training data, then, comes in the following format. Notice that, since Alice is the query instance, she does not appear in the training data.

movieID customerID rating
1 2 1
2 2 5
3 2 4
5 2 3
6 2 4
1 3 5
2 3 2
4 3 2
5 3 1
1 4 3
3 4 2
4 4 2
6 4 4
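Going from the log back to the table is a simple pivot. The following is a minimal sketch in plain Python that groups the training rows above by customer. The name `by_customer` is ours, chosen for illustration:

```python
from collections import defaultdict

# Training log rows: (movieID, customerID, rating)
log = [(1, 2, 1), (2, 2, 5), (3, 2, 4), (5, 2, 3), (6, 2, 4),
       (1, 3, 5), (2, 3, 2), (4, 3, 2), (5, 3, 1),
       (1, 4, 3), (3, 4, 2), (4, 4, 2), (6, 4, 4)]

# Pivot into {customerID: {movieID: rating}}, recovering the table's rows
by_customer = defaultdict(dict)
for movie_id, customer_id, rating in log:
    by_customer[customer_id][movie_id] = rating

print(dict(by_customer[2]))  # {1: 1, 2: 5, 3: 4, 5: 3, 6: 4}  (Bob's row)
```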

The query instance (Alice's logs) is:

movieID customerID rating
1 1 2
3 1 4
4 1 4
6 1 5

What we want to know (to predict) is Alice's rating on movie 5. We can easily estimate this value using mtxslv_collab_filter(). Let's define the training and testing data:

import numpy as np

training_set_example = np.array([[1,2,1],[2,2,5],[3,2,4],[5,2,3],
                         [6,2,4],[1,3,5],[2,3,2],[4,3,2],
                         [5,3,1],[1,4,3],[3,4,2],[4,4,2],
                         [6,4,4]])

testing_instance = np.array([[1,1,2],[3,1,4],[4,1,4],[6,1,5]])
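Before calling the function, a few assertions can confirm that the arrays match the setup described above. These checks are our own addition, not part of `mtxslv_collab_filter()`. They assume column 0 is movieID and column 1 is customerID, as in the log format:

```python
import numpy as np

training_set_example = np.array([[1,2,1],[2,2,5],[3,2,4],[5,2,3],
                                 [6,2,4],[1,3,5],[2,3,2],[4,3,2],
                                 [5,3,1],[1,4,3],[3,4,2],[4,4,2],
                                 [6,4,4]])
testing_instance = np.array([[1,1,2],[3,1,4],[4,1,4],[6,1,5]])

query_user = 1    # Alice's customerID
target_movie = 5  # movie whose rating we want to predict

# The query user must not appear in the training data
assert query_user not in training_set_example[:, 1]
# Every query row must belong to the query user
assert np.all(testing_instance[:, 1] == query_user)
# The target movie must not already be rated by the query user
assert target_movie not in testing_instance[:, 0]

# Training users who rated the target movie (they drive the prediction)
raters = np.unique(training_set_example[training_set_example[:, 0] == target_movie, 1])
print(raters)  # [2 3]  (Bob and Chris)
```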

Now apply the function!

estimativa = mtxslv_collab_filter(testing_instance, 5, training_set_example)
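For comparison with a manual calculation, here is one way to work out the estimate by hand. The page does not name the method, so the formulation is an assumption: a mean-centred, Pearson-correlation-weighted user-based filter. Each user's mean is taken over all of their ratings, and correlations are taken over the products both users rated. Diana did not rate R5, so she drops out of the final sum:

```latex
\begin{aligned}
\bar v_A &= 3.75,\quad \bar v_B = 3.4,\quad \bar v_C = 2.5,\quad \bar v_D = 2.75 \\[4pt]
w(a,i) &= \frac{\sum_{j} (v_{a,j}-\bar v_a)(v_{i,j}-\bar v_i)}
               {\sqrt{\sum_{j} (v_{a,j}-\bar v_a)^2 \,\sum_{j} (v_{i,j}-\bar v_i)^2}}
          \quad \text{(sums over products rated by both } a \text{ and } i\text{)} \\[4pt]
w(A,B) &= \frac{5.1}{\sqrt{4.6875 \cdot 6.48}} \approx 0.92536, \qquad
w(A,C) = \frac{-4.5}{\sqrt{3.125 \cdot 6.5}} \approx -0.99846 \\[4pt]
p_{A,5} &= \bar v_A + \frac{\sum_{i} w(A,i)\,(v_{i,5}-\bar v_i)}{\sum_{i} \lvert w(A,i)\rvert}
          \quad \text{(over users who rated R5: Bob and Chris)} \\
        &= 3.75 + \frac{0.92536\,(3-3.4) + (-0.99846)\,(1-2.5)}{0.92536 + 0.99846}
          \approx 3.75 + \frac{1.12755}{1.92382} \approx 4.3361
\end{aligned}
```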

The manually calculated value is 4.336075363. Test it yourself: estimativa comes out as 4.336096188777122, which agrees with the manual result to four decimal places.
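The hand calculation can also be scripted. The sketch below is a hypothetical re-implementation of a mean-centred, Pearson-weighted user-based filter. This is an assumption: the page does not state which formulation mtxslv_collab_filter() uses, and the name `predict_rating` is ours, not the project's. It takes the same array layout and argument order as the call above and returns roughly 4.3361 on this example:

```python
import numpy as np


def predict_rating(query, item, train):
    """Hypothetical sketch (not the project's code) of a mean-centred,
    Pearson-weighted user-based collaborative filter.

    query: rows of (movieID, customerID, rating) for the active user
    item:  movieID whose rating we want to predict
    train: rows of (movieID, customerID, rating) for the other users
    """
    # Active user's ratings as {movieID: rating} and their mean rating
    active = {int(m): float(r) for m, _, r in query}
    active_mean = np.mean(list(active.values()))

    # Group the training log by customer: {customerID: {movieID: rating}}
    users = {}
    for m, c, r in train:
        users.setdefault(int(c), {})[int(m)] = float(r)

    numerator = 0.0
    norm = 0.0
    for ratings in users.values():
        if item not in ratings:
            continue  # only users who rated the target item contribute
        mean = np.mean(list(ratings.values()))  # mean over all their ratings
        common = [m for m in active if m in ratings]
        a = np.array([active[m] - active_mean for m in common])
        b = np.array([ratings[m] - mean for m in common])
        denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        if denom == 0:
            continue  # correlation undefined (no overlap or no variance)
        w = np.sum(a * b) / denom  # Pearson-style weight
        numerator += w * (ratings[item] - mean)
        norm += abs(w)

    if norm == 0:
        return active_mean  # no usable neighbours: fall back to the mean
    return active_mean + numerator / norm


training_set_example = np.array([[1,2,1],[2,2,5],[3,2,4],[5,2,3],
                                 [6,2,4],[1,3,5],[2,3,2],[4,3,2],
                                 [5,3,1],[1,4,3],[3,4,2],[4,4,2],
                                 [6,4,4]])
testing_instance = np.array([[1,1,2],[3,1,4],[4,1,4],[6,1,5]])

print(predict_rating(testing_instance, 5, training_set_example))  # ≈ 4.3361
```

Skipping users whose correlation is undefined and falling back to the query user's mean are design choices of this sketch. The real function may handle those edge cases differently.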