-
Notifications
You must be signed in to change notification settings - Fork 5
/
README.Rmd
176 lines (143 loc) · 7.17 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
---
output: github_document
bibliography: ["readme.bib"]
---
```{r rmd-setup, include = FALSE}
knitr::opts_chunk$set(fig.path = "man/figures/")
```
# PlackettLuce
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/PlackettLuce)](https://cran.r-project.org/package=PlackettLuce)
[![R-CMD-check](https://github.com/hturner/PlackettLuce/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/hturner/PlackettLuce/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/hturner/PlackettLuce/branch/main/graph/badge.svg)](https://app.codecov.io/gh/hturner/PlackettLuce?branch=main)
Package website: https://hturner.github.io/PlackettLuce/.
## Overview
The **PlackettLuce** package implements a generalization of the model jointly
attributed to @Plackett1975 and @Luce1959 for modelling rankings data.
Examples of rankings data might be the finishing order of competitors in a race,
or the preference of consumers over a set of competing products.
The output of the model is an estimated **worth** for each item that appears in
the rankings. The parameters are generally presented on the log scale for
inference.
The implementation of the Plackett-Luce model in **PlackettLuce**:
- Accommodates ties (of any order) in the rankings, e.g.
bananas $\succ$ {apples, oranges} $\succ$ pears.
- Accommodates sub-rankings, e.g. pears $\succ$ apples, when the full set of items is {apples, bananas, oranges, pears}.
- Handles disconnected or weakly connected networks implied by the rankings, e.g. where one item always loses as in figure below. This is achieved by adding pseudo-rankings with a
hypothetical or ghost item.
```{r always-loses, message = FALSE, echo = FALSE, fig.width = 3.5, fig.height = 3.5}
library(PlackettLuce)
library(igraph)
R <- matrix(c(1, 2, 0, 0,
2, 0, 1, 0,
1, 0, 0, 2,
2, 1, 0, 0,
0, 1, 2, 0), byrow = TRUE, ncol = 4,
dimnames = list(NULL, LETTERS[1:4]))
R <- as.rankings(R)
A <- adjacency(R)
net <- graph_from_adjacency_matrix(A)
plot(net, edge.arrow.size = 0.5, vertex.size = 30)
```
</br>
In addition the package provides methods for
- Obtaining quasi-standard errors, that don't depend on the constraints applied
to the worth parameters for identifiability.
- Fitting Plackett-Luce trees, i.e. a tree that partitions the rankings by
covariate values, such as consumer attributes or racing conditions, identifying
subgroups with different sets of worth parameters for the items.
## Installation
The package may be installed from CRAN via
```{r, eval = FALSE}
install.packages("PlackettLuce")
```
The development version can be installed via
```{r, eval = FALSE}
# install.packages("devtools")
devtools::install_github("hturner/PlackettLuce")
```
## Usage
The [Netflix Prize](https://en.wikipedia.org/wiki/Netflix_Prize) was a competition devised by
Netflix to improve the accuracy of its recommendation system. To facilitate
this they released ratings about movies from the users of the system that have
been transformed to preference data and are available from
[PrefLib](https://www.preflib.org/dataset/00004), [@Bennett2007]. Each data set
comprises rankings of a set of 3 or 4 movies selected at random. Here we
consider rankings for just one set of movies to illustrate the functionality of
**PlackettLuce**.
The data can be read in using the `read.soc` function in **PlackettLuce**
```{r}
library(PlackettLuce)
preflib <- "https://www.preflib.org/static/data/"
netflix <- read.soc(file.path(preflib, "netflix/00004-00000138.soc"))
head(netflix, 2)
```
Each row corresponds to a unique ordering of the four movies in this data set.
The number of Netflix users that assigned that ordering is given in the first
column, followed by the four movies in preference order. So for example, 68
users ranked movie 2 first, followed by movie 1, then movie 4 and finally movie
3.
`PlackettLuce`, the model-fitting function in **PlackettLuce** requires that
the data are provided in the form of *rankings* rather than *orderings*, i.e.
the rankings are expressed by giving the rank for each item, rather than
ordering the items. We can create a `"rankings"` object from a set of orderings
as follows
```{r}
R <- as.rankings(netflix[,-1], input = "orderings",
items = attr(netflix, "items"))
R[1:3, as.rankings = FALSE]
```
Note that `read.soc` saved the names of the movies in the `"items"` attribute of
`netflix`, so we have used these to label the items. Subsetting the
rankings object `R` with `as.rankings = FALSE`, returns the underlying matrix of
rankings corresponding to the subset. So for example, in the first ranking the
second movie (Beverly Hills Cop) is ranked number 1, followed by the first movie
(Mean Girls) with rank 2, followed by the fourth movie (Mission: Impossible II)
and finally the third movie (The Mummy Returns), giving the same ordering as in
the original data.
Various methods are provided for `"rankings"` objects, in particular if we
subset the rankings without `as.rankings = FALSE`, the result is again a
`"rankings"` object and the corresponding print method is used:
```{r}
R[1:3]
print(R[1:3], width = 60)
```
The rankings can now be passed to `PlackettLuce` to fit the Plackett-Luce model.
The counts of each ranking provided in the downloaded data are used as weights
when fitting the model.
```{r}
mod <- PlackettLuce(R, weights = netflix$Freq)
coef(mod, log = FALSE)
```
Calling `coef` with `log = FALSE` gives the worth parameters, constrained to
sum to one. These parameters represent the probability that each movie is ranked
first.
For inference these parameters are converted to the log scale, by default
setting the first parameter to zero so that the standard errors are estimable:
```{r}
summary(mod)
```
In this way, Mean Girls is treated as the reference movie, the positive
parameter for Beverly Hills Cop shows this was more popular among the users,
while the negative parameters for the other two movies show these were less
popular.
Comparisons between different pairs of movies can be made visually by plotting
the log-worth parameters with comparison intervals based on quasi standard
errors.
```{r qv, fig.width = 9}
qv <- qvcalc(mod)
plot(qv, ylab = "Worth (log)", main = NULL)
```
If the intervals overlap there is no significant difference. So we can see that
Beverly Hills Cop is significantly more popular than the other three movies,
Mean Girls is significant more popular than The Mummy Returns or
Mission: Impossible II, but there was no significant difference in users'
preference for these last two movies.
## Going Further
The core functionality of **PlackettLuce** is illustrated in the package
vignette, along with details of the model used in the package and a comparison
to other packages. The vignette can be found on the [package website](https://hturner.github.io/PlackettLuce/) or from within R once the
package has been installed, e.g. via
vignette("Overview", package = "PlackettLuce")
## Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](https://github.com/hturner/PlackettLuce/blob/master/CONDUCT.md). By participating in this project you agree to abide by its terms.
## References