Dealing with weak networks #50

kauedesousa · 2021-04-07T12:09:56Z

Dear Heather,

This issue may be related to issue #25, but I think we now have a better clue about where the problem lies. It arises mostly when we are performing cross-validation and pltree() is exposed to a data set with a weak network.

Here is an example:

library("PlackettLuce")
source("https://raw.githubusercontent.com/AgrDataSci/ClimMob-analysis/master/R/functions.R")

R <- matrix(c(1, 2, 0, 0, 3,
              4, 1, 0, 0, 2,
              2, 1, 0, 0, 3,
              1, 2, 0, 4, 3,
              2, 1, 0, 3, 4,
              4, 1, 0, 0, 2,
              2, 1, 0, 0, 3,
              1, 2, 0, 1, 3,
              2, 0, 0, 0, 1,
              0, 0, 0, 1, 2), nrow = 10, byrow = TRUE)

colnames(R) <- c("apple", "banana", "orange", "pear", "grape")

R <- as.rankings(R)

# drop rows 9 and 10, supposing that they belong to a different fold in a
# cross-validation
R <- R[-c(9:10), ]

G <- group(R, index = 1:length(R))
p <- data.frame(p = rep(1, length(G)))
dt <- cbind(G, p)

pl <- pltree(G ~ p, data = dt)

# these calls fail, as shown in issue #25
predict(pl, newdata = dt)
AIC(pl, newdata = dt)

# but predict() works with vcov = FALSE
predict(pl, newdata = dt, vcov = FALSE)

# while AIC() still fails, even with vcov = FALSE
AIC(pl, newdata = dt, vcov = FALSE)

# this is because orange dropped out of the network when we sampled the folds
a <- adjacency(R)

plot(network(a))

# the issue persists even if we increase npseudo
pl2 <- pltree(G ~ p, data = dt, npseudo = 0.8)

The question is: do you think this problem can eventually be solved with npseudo, or should we deal with it by passing vcov = FALSE to the predict() method?
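For reference, a minimal base-R sketch of how one might flag items that are never ranked in a fold before fitting. It works on the raw matrix from the example above (before as.rankings() is applied); the helper name absent_items and the object names R_raw and fold are hypothetical and not part of PlackettLuce.

```r
# Raw rankings matrix from the example above (0 = item not ranked).
R_raw <- matrix(c(1, 2, 0, 0, 3,
                  4, 1, 0, 0, 2,
                  2, 1, 0, 0, 3,
                  1, 2, 0, 4, 3,
                  2, 1, 0, 3, 4,
                  4, 1, 0, 0, 2,
                  2, 1, 0, 0, 3,
                  1, 2, 0, 1, 3,
                  2, 0, 0, 0, 1,
                  0, 0, 0, 1, 2), nrow = 10, byrow = TRUE)
colnames(R_raw) <- c("apple", "banana", "orange", "pear", "grape")

# Hypothetical helper: names of the items that no row of `x` ranks.
absent_items <- function(x) {
  colnames(x)[colSums(x != 0) == 0]
}

# Same fold as above: rows 9 and 10 are held out.
fold <- R_raw[-c(9:10), , drop = FALSE]

absent_items(fold)
# [1] "orange"
```

A check like this only catches items that are missing entirely; it says nothing about items that are present but weakly connected to the rest of the network.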

Thanks in advance

hturner · 2021-05-05T15:12:56Z

Thanks for digging down to find the cause of this issue.

The addition of pseudo rankings allows the worth to be estimated, but these pseudo rankings are removed before estimating the variance-covariance matrix. If an item is then completely missing from the rankings, this leads to zero rows and columns in the information matrix, which makes it non-invertible, so the variance can't be estimated. I am not sure what the appropriate fix should be here but will follow this up (it may be a few months before I get to it, as I am prioritising work on PLADMM in May/June).
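To illustrate the linear-algebra point only, here is a toy sketch in base R. The 3 x 3 matrix is made up and is not the information matrix that PlackettLuce computes; it just shows that a zero row and column make solve() fail. Dropping the absent item at the end is shown purely as a demonstration, not as the fix for the package.

```r
# Toy "information" matrix for items A, B and C (values are made up).
# Item C never appears, so its row and column are all zero.
items <- c("A", "B", "C")
info <- matrix(c( 2, -1, 0,
                 -1,  2, 0,
                  0,  0, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(items, items))

# solve() throws an error because the matrix is exactly singular;
# tryCatch() turns that error into NULL so the script keeps running.
inv <- tryCatch(solve(info), error = function(e) NULL)
is.null(inv)
# [1] TRUE

# Keeping only the items with a non-zero row gives an invertible block.
keep <- rowSums(abs(info)) > 0
vc <- solve(info[keep, keep])
dim(vc)
# [1] 2 2
```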

AIC.pltree() doesn't need to compute the variance-covariance matrix, that was throwing an error due to a call to itempar() which defaults to vcov = TRUE. I have replaced this call and made a PR to the master branch; once that's merged in AIC(pl, newdata = dt) should work if you install the package from GitHub. However as newdata is actually the original data used in the fit here, it would be better to simply call AIC(pl) which avoids even more unnecessary computation and should work with the current PlackettLuce release (0.4.0). (This also goes for the call to predict - better not to specify newdata unless you are specifying data that is different from the data used in the fit!)

A partial fix to #50, avoiding the computation of the variance-covariance matrix in AIC when not needed (also avoided unnecessary computation of vcov in predict.PLADMM).

hturner mentioned this issue May 5, 2021

24 weak networks #51

Merged

hturner added a commit that referenced this issue May 5, 2021

Merge pull request #51 from hturner/24-weak-networks

da6bca1

