Dealing with weak networks #50

Open
kauedesousa opened this issue Apr 7, 2021 · 1 comment

kauedesousa commented Apr 7, 2021

Dear Heather,

Here is an issue that may be related to issue #25. We now think we have a better clue about where the problem is: it arises mostly when we are performing cross-validation and pltree() is exposed to a set of data with a weak network.

Here is an example:

library("PlackettLuce")
source("https://raw.githubusercontent.com/AgrDataSci/ClimMob-analysis/master/R/functions.R")

R <- matrix(c(1, 2, 0, 0, 3,
              4, 1, 0, 0, 2,
              2, 1, 0, 0, 3,
              1, 2, 0, 4, 3,
              2, 1, 0, 3, 4,
              4, 1, 0, 0, 2,
              2, 1, 0, 0, 3,
              1, 2, 0, 1, 3,
              2, 0, 0, 0, 1,
              0, 0, 0, 1, 2), nrow = 10, byrow = TRUE)

colnames(R) <- c("apple", "banana", "orange", "pear", "grape")

R <- as.rankings(R)

# drop rows 9 and 10, supposing that they belong to a different fold in a
# cross-validation
R <- R[-c(9:10), ]

G <- group(R, index = 1:length(R))
p <- data.frame(p = rep(1, length(G)))
dt <- cbind(G, p)

pl <- pltree(G ~ p, data = dt)

# this fails, as shown in issue #25
predict(pl, newdata = dt)
AIC(pl, newdata = dt)

# but predict() works with vcov = FALSE
predict(pl, newdata = dt, vcov = FALSE)

# and AIC() still does not work
AIC(pl, newdata = dt, vcov = FALSE)

# this is because orange dropped out of the network when we sampled the folds
a <- adjacency(R)

plot(network(a))

# the issue still persists even if we increase npseudo 
pl2 <- pltree(G ~ p, data = dt, npseudo = 0.8)


The question is: do you think this problem can possibly be solved with npseudo, or should we deal with it by passing vcov = FALSE to the predict() method?

Thanks in advance

hturner commented May 5, 2021

Thanks for digging down to find the cause of this issue.

The addition of pseudo rankings allows the worth to be estimated, but these pseudo rankings are removed before estimating the variance-covariance matrix. If an item is then completely missing from the rankings, this leads to zero rows and columns in the information matrix, which makes it non-invertible, so the variance can't be estimated. I am not sure what the appropriate fix should be here but will follow this up (it may be a few months before I get to it, as I am prioritising work on PLADMM in May/June).
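
To illustrate the point about zero rows and columns, here is a minimal base R sketch. It rests on assumptions: the 3 x 3 matrix `info` is a made-up stand-in for an information matrix, not output from PlackettLuce, and `missing_items()` is a hypothetical helper (not part of the package) for spotting items that never appear in a training fold before fitting.

```r
# Toy stand-in for an information matrix over three items (made-up numbers).
# The third item was never ranked, so its row and column are all zero.
info <- matrix(c( 2, -1, 0,
                 -1,  2, 0,
                  0,  0, 0), nrow = 3, byrow = TRUE)

# solve() cannot invert a singular matrix; catch the error instead of stopping.
inverted <- tryCatch(solve(info), error = function(e) NULL)
is.null(inverted)

# Hypothetical helper: names of items that are unranked (0) in every row
# of a raw ranking matrix.
missing_items <- function(ranks) {
  colnames(ranks)[colSums(ranks != 0) == 0]
}

# Rows 1 to 8 of the example matrix from this issue (the training fold).
train <- matrix(c(1, 2, 0, 0, 3,
                  4, 1, 0, 0, 2,
                  2, 1, 0, 0, 3,
                  1, 2, 0, 4, 3,
                  2, 1, 0, 3, 4,
                  4, 1, 0, 0, 2,
                  2, 1, 0, 0, 3,
                  1, 2, 0, 1, 3), nrow = 8, byrow = TRUE)
colnames(train) <- c("apple", "banana", "orange", "pear", "grape")

missing_items(train)
```

A check like this could be run on each fold before calling pltree(), so that folds with absent items are either resampled or handled with vcov = FALSE.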

AIC.pltree() doesn't need to compute the variance-covariance matrix; it was throwing an error because of a call to itempar(), which defaults to vcov = TRUE. I have replaced this call and made a PR to the master branch; once that's merged in, AIC(pl, newdata = dt) should work if you install the package from GitHub. However, as newdata is actually the original data used in the fit here, it would be better to simply call AIC(pl), which avoids even more unnecessary computation and should work with the current PlackettLuce release (0.4.0). (This also goes for the call to predict: better not to specify newdata unless you are specifying data that is different from the data used in the fit!)

hturner mentioned this issue May 5, 2021
hturner added a commit that referenced this issue May 5, 2021
A partial fix to #50, avoiding the computation of the variance-covariance matrix in AIC when not needed (also avoided unnecessary computation of vcov in predict.PLADMM).