
Handling of NAs and get_weights_nonmetric() for non-reflective blocks #2

Open
guilhemchalancon opened this issue Mar 17, 2014 · 1 comment

@guilhemchalancon

Dear Gaston,

I ran into a bug when using NAs with non-reflective blocks (in commit 30aaec0 as well as in the 0.4.1 stable version).

For instance, after introducing NAs into data(russa) as you show at https://github.com/gastonstat/plspm, plspm() works fine with modes A and NewA but fails with B, PLSCORE, and PLSCOW.

To verify this, I loaded the toy dataset and introduced 3 NAs:

```r
library(plspm)

data(russa)
russNA = russa
russNA[1,1] = NA
russNA[4,4] = NA
russNA[6,6] = NA

rus_path = rbind(c(0, 0, 0), c(0, 0, 0), c(1, 1, 0))
rownames(rus_path) = c("AGRI", "IND", "POLINS")
colnames(rus_path) = c("AGRI", "IND", "POLINS")
rus_blocks = list(1:3, 4:5, 6:9)
rus_scaling = list(c("NUM", "NUM", "NUM"),
                   c("NUM", "NUM"),
                   c("NUM", "NUM", "NUM", "NUM"))
```

Then I ran plspm() with non-reflective modes:

```r
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCOW",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
# OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCORE",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
# OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("B",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
```

In all three cases this fails with a non-conformable error in get_weights_nonmetric().

After digging into the issue, I found that the cause lies in the format of the output of get_PLSR_NA().

Specifically, if we look at what happens for non-reflective modes when NAs are detected, we can see that w[[q]] is obtained from get_PLSR_NA():

```r
if (specs$modes[q] == "PLSCORE") {
  if (missing_data[q]) {
    w[[q]] = get_PLSR_NA(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
    # compute Y[i,q] as the regr. coeff. of QQ[[q]][i,] on w[[q]]
    # considering only the columns where QQ[[q]][i,l] exist
    Y[,q] = colSums(t(QQ[[q]])*w[[q]], na.rm=TRUE)
    Y[,q] = Y[,q]/colSums((t(X_avail[[q]])*w[[q]])^2)
    # normalize Y[,q] to unitary variance
    Y[,q] = scale(Y[,q]) * correction
  }
  else {# complete data in block q
    w[[q]] = get_PLSR(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
    Y[,q] = QQ[[q]] %*% w[[q]]
    Y[,q] = scale(Y[,q]) * correction
  }
}
```
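The comments in the missing-data branch describe Y[i,q] as the regression coefficient of the observed entries of row i on the matching entries of w[[q]]. The base-R sketch below illustrates that arithmetic on a tiny made-up block. It assumes X_avail is a 0/1 indicator of observed cells (the issue does not show how X_avail is built), and it is an illustration, not the plspm source.

```r
# Toy block (made-up values): 3 observations x 2 manifest variables,
# with one missing cell in row 3
QQ_q <- matrix(c(1, 2, NA,
                 2, 0, 4), nrow = 3)
w_q <- c(0.5, 1)  # outer weights as a plain numeric vector

# Assumption: X_avail marks observed cells with 1 and missing cells with 0
X_avail_q <- ifelse(is.na(QQ_q), 0, 1)

# Numerator: sum over observed l of QQ[i, l] * w[l]
num <- colSums(t(QQ_q) * w_q, na.rm = TRUE)
# Denominator: sum over observed l of w[l]^2
den <- colSums((t(X_avail_q) * w_q)^2)
y <- num / den

# Same quantity row by row: least-squares slope (no intercept) of the
# observed entries of row i on the matching weights
y_check <- sapply(seq_len(nrow(QQ_q)), function(i) {
  ok <- !is.na(QQ_q[i, ])
  sum(QQ_q[i, ok] * w_q[ok]) / sum(w_q[ok]^2)
})
print(y)  # → 2.0 0.8 4.0
```

Row 3 only uses its second column, so its score is simply 4 / 1; this is what "considering only the columns where QQ[[q]][i,l] exist" amounts to.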

It turns out that get_PLSR_NA() returns a 1-column matrix, whereas a numeric vector is needed for the element-wise product t(QQ[[q]])*w[[q]] to work. As a result, the function fails to assign values to Y[,q].
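The shape mismatch can be reproduced in base R without plspm. In this sketch, QQ_q, w_vec and w_mat are made-up stand-ins for QQ[[q]] and the two possible forms of w[[q]].

```r
# Made-up 3 x 2 block and 2 weights
QQ_q <- matrix(c(1, 2, 3,
                 4, 5, 6), nrow = 3)
w_vec <- c(0.5, 1)                # numeric vector (what the NA branch expects)
w_mat <- matrix(w_vec, ncol = 1)  # same values as a 2 x 1 matrix

# Works: w_vec is recycled down each column of the 2 x 3 matrix t(QQ_q)
num_ok <- colSums(t(QQ_q) * w_vec)

# Fails: element-wise `*` between two arrays requires identical dimensions
num_bad <- tryCatch(colSums(t(QQ_q) * w_mat),
                    error = function(e) conditionMessage(e))
print(num_bad)  # typically "non-conformable arrays"

# The complete-data branch uses %*%, which accepts either form
scores_vec <- QQ_q %*% w_vec
scores_mat <- QQ_q %*% w_mat
```

Because %*% treats a vector as a column, the complete-data branch is insensitive to the shape of w[[q]], which is consistent with only the NA branch failing.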

Note that the only cases in get_weights_nonmetric() where w[[q]] is not a numeric vector are precisely those where missing_data[q] is TRUE and specs$modes[q] is PLSCORE, PLSCOW, or B; in other words, whenever get_PLSR_NA() is called.

Converting w[[q]] to the right format (i.e. w[[q]] = t(get_PLSR_NA(Y = ...)$B)[1, ]) was sufficient in my case, although I did not check whether get_PLSR_NA() is used in other places where a 1-column matrix might be expected.

However, it might be better to change the output format of get_PLSR_NA() itself.
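Both options come down to dropping the redundant dimension. A minimal sketch, where B is a made-up stand-in for get_PLSR_NA(...)$B and as_weight_vector() is a hypothetical helper, not part of plspm:

```r
# Made-up stand-in for the 2 x 1 matrix returned as get_PLSR_NA(...)$B
B <- matrix(c(0.5, 1), ncol = 1)

w1 <- t(B)[1, ]   # the workaround from the issue: transpose, take row 1
w2 <- B[, 1]      # same result without the transpose
w3 <- drop(B)     # drops the length-1 dimension

# Hypothetical helper sketching the alternative of normalizing the
# output of get_PLSR_NA() itself before it is returned
as_weight_vector <- function(B) {
  if (is.matrix(B) && ncol(B) == 1) B[, 1] else B
}
w4 <- as_weight_vector(B)

# With a plain vector, the element-wise product from the NA branch works
QQ_q <- matrix(c(1, 2, NA, 2, 0, 4), nrow = 3)
scores <- colSums(t(QQ_q) * w4, na.rm = TRUE)
```

Fixing the shape inside get_PLSR_NA() would keep every caller consistent, at the cost of checking any caller that relies on the matrix form, which is exactly the caveat raised above.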

Best regards,
G

@gastonstat
Owner

Hi Guillaume

Thanks a lot for your emails and bug reports,

I'll forward the information to Giorgio Russolillo, and hopefully we'll be adding the necessary modifications to the plspm package soon.

All the best,

Gaston

Gaston Sanchez, PhD
http://www.gastonsanchez.com
