Handling of NAs and get_weights_nonmetric() for non-reflective blocks #2

guilhemchalancon · 2014-03-17T18:25:27Z

Dear Gaston,

I ran into a bug when using NAs with non-reflective blocks (in commit 30aaec0 as well as in the 0.4.1 stable version).

For instance, after introducing NAs in data(russa) as you show here: https://github.com/gastonstat/plspm, plspm() works fine with modes A or NewA, but fails with B, PLSCORE or PLSCOW.

To verify this, I loaded the toy dataset and introduced 3 NAs:

data(russa)
russNA = russa
russNA[1,1] = NA
russNA[4,4] = NA
russNA[6,6] = NA

rus_path = rbind(c(0, 0, 0), c(0, 0, 0), c(1, 1, 0))
rownames(rus_path) = c("AGRI", "IND", "POLINS")
colnames(rus_path) = c("AGRI", "IND", "POLINS")
rus_blocks = list(1:3, 4:5, 6:9)
rus_scaling = list(c("NUM", "NUM", "NUM"),
                   c("NUM", "NUM"),
                   c("NUM", "NUM", "NUM", "NUM"))
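A quick way to confirm that every block is affected by the NAs can be sketched as follows. Since russa ships with plspm, this uses a made-up 9-column data frame (`toy`, hypothetical) with NAs in the same positions; only the block layout is taken from the setup above.

```r
# Hypothetical stand-in for russNA: 10 rows, 9 numeric columns
set.seed(1)
toy <- as.data.frame(matrix(rnorm(10 * 9), nrow = 10))
toy[1, 1] <- NA
toy[4, 4] <- NA
toy[6, 6] <- NA

# Same block layout as rus_blocks
blocks <- list(1:3, 4:5, 6:9)

# TRUE for each block that contains at least one missing value
block_has_na <- vapply(blocks,
                       function(idx) any(is.na(toy[, idx])),
                       logical(1))
block_has_na
```

With NAs in columns 1, 4 and 6, each of the three blocks contains one missing value, so all three blocks go through the missing-data branch.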

Then I ran plspm() with non-reflective modes:

plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCOW",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
# OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("PLSCORE",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)
# OR
plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling, modes = rep("B",3), scheme = "centroid", plscomp = c(1,1,1), tol = 0.0000001)

...which in all cases leads to an error due to non-conformable elements in get_weights_nonmetric().

After digging into the issue, I found that the cause is the format of the get_PLSR_NA() output.

Specifically, if we look at what happens for non-reflective modes when NAs are detected, we can see that w[[q]] is obtained from get_PLSR_NA():

  if (specs$modes[q] == "PLSCORE") {
    if (missing_data[q]) {
      w[[q]] = get_PLSR_NA(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
      # compute Y[i,q] as the regr. coeff. of QQ[[q]][i,] on w[[q]]
      # considering only the columns where QQ[[q]][i,l] exist
      Y[,q] = colSums(t(QQ[[q]]) * w[[q]], na.rm = TRUE)
      Y[,q] = Y[,q] / colSums((t(X_avail[[q]]) * w[[q]])^2)
      # normalize Y[,q] to unitary variance
      Y[,q] = scale(Y[,q]) * correction
    }
    else { # complete data in block q
      w[[q]] = get_PLSR(Y = Z[,q], X = QQ[[q]], ncomp = PLScomp[q])$B
      Y[,q] = QQ[[q]] %*% w[[q]]
      Y[,q] = scale(Y[,q]) * correction
    }
  }

It turns out that get_PLSR_NA() returns a 1-column matrix, whereas a numeric vector is needed for the product t(QQ[[q]]) * w[[q]] to work. As a result, the function fails to assign values to Y[,q].

Note that the only cases in get_weights_nonmetric() where w[[q]] is not a numeric vector are precisely those where missing_data[q] is TRUE and specs$modes[q] is PLSCORE, PLSCOW or B. In other words, whenever get_PLSR_NA() is called.

I found that converting w[[q]] to the right format, i.e. w[[q]] = t( get_PLSR_NA(Y=...) )[1,], was sufficient in my context. I applied the fix at the call site because I didn't check whether get_PLSR_NA() is used in other contexts where a 1-column matrix might be expected.
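The shape problem can be reproduced outside plspm. The sketch below uses made-up values (`X`, `w_mat` and `w_vec` are hypothetical stand-ins for QQ[[q]] and w[[q]], not package data) to show that R refuses an elementwise product between a p x n matrix and a p x 1 matrix, while a length-p vector is recycled row by row.

```r
# Toy stand-ins (made-up values): 4 observations x 3 indicators, one NA
X <- matrix(c(1, 2, 3, 4,
              2, NA, 1, 0,
              0, 1, 1, 2), nrow = 4)

w_mat <- matrix(c(0.5, 0.3, 0.2), ncol = 1)  # 3 x 1 matrix
w_vec <- drop(w_mat)                         # plain numeric vector of length 3

# Elementwise product of a 3 x 4 and a 3 x 1 matrix: dimensions do not match
msg <- tryCatch({ t(X) * w_mat; "no error" },
                error = function(e) conditionMessage(e))

# With a vector, row l of t(X) is multiplied by w_vec[l];
# na.rm = TRUE skips the indicators that are missing for an observation
scores <- colSums(t(X) * w_vec, na.rm = TRUE)
```

Here `msg` holds the non-conformable error message, and `scores` has one entry per observation, computed from the available indicators only.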

However, it might be better to change the output format of get_PLSR_NA() instead.
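Whichever place the conversion ends up in, there are a few equivalent ways to turn a 1-column matrix into a numeric vector. A minimal sketch, using a stub in place of get_PLSR_NA() (`fit_stub` is hypothetical; the only thing assumed from the code above is that the result has a `$B` element holding a p x 1 matrix):

```r
# Hypothetical stand-in for a fit that returns coefficients as a p x 1 matrix
fit_stub <- function(p) {
  list(B = matrix(seq_len(p) / 10, ncol = 1,
                  dimnames = list(paste0("x", seq_len(p)), NULL)))
}

B <- fit_stub(3)$B

w_t    <- t(B)[1, ]  # transpose, then take the single row
w_drop <- drop(B)    # drop the dimension of extent one
w_col  <- B[, 1]     # plain column extraction

# All three are dimensionless numeric vectors that keep the row names
is.null(dim(w_t)) && is.null(dim(w_drop)) && is.null(dim(w_col))
```

Note that as.vector(B) would also remove the dimensions but discards the names, which may matter if the weights are later matched to indicators by name.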

Best regards,
G

gastonstat · 2014-03-18T22:44:30Z

Hi Guillaume

Thanks a lot for your emails and bug reports.

I'll forward the information to Giorgio Russolillo, and hopefully we'll be adding the necessary modifications to the plspm package soon.

All the best,

Gaston

Gaston Sanchez, PhD
http://www.gastonsanchez.com
