get_dummy() handling of vectors containing zeros #3

Open
guilhemchalancon opened this issue Mar 20, 2014 · 1 comment

Comments

@guilhemchalancon

Hi,

I've spotted an issue with get_dummy() when running plspm() on a model that includes ordinal variables with "0" as one of the categories.

Let's assume we have a variable in a block q such that Xblocks[[q]][,p] contains integers from 0 to 4 (i.e. 5 categories).

Basically, get_weights_nonmetric() fails on such a variable, stopping at this line: get_ord_scale(Z[,q], Xblocks[[q]][,p], aux_dummy)

The reason is that, in this case, aux_dummy is one column short: the column corresponding to category "0" is missing.

So I looked at how dummies are obtained, and found out that get_dummy() is at the root of the problem.

get_dummy <- function (x) 
{
    n = length(x)
    p = max(x, na.rm = TRUE)
    Xdummy = matrix(0, n, p)
    for (k in 1:p) {
        Xdummy[x == k, k] = 1
    }
    if (any(is.na(x))) {
        Xdummy[which(rowSums(Xdummy) == 0), ] <- NA
    }
    Xdummy
}

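As you can see, the number of columns p is set to the maximum value of x, and the loop fills column k for the rows where x == k, with k running from 1 to p. This breaks when 0s are present in x: category 0 never gets a column (p would need to be max(x)+1).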
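To make the failure concrete, here is a small reproduction using the get_dummy() definition quoted above. The vectors are made-up illustrations, not data from my model.

```r
# get_dummy() as quoted above
get_dummy <- function(x) {
  n = length(x)
  p = max(x, na.rm = TRUE)
  Xdummy = matrix(0, n, p)
  for (k in 1:p) {
    Xdummy[x == k, k] = 1
  }
  if (any(is.na(x))) {
    Xdummy[which(rowSums(Xdummy) == 0), ] <- NA
  }
  Xdummy
}

# Hypothetical ordinal variable with 5 categories coded 0..4
x <- c(0, 1, 2, 3, 4, 0, 2)
D <- get_dummy(x)

ncol(D)              # 4 columns, although x has 5 categories
rowSums(D)[x == 0]   # the rows for category 0 contain only zeros

# Side effect when NAs are present: all-zero rows are overwritten with NA,
# so the observations in category 0 are treated as missing as well.
D_na <- get_dummy(c(0, 1, NA, 2))
anyNA(D_na[1, ])     # TRUE: row 1 is a valid "0", yet it is set to NA
```

Both symptoms come from the same assumption, namely that categories are coded 1, 2, ..., p.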

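A simple fix is to define p as the number of distinct non-missing categories, e.g. p = length(unique(x[!is.na(x)])). Note that the loop must then also index columns by category position rather than by raw value, otherwise category 0 still never gets a column.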
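A rough sketch of a version that counts categories instead of taking the maximum could look like the following. The name get_dummy_levels() is hypothetical and this is not the package's code. It assumes that columns should follow the sorted observed categories and that NAs are excluded when counting them.

```r
# Hypothetical replacement for get_dummy(); a sketch, not the package's fix
get_dummy_levels <- function(x) {
  lev <- sort(unique(x[!is.na(x)]))    # observed categories, NA excluded
  idx <- match(x, lev)                 # column position of each value (NA stays NA)
  Xdummy <- matrix(0, length(x), length(lev))
  obs <- which(!is.na(idx))
  Xdummy[cbind(obs, idx[obs])] <- 1    # exactly one 1 per observed row
  Xdummy[is.na(idx), ] <- NA           # only truly missing values give NA rows
  colnames(Xdummy) <- lev
  Xdummy
}

D5 <- get_dummy_levels(c(0, 1, 2, 3, 4, 0, 2))
ncol(D5)       # 5 columns, one per category
rowSums(D5)    # every row sums to 1, including the rows for category 0

D3 <- get_dummy_levels(c(0, 1, NA, 2))
D3[1, ]        # category 0 is kept in the first column
D3[3, ]        # the NA observation is the only NA row
```

One caveat: a category that never occurs in the data (say, no 3s in a 0 to 4 scale) gets no column here. I have not checked whether get_ord_scale() relies on the columns matching the raw codes.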

Best,
G

@guilhemchalancon guilhemchalancon changed the title get.dummy() handling of vectors containing zeros get_dummy() handling of vectors containing zeros Mar 20, 2014
@guilhemchalancon

The same reasoning applies to get_ord_scale(), where p is also defined as max(x, na.rm=T).

Also, I wonder whether coding a category as 0, as I did, actually makes sense when computing the ordinal scale. Perhaps a conditional test should be introduced to prevent users from defining ordinal variables that way?
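Such a test could be as small as the sketch below. check_ordinal_codes() is a hypothetical helper, not an existing plspm function, and the rule it enforces (positive integer codes only) is my assumption about what the ordinal scaling expects.

```r
# Hypothetical input check; not part of plspm
check_ordinal_codes <- function(x) {
  vals <- x[!is.na(x)]
  if (any(vals < 1) || any(vals != round(vals))) {
    stop("ordinal variables must be coded as positive integers (1, 2, ...)")
  }
  invisible(TRUE)
}

x <- c(0, 1, 2, 3, 4, 0, 2)

# The 0-based coding is rejected with an error
res <- try(check_ordinal_codes(x), silent = TRUE)
inherits(res, "try-error")

# Shifting the codes by one keeps the ordering and passes the check
x_recoded <- x + 1
check_ordinal_codes(x_recoded)
```

On the user side, the x + 1 recoding is also a workaround that avoids the problem without touching the package.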
