docs object expects all word frequencies to be 1 - transformation from dfm object (quanteda) #10

JonasRieger · 2021-06-29T08:29:14Z

The docs object expects (for technical reasons) that every word entry has frequency 1. A word that occurs several times in a document must therefore appear several times, each time with frequency 1.
The quanteda package provides dfm objects, which also allow frequencies greater than 1. If you do your preprocessing in quanteda and use quanteda::dfm2lda to convert your object into the necessary structure, you need one more step to meet the requirements of the docs object. Just execute the following line:

docs = lapply(docs, function(x) rbind(rep(x[1,], x[2,]), 1))

This replicates words with multiple occurrences and protects you from the error message all(sapply(docs, function(x) all(x[2, ] == 1))) is not TRUE in LDARep and similar functions.


abitter · 2021-11-05T17:07:52Z

Unfortunately, this yields a numeric matrix (at least in R 4.1.1), whereas LDARep expects an integer matrix.
There might be a more elegant solution, but this did the trick for me:

docs <- lapply(docs, function(x) rbind(rep(as.integer(x[1,]), as.integer(x[2,])), as.integer(1)))

JonasRieger · 2021-11-07T19:26:54Z

Yeah, you're right.

docs = convert(dfmat, "lda")$documents
docs = lapply(docs, function(x) rbind(rep(x[1,], x[2,]), 1L))

should do it as well.
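
To see what the second line does, here is a minimal sketch on a made-up toy document (the word ids and counts are invented for illustration and are not real quanteda output). It assumes the two-row layout described above, with word ids in row 1 and frequencies in row 2, and uses base R only:

```r
# Hypothetical toy document: row 1 = word ids, row 2 = frequencies.
doc <- rbind(c(0L, 3L, 7L), c(2L, 1L, 3L))
docs <- list(doc)

# Repeat each word id as often as its frequency, then set every
# frequency to 1L so the result stays an integer matrix.
expanded <- lapply(docs, function(x) rbind(rep(x[1, ], x[2, ]), 1L))

# The condition from the error message now holds ...
stopifnot(all(sapply(expanded, function(x) all(x[2, ] == 1))))
# ... and the matrices are integer, not numeric.
stopifnot(all(sapply(expanded, is.integer)))
```

The total number of columns per document equals the sum of the original frequencies, so no word occurrences are lost by the conversion.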

JonasRieger added the "usability" label (Enhancement of user friendliness) on Jun 29, 2021
