Error when id column is not sequential and contains id values larger than corpus length. #2

Open
grlju opened this issue Jan 19, 2024 · 1 comment
Assignees
Labels
bug Something isn't working

grlju commented Jan 19, 2024

Hi Dirk,

Thanks a lot for your great package! It is a great help with a project I am currently working on.

I noticed that I get an error when the doc_id column is not sequential and contains values larger than the number of documents. The example below should reproduce it.

library(text2sdg)
library(corpustools)

# Four documents whose doc_id values are non-sequential; 7 and 10 exceed the corpus length (4)
d <- data.frame(text = c('Text one first sentence.',
                        'Climate change is bad. Do something about extreme poverty',
                        'Do something about extreme poverty', 'One '),
                doc_id = c(1, 3, 7, 10),
                date = c('2010-01-01','2010-01-01','2012-01-01', '2012-01-01'),
                source = c('A','B','B', 'C'))
tc <- create_tcorpus(d)   # default columns: 'text' and 'doc_id'
sdgs <- detect_sdg(tc)
#> Running systems
#> Obtaining text lengths
#> Building features
#> Running ensemble
#> Error: Missing data in columns: n_words.

In addition, if you fix only ensemble.R, text 3 (ID 7) is still missing from the result. I looked at the code and found that this is because of how the ID columns are created internally in the ensemble.R and systems.R files.
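
To illustrate the kind of failure I mean, here is a minimal base R sketch. It is not the actual text2sdg code: the variable names and the word counts are made up for illustration, and I am only assuming that the internal lookup behaves roughly like positional indexing. It shows how treating document IDs as row positions produces missing values once an ID exceeds the corpus length, and how matching on the ID column avoids that.

```r
# Hypothetical illustration only; not taken from ensemble.R or systems.R.
doc_id  <- c(1, 3, 7, 10)   # non-sequential IDs from the example above
n_words <- c(4, 9, 5, 1)    # hand-counted word counts, one per document

# Fragile: use the IDs as if they were row positions.
# Positions 7 and 10 do not exist in a 4-element vector, so R returns NA,
# and position 3 silently picks up the wrong document.
by_position <- n_words[doc_id]

# More robust: find each ID's row by matching against the ID column.
wanted <- c(3, 7)           # hypothetical: IDs of documents with a hit
by_key <- n_words[match(wanted, doc_id)]

print(by_position)
print(by_key)
```

In the positional version the lookup for IDs 7 and 10 falls off the end of the vector, which is consistent with the "Missing data in columns: n_words" error, while the keyed version returns the counts for the intended documents.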

The fix is really minor, so I have implemented it and will create a pull request.

@psychobas psychobas self-assigned this Jan 22, 2024
@psychobas psychobas added the bug Something isn't working label Jan 22, 2024
@psychobas
Collaborator

Hi @grlju,

Thanks a lot for bringing this to our attention and proposing a solution! I will merge your pull request into a new branch, test it, and then merge it into main.

Happy to hear that you find the package useful!

Best,
Dominik
