cr_works has missing entries in result when deep_paging vs regular search #212

Open
aroranipun opened this issue Nov 29, 2020 · 3 comments

Comments

@aroranipun

aroranipun commented Nov 29, 2020

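library(rcrossref)

# Deep paging (cursor) versus regular search for the same query
f1 <- cr_works(query = "human agency", cursor_max = 100, cursor = "*")
f <- cr_works(query = "human agency", limit = 100)

c1 <- names(f1$data)
c2 <- names(f$data)

# Columns returned by the regular search but missing from the deep-paging result
c2[which(!c2 %in% c1)]
#> [1] "isbn"          "abstract"      "update.policy" "assertion"     "subject"       "subtitle"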

Session information

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rcrossref_1.1.0

@sckott
Contributor

sckott commented Nov 30, 2020

Please include session info as requested: the output of devtools::session_info() or sessionInfo().

@aroranipun
Author

Just added session information.
PS: Also, many thanks for solving the problem in the previous issue.

@sckott
Contributor

sckott commented Dec 2, 2020

Thanks for that. And of course.

If we look at an example where we can quickly collect all results, regular search and deep paging return the same results and the same columns, just in a different order:

res1 <- cr_works(query = "ecology",
      flq = c(query.author = 'Smith', query.bibliographic = 'avian'), limit = 50)
res2 <- cr_works(query = "ecology",
      flq = c(query.author = 'Smith', query.bibliographic = 'avian'), cursor = "*")
all(sort(res1$data$doi) %in% sort(res2$data$doi))
#> TRUE
c1 = names(res1$data)
c2 = names(res2$data)
c2[which(!c2 %in% c1)]
#> character(0)

So I think the discrepancy you're seeing comes from the Crossref servers using different code to extract results for deep paging than for regular search. It's not ideal, but I think that is what's going on. As far as I know, this package does not use different parsing code for the two kinds of request.
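
Whatever the cause, the comparison used in this thread can be made symmetric and the two results brought to a common set of columns before further work. The following is only a minimal sketch in base R: `compare_columns()` and `align_columns()` are hypothetical helpers, not part of rcrossref, and the two small data frames are mock data standing in for `f$data` (regular search) and `f1$data` (deep paging).

```r
# Hypothetical helper (not part of rcrossref): report the columns that are
# present in one result but not in the other, in both directions.
compare_columns <- function(a, b) {
  list(
    only_in_a = setdiff(names(a), names(b)),
    only_in_b = setdiff(names(b), names(a))
  )
}

# Hypothetical helper: add any columns missing from `x` as NA, then return
# the columns in the order given by `cols`.
align_columns <- function(x, cols) {
  for (col in setdiff(cols, names(x))) {
    x[[col]] <- rep(NA, nrow(x))
  }
  x[, cols, drop = FALSE]
}

# Mock data (assumption): same records, different order, one column missing.
regular <- data.frame(doi = c("10.1/a", "10.1/b"), title = c("A", "B"),
                      abstract = c("x", NA), stringsAsFactors = FALSE)
deep <- data.frame(doi = c("10.1/b", "10.1/a"), title = c("B", "A"),
                   stringsAsFactors = FALSE)

diffs <- compare_columns(regular, deep)            # which columns differ
all_cols <- union(names(regular), names(deep))     # common column set
deep_aligned <- align_columns(deep, all_cols)      # fill the gaps with NA
same_dois <- setequal(regular$doi, deep$doi)       # order-independent check
```

With real data, `regular` and `deep` would be replaced by `f$data` and `f1$data`. `setequal()` checks the DOIs in both directions, whereas a one-sided `%in%` test only shows that one result is contained in the other.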
