cr_works has missing entries in result when deep_paging vs regular search #212

aroranipun · 2020-11-29T07:58:27Z

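library(rcrossref)

# Deep paging (cursor) vs. regular search for the same query
f1 <- cr_works(query = "human agency", cursor_max = 100, cursor = "*")
f <- cr_works(query = "human agency", limit = 100)

c1 <- names(f1$data)
c2 <- names(f$data)

# Columns returned by the regular search but missing from the deep paging result
c2[which(!c2 %in% c1)]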
[1] "isbn"          "abstract"      "update.policy" "assertion"     "subject"       "subtitle"

Session information

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rcrossref_1.1.0

sckott · 2020-11-30T19:46:36Z

Please include session info as requested: the output of devtools::session_info() or sessionInfo().

aroranipun · 2020-12-01T19:03:39Z

Just added session information.
PS: Also, many thanks for solving the problem in the previous issue.

sckott · 2020-12-02T01:29:39Z

Thanks for that. Of course.

If we look at an example where we can quickly collect all results, regular search and deep paging return the same results, just in a different order:

res1 <- cr_works(query = "ecology",
      flq = c(query.author = 'Smith', query.bibliographic = 'avian'), limit = 50)
res2 <- cr_works(query = "ecology",
      flq = c(query.author = 'Smith', query.bibliographic = 'avian'), cursor = "*")
all(sort(res1$data$doi) %in% sort(res2$data$doi))
#> TRUE
c1 = names(res1$data)
c2 = names(res2$data)
c2[which(!c2 %in% c1)]
#> character(0)

So I think the discrepancy you're seeing comes from Crossref's servers using different code to extract results for deep paging than for regular search. It's not ideal, but I think that is what's going on. As far as I know, we're not using different parsing code in this package.
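
For anyone who wants to run the same check in both directions, here is a minimal sketch. It uses made-up stand-in data frames rather than real Crossref output, so it runs without network access. `compare_columns` is a hypothetical helper, not part of rcrossref.

```r
# Hypothetical helper (not part of rcrossref): report the columns that are
# unique to each of two result tables, checking both directions.
compare_columns <- function(regular, deep) {
  list(
    only_in_regular = setdiff(names(regular), names(deep)),
    only_in_deep    = setdiff(names(deep), names(regular))
  )
}

# Made-up stand-ins for f$data (regular search) and f1$data (deep paging).
# The DOIs and values are placeholders, not real Crossref records.
regular <- data.frame(
  doi = c("10.1/a", "10.1/b"),
  title = c("A", "B"),
  abstract = c("x", NA),
  stringsAsFactors = FALSE
)
deep <- data.frame(
  doi = c("10.1/b", "10.1/a"),
  title = c("B", "A"),
  stringsAsFactors = FALSE
)

col_diff <- compare_columns(regular, deep)
print(col_diff)

# Do both tables hold the same records, regardless of order?
same_dois <- setequal(regular$doi, deep$doi)
print(same_dois)
```

With real results, passing `f$data` and `f1$data` to the helper would list the columns missing on either side, and `setequal()` on the `doi` columns would show whether the two searches returned the same records at all.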
