Any way to get full text from rcrossref anymore #240

Open
padpadpadpad opened this issue Sep 28, 2023 · 2 comments


@padpadpadpad

Hi everyone

Just wondering if there is any way to get full text from rcrossref anymore. I am interested in collating Data Accessibility statements and am conscious that publishers probably won't like it if I start read_html()-ing loads of webpages.

It seems the full-text methods moved to crminer, but that is no longer under development, so I am interested in whether anyone has any recommendations.

Cheers
Dan

@njahn82
Member

njahn82 commented Sep 29, 2023

Hi,

You can use rcrossref to identify articles and get TDM full-text links from Crossref. Once you have the TDM links, you'll need to check with the publisher to see how to download the full texts.

1. Identify articles and get TDM full-text links from Crossref

The following reprex shows how to identify articles and get TDM full-text links from Crossref for the DOI 10.1002/asi.24460:

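library(rcrossref)
library(tidyverse)

# Fetch the Crossref metadata record for one DOI
my_cr_df <- cr_works(doi = "10.1002/asi.24460")$data

# One row per full-text link; keep only those intended for text mining
tdm_links <- my_cr_df |>
  select(doi, link) |>
  unnest(link) |>
  filter(intended.application == "text-mining")

tdm_links |>
  select(URL, content.type)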
#> # A tibble: 2 × 2
#>   URL                                                            content.type   
#>   <chr>                                                          <chr>          
#> 1 https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.24460      application/pdf
#> 2 https://onlinelibrary.wiley.com/doi/full-xml/10.1002/asi.24460 application/xml

Created on 2023-09-29 with reprex v2.0.2

2. Download

Once you have the TDM links, you'll need to check with the publisher to see how to download the full texts. Many publishers, such as Elsevier and Wiley, require you to register for an API key and add it to your HTTP request. Even if a full text is open access, these publishers may not allow you to access it programmatically without registration.

Here are links to information on how to download full texts from Elsevier and Wiley:

General Crossref TDM info: https://www.crossref.org/documentation/retrieve-metadata/rest-api/text-and-data-mining/
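
A minimal sketch of what the download step could look like in R, assuming the httr package is installed. The helper names `tdm_destfile()` and `download_tdm()`, the header name `X-Example-TDM-Token`, and the environment variable `MY_TDM_TOKEN` are all hypothetical placeholders, not any publisher's actual API. Each publisher documents its own header name and, in some cases, a dedicated API endpoint to use instead of the link Crossref returns, so check their documentation before adapting this.

```r
# Build a local file name from a DOI and the MIME type reported by Crossref.
# (hypothetical helper, not part of rcrossref)
tdm_destfile <- function(doi, content_type, dir = ".") {
  ext <- switch(content_type,
    "application/pdf" = "pdf",
    "application/xml" = "xml",
    "text/xml"        = "xml",
    "text/html"       = "html",
    "text/plain"      = "txt",
    "bin"             # fallback for any other content type
  )
  # Replace every run of non-alphanumeric characters so the DOI is file-system safe
  stem <- gsub("[^A-Za-z0-9]+", "_", doi)
  file.path(dir, paste0(stem, ".", ext))
}

# Download one full text, sending the API token in a request header.
# ASSUMPTION: `header_name` is a placeholder; use the name your publisher documents.
download_tdm <- function(url, destfile, token,
                         header_name = "X-Example-TDM-Token",
                         pause = 1) {
  headers <- stats::setNames(token, header_name)
  resp <- httr::GET(
    url,
    httr::add_headers(.headers = headers),
    httr::write_disk(destfile, overwrite = TRUE)
  )
  # Raises an R error on HTTP 4xx/5xx; the file written to disk then
  # holds the error body rather than the article
  httr::stop_for_status(resp)
  Sys.sleep(pause)  # pause between requests to stay polite
  invisible(destfile)
}

# Example (not run), using the tdm_links tibble from step 1:
# token <- Sys.getenv("MY_TDM_TOKEN")  # hypothetical environment variable
# for (i in seq_len(nrow(tdm_links))) {
#   dest <- tdm_destfile(tdm_links$doi[i], tdm_links$content.type[i])
#   download_tdm(tdm_links$URL[i], dest, token)
# }
```

Keeping the file naming separate from the HTTP call means the naming logic can be checked without network access, and the token stays out of the script when it is read from an environment variable.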

I hope this is helpful!

Najko

@padpadpadpad
Author

This is super useful, thanks!
