404 error at download attempt #51

Open
andybega opened this issue Jan 6, 2020 · 5 comments

Comments

@andybega
Owner

andybega commented Jan 6, 2020

First, thank you for developing the icews package. I am trying to use the minimalist functionality and am running into an error.
The error occurs for both the update_icews() and download_data() functions when dryrun is set to FALSE. My setup has use_db = F and keep_files = T.

update_icews(dryrun = F)
Downloading 'events.1995.20150313082510.tab.zip'
Error in get_file(file_ref, get_doi()[[repo]]) : Not Found (HTTP 404).

I am hoping this is a common error and that an answer is readily available. Thanks for your help.

@andybega
Owner Author

andybega commented Jan 6, 2020

This is a simpler example:

This should download one of the documentation PDFs:

library("icews")
#> Options not set, consider running 'setup_icews()'
#> data_dir: NULL
#> use_db: NULL
#> keep_files: NULL
dataverse::get_file(2711073, dataset = get_doi()$historic)
#> Error in dataverse::get_file(2711073, dataset = get_doi()$historic): Not Found (HTTP 404).

Created on 2020-01-06 by the reprex package (v0.3.0)

@andybega
Owner Author

andybega commented Jan 6, 2020

Looks like the problem is with either the R dataverse client or the dataverse API itself. The direct URL for the PDF file above is https://dataverse.harvard.edu/api/access/datafile/2711073, and it works.

However, in dataverse::get_file(), a query parameter for the desired format is set to "original" by default, leading to the URL https://dataverse.harvard.edu/api/access/datafile/2711073/?format=original. That breaks and leads to the 404 error.

This is the essential bit from the dataverse::get_file() internals:

library("dataverse")
library("httr")

key <- Sys.getenv("DATAVERSE_KEY")

u <- "https://dataverse.harvard.edu/api/access/datafile/2711073"
query <- list(format = "original")

r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key))
# works
status_code(r)
#> [1] 200

# with format argument it does not work
r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key), query = query)
status_code(r)
#> [1] 404

Created on 2020-01-06 by the reprex package (v0.3.0)
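
The difference between the two requests can also be inspected without touching the network. The sketch below assumes that httr::GET() assembles its final URL through httr::modify_url(), which is my understanding of httr's internals rather than something confirmed in this thread; the variable names are illustrative only.

```r
library("httr")

base_url <- "https://dataverse.harvard.edu/api/access/datafile/2711073"

# URL built when no query is supplied (the request that returned 200 above)
plain_url <- httr::modify_url(base_url)

# URL built when query = list(format = "original") is supplied
# (the request that returned 404 above)
format_url <- httr::modify_url(base_url, query = list(format = "original"))

print(plain_url)
print(format_url)
```

Comparing the two printed URLs shows that the only difference is the appended format parameter, which is consistent with the server rejecting that parameter for this file.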

This looks like a relevant issue in the R dataverse client repo: IQSS/dataverse-client-r#33

@mayeulk
Contributor

mayeulk commented Jan 22, 2020

Hi, I have the same issue.
Also note that in the dry run there is no file for the past 6 months or so (last one is 20190625); maybe it's related:

Download            '20190622-icews-events.zip'
Ingest records from '20190622-icews-events.tab'
Download            '20190623-icews-events.zip'
Ingest records from '20190623-icews-events.tab'
Download            '20190624-icews-events.zip'
Ingest records from '20190624-icews-events.tab'
Download            '20190625-icews-events.zip'
Ingest records from '20190625-icews-events.tab'
> # Should list proposed downloads, ingests, etc.
> update_icews(dryrun = FALSE)
Downloading 'events.1995.20150313082510.tab.zip'
Error in dataverse::get_file(file = file_ref, dataset = get_doi()[[repo]]) : 
  Not Found (HTTP 404).

@mayeulk
Contributor

mayeulk commented Jan 23, 2020

Here is a workaround and a blueprint for a fix:

.libPaths() # make sure to remove all dataverse in all places
remove.packages("dataverse")
# restart R
devtools::install_github("IQSS/dataverse-client-r")
#Installing package into ‘/home/mk/R/x86_64-pc-linux-gnu-library/3.6’
#(as ‘lib’ is unspecified)
#* installing *source* package ‘dataverse’
library("icews")
library("DBI")
library("dplyr")
library("usethis")
print(sessionInfo())
#loaded via a namespace (and not attached):
#[...]
#[25] glue_1.3.1             dataverse_0.2.1.9001   RSQLite_2.2.0

setup_icews(data_dir = "~/temp_icews", use_db = TRUE, keep_files = TRUE,
            r_profile = TRUE)
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
update_icews(dryrun = FALSE)

as per IQSS/dataverse-client-r#33 (comment)

Hope it helps! Cheers

@andybega
Owner Author

Hey @mayeulk, thanks! This works for me now as well:

devtools::install_github("IQSS/dataverse-client-r")

# restart R
library("icews")
library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

# works now
foo <- dataverse::get_file(2711073, dataset = get_doi()$historic)

I will keep this issue open until dataverse is updated on CRAN.

> Also note that in the dry run there is no file for the past 6 months or so (last one is 20190625); maybe it's related:

ICEWS has stopped updating. I heard they managed to regain funding but I have no idea when they will resume.

andybega added a commit that referenced this issue May 8, 2020
- The manual fix for the get_file issue (#51, #58) breaks R check. Don't do that.
- Instead, I added a check to .onLoad that produces a warning if the installed dataverse version is older than the one containing the get_file fix, which is not on CRAN yet. This should prevent a repeat of me re-discovering this issue (closes #58)
- also check to make sure the DATAVERSE_SERVER environment variable is set
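
For readers who want a similar guard in their own scripts, here is a minimal sketch of the kind of check the commit message describes. It is not the code from the icews package: the function name check_dataverse_setup() is hypothetical, and the threshold "0.2.1.9001" is only an assumption taken from the sessionInfo() output earlier in this thread.

```r
# Hypothetical helper, not part of icews. The `required` default is an
# assumption based on the dataverse_0.2.1.9001 entry in the sessionInfo()
# output above, not on the icews source code.
check_dataverse_setup <- function(installed = utils::packageVersion("dataverse"),
                                  required = "0.2.1.9001",
                                  server = Sys.getenv("DATAVERSE_SERVER")) {
  problems <- character(0)

  # Compare versions as package versions rather than as plain strings
  if (package_version(as.character(installed)) < package_version(required)) {
    problems <- c(problems, sprintf(
      "dataverse %s is older than %s; get_file() may fail with HTTP 404.",
      as.character(installed), required
    ))
  }

  # The workaround in this thread also sets DATAVERSE_SERVER
  if (!nzchar(server)) {
    problems <- c(problems, "The DATAVERSE_SERVER environment variable is not set.")
  }

  # Emit one warning per problem and return the messages invisibly
  for (p in problems) {
    warning(p, call. = FALSE)
  }
  invisible(problems)
}

# Example call with explicit values, so it runs without dataverse installed
check_dataverse_setup(installed = "0.2.1.9001", server = "dataverse.harvard.edu")
```

Calling it with the defaults reads the installed dataverse version and the current environment; a package could run something similar from its .onLoad hook.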

2 participants