
evaluation list is slow #415

Closed
giuseppec opened this issue May 9, 2017 · 15 comments

@giuseppec
Member

@joaquinvanschoren knows the solution and has to implement it

@giuseppec
Member Author

This is still very slow.

@giuseppec
Member Author

This already takes more than 5 minutes for me: https://www.openml.org/api/v1/json/evaluation/list/tag/study_14/limit/1. Is it just me, or how do you guys work with OpenML if this is not working?

@PhilippPro @DanielKuehn87 what do you guys do? Is this also slow for you guys?
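
For context, a quick way to tell whether the time is spent in the server-side query or in the R client is to time the bare HTTP request. A minimal sketch, assuming the httr package is installed (the URL is the one above):

library(httr)

url <- "https://www.openml.org/api/v1/json/evaluation/list/tag/study_14/limit/1"

# Time only the HTTP round trip; a long wait here points at the server-side
# query rather than at client-side JSON parsing.
system.time(resp <- GET(url))
status_code(resp)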

@joaquinvanschoren
Contributor

I will look at this tonight.

@PhilippPro

PhilippPro commented Jun 1, 2017

On Debian:

> system.time(a <- listOMLRunEvaluations(tag = "study_14"))
Downloading from 'https://www.openml.org/api/v1/json/evaluation/list/tag/study_14' to '<mem>'.
       user  system elapsed 
      1.708   0.888 767.338 

This is not a big problem for us, as we have everything in a database.

Other very slow functions like getOMLRunParList(getOMLRun(x)) are a much bigger problem, because it takes many days to get all the hyperparameters via these functions...

see also openml/openml-r#348
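
For context, the pattern described above amounts to one API round trip per run, which is why it scales so badly. A minimal sketch of that loop, assuming the OpenML package and a hypothetical vector run.ids of run IDs:

library(OpenML)

run.ids <- c(1L, 2L, 3L)  # placeholder; in practice this can be thousands of run IDs

# One getOMLRun() call (i.e. one HTTP request) per run, plus parsing,
# which is what makes fetching all hyperparameters take days.
par.lists <- lapply(run.ids, function(x) getOMLRunParList(getOMLRun(x)))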

@giuseppec
Member Author

Hm, the thing is, when people start building their own (local) database because OpenML does not meet their requirements, that is a clear sign that something is wrong with OpenML, right?

@PhilippPro

I don't necessarily think so. If you have a huge database, it is natural that you cannot get everything over the internet; that simply takes too long...

@joaquinvanschoren
Contributor

joaquinvanschoren commented Jun 1, 2017 via email

@giuseppec
Member Author

giuseppec commented Jun 2, 2017 via email

@joaquinvanschoren
Contributor

joaquinvanschoren commented Jun 2, 2017 via email

@DanielKuehn87

Ok, what about just creating an image of the OpenML database in SQLite (or some other open database) and giving users a way to download the whole thing? I guess the OpenML database is something like 10 GB in size. So if I want to run a larger analysis, it might be better to just download the whole database instead of running several queries against the API, which transfers the data via XML/JSON.
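
As a rough illustration of that idea (the file name and the table name below are placeholders, not the actual OpenML schema), querying a local SQLite image from R could look like this:

library(DBI)
library(RSQLite)

# Open the (hypothetical) local snapshot instead of going through the web API.
con <- dbConnect(RSQLite::SQLite(), "openml_snapshot.sqlite")

# Arbitrary example query; table and column names depend on the actual dump.
evals <- dbGetQuery(con, "SELECT * FROM evaluation LIMIT 10")

dbDisconnect(con)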

@joaquinvanschoren
Contributor

Under guide > developers there is a link to a nightly snapshot of the database. It has the most useful stuff (not everything; that would be over 100 GB).

@DanielKuehn87

The link is not working currently, but this is helpful for me. Thanks.

@joaquinvanschoren
Contributor

Snapshot link is fixed.

@joaquinvanschoren
Contributor

I just submitted a fix for the slow queries. Under review now.
openml/openml.org@1371d0d

@joaquinvanschoren
Contributor

Fix is running on production.
Giuseppe's example returns immediately now:
https://www.openml.org/api/v1/json/evaluation/list/tag/study_14/limit/1
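
To verify the speed-up from R, the earlier timing can simply be rerun; with the fix on production it should finish in seconds rather than minutes (sketch assuming the OpenML package):

library(OpenML)

# Same call as in the timing posted above, which previously took ~767 seconds elapsed.
system.time(a <- listOMLRunEvaluations(tag = "study_14"))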
