Do we need the rawdata? #2199
Comments
If it is necessary (because we want to keep a copy), I will create a separate repository to keep the rawdata there, since the rawdata does not need version control.
I have expressed this concern before, and the solution I proposed at that time was to have a [...]. What currently makes it worse is that for some datasets the whole set of HEPData tables is downloaded while only a few tables are needed.
This way we don't store the rawdata in the repo: it lives only locally for those implementing the data, and it is downloaded during the CI check of the commondata. FWIW, we can revive and refine this module.
I indeed remember @Radonirinaunimi's concern and proposal. I also remember that I did not support it too much, hoping (naively) that ONLY the few relevant tables would have been downloaded and stored. Even if you decide not to store the tables, I consider that ONLY the relevant ones should be downloaded locally.
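A minimal sketch of the "only the relevant tables" idea, assuming the rawdata arrives as a tar archive of per-table files. The function name `extract_relevant_tables` and the file names in the usage note are hypothetical and not part of the existing code:

```python
import tarfile
from pathlib import Path


def extract_relevant_tables(archive, dest, wanted):
    """Copy only the archive members whose file name is in `wanted` into `dest`.

    Hypothetical helper: the archive layout (one file per table) is an assumption.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    wanted = set(wanted)
    extracted = []
    with tarfile.open(archive, "r:*") as tar:
        for member in tar.getmembers():
            name = Path(member.name).name
            if not member.isfile() or name not in wanted:
                continue
            # Write under the base name only, so paths stored in the
            # archive cannot escape `dest`.
            (dest / name).write_bytes(tar.extractfile(member).read())
            extracted.append(name)
    missing = wanted - set(extracted)
    if missing:
        raise FileNotFoundError(f"tables not found in archive: {sorted(missing)}")
    return sorted(extracted)
```

For example, `extract_relevant_tables("submission.tar.gz", "rawdata", ["Table1.yaml", "Table3.yaml"])` would leave only those two files in `rawdata/`. The list of wanted tables would have to come from somewhere, for instance the dataset metadata, if that information is reliable there.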
But this is also a solution ofc (and maybe using git submodules).
Yes, what prompted me to open the issue is that, while revising some of the datasets, I noticed many more tables than necessary were included. As a first step (in the automatic download approach) we can just download the whole HEPData record for a given version, so we avoid having to deal with specific tables (as I don't think the information is correctly included in all metadata). That will live only on the computer of the person implementing the dataset and in the CI.
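As a rough sketch of that first step, assuming a versioned archive can be fetched from a single URL and kept in a local, git-ignored cache directory. The URL pattern, the function names, and the cache layout below are assumptions for illustration and should be checked against the HEPData documentation:

```python
import urllib.request
from pathlib import Path

# Assumed URL pattern for a versioned HEPData submission archive.
HEPDATA_URL = "https://www.hepdata.net/download/submission/{record}/{version}/yaml"


def _http_get(url):
    """Return the raw bytes served at `url`."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_rawdata(record, version, cache_dir, fetch=_http_get):
    """Return the path of the cached archive, downloading it only if absent.

    `fetch` is injectable so the function can be tested without network access.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{record}-v{version}.tar.gz"
    if not target.exists():
        url = HEPDATA_URL.format(record=record, version=version)
        target.write_bytes(fetch(url))
    return target
```

A `filter.py` could then call something like `fetch_rawdata("ins0000000", 1, "rawdata_cache")` (placeholder record id) before reading the tables. Pinning the version keeps the download reproducible, and the same call works on a developer machine and in the CI since the file is only fetched when missing.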
For what it matters, I'm also in favour of downloading them once needed, otherwise storing them as [...]
I'm starting to get worried about the growth of the repository due to the rawdata...
When we cannot rely directly on HEPData it makes sense to save it, but can't we just download the rawdata as part of the filter.py run? I remember that some of you were working on that; would it be possible?
cc @enocera @Radonirinaunimi @giacomomagni