parsers/spack: read categories and licenses #1422

Merged (1 commit) on Aug 15, 2024

wdconinc
Contributor

Spack now publishes categories and licenses in the repology.json.

Disclaimers:

  • Not all packages have a license, and a license can be UNKNOWN. SPDX identifiers are recommended but not enforced. Not all licenses have been checked by humans (yet), and both checked and unchecked licenses are exported to repology.json. Since supporting different versions is a key use case of spack, a package can carry multiple licenses that apply to different version ranges, but the list is flattened in repology.json.
  • Not all packages have a category. Packages can have multiple categories.

E.g. current output:

$ curl -L https://raw.githubusercontent.com/spack/packages.spack.io/main/data/repology.json | jq -S '.packages.[] | (.licenses, .categories)' | head -n50
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.5M  100 17.5M    0     0  10.2M      0  0:00:01  0:00:01 --:--:-- 10.2M
[
  "BSD-2-Clause"
]
[]
[
  "BSD-2-Clause"
]
[]
[]
[]
[]
[]
[
  "MIT"
]
[]
[
  "BSD-2-Clause"
]
[]
[
  "MIT"
]
[
  "rocm",
  "detectable"
]
[
  "LGPL-2.1-or-later"
]
[]
[
  "BSD-3-Clause"
]
[]
[
  "MIT"
]
[
  "ecp"
]
[]
[
  "rocm"
]
[
  "MIT"
]
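
Given the disclaimers above, a consumer of these fields probably wants to tolerate missing keys, empty lists, and the UNKNOWN placeholder. The following is only a minimal sketch of that idea, not the parser code from this PR; the function name `licenses_and_categories` is hypothetical, and it assumes the placeholder appears as the literal string "UNKNOWN" and that flattening may produce duplicates.

```python
from typing import Any


def licenses_and_categories(pkg: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (licenses, categories) for one package entry, tolerating gaps."""
    # Either key may be absent or empty; treat both cases as "no data".
    raw_licenses = pkg.get("licenses") or []
    raw_categories = pkg.get("categories") or []

    # Assumption: the placeholder is the literal string "UNKNOWN".
    # dict.fromkeys de-duplicates while preserving order, which helps because
    # per-version licenses are flattened into a single list.
    licenses = list(dict.fromkeys(x for x in raw_licenses if x and x != "UNKNOWN"))
    categories = list(dict.fromkeys(c for c in raw_categories if c))
    return licenses, categories


print(licenses_and_categories({"licenses": ["MIT", "UNKNOWN", "MIT"],
                               "categories": ["rocm", "detectable"]}))
print(licenses_and_categories({"name": "tramonto"}))
```

Whether UNKNOWN should be dropped or passed through is a policy choice for the consumer; the sketch drops it so that downstream code only ever sees real identifiers.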

@AMDmi3
Member

AMDmi3 commented Aug 14, 2024

LGTM, but spack parsing is currently broken because the tramonto package is missing the required downloads field. Judging by the recipe, it fetches from git, so it would be nice for the git URL to be published instead of downloads in this case.

@wdconinc
Contributor Author

Is that the only parsing failure, or just the first parsing failure of a potentially long list?

@AMDmi3
Member

AMDmi3 commented Aug 14, 2024

Looks like it's the only one for now (it was broken recently). It should be fixed on Repology's side anyway, as downloads is not in fact mandatory, but I'd prefer not to lose links to upstream.

@wdconinc
Contributor Author

I've modified our scripts to inject the git repo as the version downloads link and in the package-wide downloads list as well. We only regenerate repology.json once a day, but it should be fixed tomorrow.

@AMDmi3
Member

AMDmi3 commented Aug 14, 2024

Can it be a different attribute please? Repology differentiates download and repository URLs.

@wdconinc
Contributor Author

So, using tramonto as an example, which fields would you ideally like to contain what (assuming conversion with the current spack json parser)?

We now have this:

{
  "alias": [],
  "categories": [],
  "dependencies": [
    "cmake",
    "gmake",
    "ninja",
    "trilinos"
  ],
  "downloads": [
    "https://github.com/Tramonto/Tramonto.git"
  ],
  "homepages": [
    "https://software.sandia.gov/tramonto/"
  ],
  "licenses": [],
  "maintainers": [],
  "name": "tramonto",
  "patches": [],
  "summary": "Tramonto: Software for Nanostructured Fluids in Materials and Biology\n",
  "version": [
    {
      "branch": "develop",
      "downloads": [
        "https://github.com/Tramonto/Tramonto.git"
      ],
      "version": "develop"
    }
  ]
}

Should we not list the repo at the package level? Or should we not list it at the version level?

@wdconinc
Contributor Author

And I was looking at PackageMaker (https://github.com/repology/repology-updater/blob/master/repology/packagemaker/__init__.py#L220) and couldn't immediately see where a package would specify a repository instead of downloads.

@AMDmi3
Member

AMDmi3 commented Aug 14, 2024

The same place where downloads are, just a different key. Something like this I suppose:

{
  ...
  "version": [
    {
      "branch": "develop",
      "repositories": [
        "https://github.com/Tramonto/Tramonto.git"
      ],
      "version": "develop"
    }
  ]
}

or, more correctly:

{
  ...
  "version": [
    {
      "repositories": [
        {
          "url": "https://github.com/Tramonto/Tramonto.git",
          "branch": "master"
        }
      ],
      "version": "develop"
    }
  ]
}

@wdconinc
Contributor Author

We went with:

{
  ...
  "version": [
    {
      "repositories": [
        {
          "type": "git",
          "url": "https://github.com/Tramonto/Tramonto.git",
          "branch": "master"
        }
      ],
      "version": "develop"
    }
  ]
}

since technically we also support svn repos (used by two packages or so...).

This is live now at https://raw.githubusercontent.com/spack/packages.spack.io/main/data/repology.json, with "branch": "master" still duplicated at the "version": level to allow the current parser to pick that up.
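
For illustration, a producer-side helper that emits a version entry in the shape shown above might look like the following. This is a hedged sketch rather than the actual Spack export script; `make_version_entry` and its parameters are hypothetical names, and the duplicated version-level `branch` key mirrors the compatibility measure described in this comment.

```python
import json
from typing import Any, Optional


def make_version_entry(version: str, url: str, vcs_type: str = "git",
                       branch: Optional[str] = None) -> dict[str, Any]:
    """Build one entry of the "version" list with a typed repository record."""
    repo: dict[str, Any] = {"type": vcs_type, "url": url}
    entry: dict[str, Any] = {"repositories": [repo], "version": version}
    if branch is not None:
        repo["branch"] = branch
        # Duplicated at the version level so a parser that only reads the
        # older flat "branch" key keeps working.
        entry["branch"] = branch
    return entry


entry = make_version_entry("develop", "https://github.com/Tramonto/Tramonto.git",
                           branch="master")
print(json.dumps(entry, indent=2, sort_keys=True))
```

Keeping `type` inside each repository record lets svn-based packages share the same structure without a separate key.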

@AMDmi3
Member

AMDmi3 commented Aug 14, 2024

Can there be multiple repositories? If yes, I expect repositories to be an array, otherwise shouldn't it be "repository"?

{
  "branch": "master",
  "repositories": {
    "branch": "master",
    "type": "git",
    "url": "git://gcc.gnu.org/git/gcc.git"
  },
  "version": "master"
}

@AMDmi3 AMDmi3 merged commit ea64c13 into repology:master Aug 15, 2024
1 check passed
@AMDmi3
Member

AMDmi3 commented Aug 15, 2024

I've committed the code which should work with both array and plain repositories to fix the parsing for the time being (as I'm going offline for a couple of weeks).
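
A tolerant reader along these lines could normalize both shapes before use. The sketch below is an assumption about how such handling might look, not the code from commit ea64c13; `iter_repositories` is a hypothetical name, and accepting bare URL strings is an extra allowance for the simpler form proposed earlier in the thread.

```python
from typing import Any, Iterator


def iter_repositories(version_entry: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield repository records whether "repositories" is a list or one object."""
    repos = version_entry.get("repositories")
    if repos is None:
        return
    # A single object is wrapped so both shapes share one code path.
    if isinstance(repos, dict):
        repos = [repos]
    for repo in repos:
        if isinstance(repo, str):
            # Bare URL string, as in the first proposed format.
            yield {"url": repo}
        elif isinstance(repo, dict) and "url" in repo:
            yield repo


plain = {"version": "master",
         "repositories": {"type": "git", "url": "git://gcc.gnu.org/git/gcc.git",
                          "branch": "master"}}
array = {"version": "develop",
         "repositories": [{"type": "git",
                           "url": "https://github.com/Tramonto/Tramonto.git",
                           "branch": "master"}]}

for entry in (plain, array):
    print([r["url"] for r in iter_repositories(entry)])
```

Entries without a `url` are skipped silently here; a real parser might prefer to log or raise instead.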
