Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LicenseRef support #1148

Merged
merged 11 commits into from
Oct 25, 2024
Merged

Add LicenseRef support #1148

merged 11 commits into from
Oct 25, 2024

Conversation

lumaxis
Copy link
Contributor

@lumaxis lumaxis commented Jun 26, 2024

This is a new version of lumaxis#2

Discussion around this at #1096 (comment)

Copy link
Collaborator

@elrayle elrayle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a question about the test results.

@@ -58,7 +58,7 @@ describe('ScanCodeNewSummarizer basic compatability', () => {
const coordinates = { type: 'pypi', provider: 'pypi' }
const harvestData = getHarvestData(scancodeVersion, 'pypi-complex-declared-license')
const result = summarizer.summarize(coordinates, harvestData)
assert.equal(result.licensed.declared, 'HPND')
assert.equal(result.licensed.declared, 'LicenseRef-scancode-secret-labs-2011')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test originally had a license HPND. And now it is just LicenseRef-scancode-secret-labs-2011. Where did the original license come from? I would have thought a change would end up something like HPND AND LicenseRef-scancode-secret-labs-2011.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secret-labs-2011 is the declared license according to the raw ScanCode results. Before this change, our logic fell back to the first package's declared license which is HPND.

I'm not sure which is the ultimately correct one but we need this change to surface the ScanCode result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be sure I am understanding logic fell back to the first package's declared license correctly. Looking at the fixture, I see a license under summary which seems like the correct license...

    "summary": {
      "declared_license_expression": "secret-labs-2011",

and farther down, I see the first package (transient dependency) has the HPND as its declared license...

    "packages": [
      {
        "type": "pypi",
        "namespace": null,
        "name": "Pillow",
        "version": "9.5.0",
        ...
        "declared_license_expression": "historical",
        "declared_license_expression_spdx": "HPND",

It would be interesting to understand if that is a correct interpretation of how HPND was identified as the license and why that approach was chosen. To me, that doesn't seem correct as that is the license for Pillow 9.5.0.

@qtomlinson any insights into this?

Copy link
Collaborator

@qtomlinson qtomlinson Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v30 result (line 760-766) shows content.packages[0].declared_license as HPND

          "license_expression": "historical",
          "declared_license": {
            "license": "HPND",
            "classifiers": [
              "License :: OSI Approved :: Historical Permission Notice and Disclaimer (HPND)"
            ]
          },

Reading from content.packages[0].declared_license was the preferred way before deriving from files in v30 scancode results. So using v30 scancode, the license would be HPND.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pypi also shows "License as [OSI Approved :: Historical Permission Notice and Disclaimer (HPND)]"

Copy link
Collaborator

@qtomlinson qtomlinson Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed Pillow 9.5 was curated as HPND

Copy link
Collaborator

@qtomlinson qtomlinson Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secret-labs-2011 is the declared license according to the raw ScanCode results.

Just noticed another case where declared_license_expression (v32) seems to be different from what is declared from the package. Added here for documentation purposes.
32.3.0.json:

    "summary": {
      "declared_license_expression": "cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0",
	  ...
    package[0]
        "declared_license_expression": "gpl-2.0-plus AND gpl-2.0",
        "declared_license_expression_spdx": "GPL-2.0-or-later AND GPL-2.0-only",
	...
    files:
    {
        "path": "pylint-3.2.3/LICENSE",
        "detected_license_expression": "gpl-2.0",
        "detected_license_expression_spdx": "GPL-2.0-only",

30.3.0.json:

    package[0]					
	"license_expression": "gpl-2.0-plus AND gpl-2.0",
	"declared_license": {
		"license": "GPL-2.0-or-later",
		"classifiers": [
			"License :: OSI Approved :: GNU General Public License v2 (GPLv2)"
		]
	},
	....
    files: 
    {
         "path": "pylint-3.2.3/LICENSE",
		"key": "gpl-2.0",

"cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0" in v32 is different from "gpl-2.0-plus AND gpl-2.0" in v30

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, I think all of this cases are bugs/regressions in ScanCode, right? Meaning, our code is behaving as expected here, just producing unexpected/wrong results based on the underlying raw data 🤔

@pombredanne
Copy link
Member

@elrayle

A couple comments:

  1. About Add LicenseRef support #1148 (comment) The latest ScanCode version has SPDX license expressions side-by-side with the ScanCode license expression. This cannot be guessed from the ScanCode license keys alone, this is a mapping that is explicitly kept in the license database at https://scancode-licensedb.aboutcode.org/ and at https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses

  2. About Add LicenseRef support #1148 (comment) I see this is likely for a wheel or tar of pkg:pypi/pylint@3.2.3. For a top level declared package, I would use the --package (and I would always use also these together with --license : --license-text-diagnostics --license-text --license-diagnostics to get the correct details. Note that since version 30.1.0 there have been quite a few changes aboutcode-org/scancode-toolkit@v30.1.0...develop Showing 83,666 changed files with 1,730,392 additions and 515,652 deletions. which is rather big. There have been major improvements to the license detection since. There could be bugs there too, but from a quick look, the correct declared license is GPL and there is other files under CC-BY-SA (two doc icons). The pylint project has otherwise a poor license clarity for its wheel and a much better one for its source distribution. See attached PDFs (using ScanCode.io).
    ScanCode.io_ 3fa2a1f68d.f95201a764.ce71b4612d.pdf
    ScanCode.io_ 258179e40f.f95201a764.ce71b4612d.pdf

  3. About Add LicenseRef support #1148 (comment) ... what are the download URLs? Based on a scan of one of the wheel of pkg:pypi/pillow@9.5.0 this is detected as an HPND/historical license alright as declared, but there are other licenses involved too. Pillow did change the text of their license and the name they use to refer to it too. See for instance python-pillow/Pillow@997932b ... AFAIRC older versions of PIL and Pillow were using LicenseRef-scancode-secret-labs-2011 https://scancode-licensedb.aboutcode.org/secret-labs-2011.html until they clarified this as being a plain HPND. It may not have not met exactly the SPDX matching guidelines by then.

Also, do you mind filing issues ScanCode Toolkit? This will not be tracked otherwise!

lib/utils.js Outdated Show resolved Hide resolved
@qtomlinson
Copy link
Collaborator

Test failures are related to normalizeLicenseExpression, which depends on the update and release of @clearlydefined/spdx. Its release in turns depends on the merge of the following PRs:

  1. Add licenseRefLookup spdx-expression-parse.js#9
  2. Support LicenseRef spdx#34

Copy link
Collaborator

@elrayle elrayle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I added a test suggestion that will catch a known issue with the scancode map generation workflow.

test/lib/util.js Show resolved Hide resolved
qtomlinson and others added 2 commits October 25, 2024 11:27
Co-authored-by: E. Lynette Rayle <elrayle@users.noreply.github.com>
@qtomlinson qtomlinson merged commit a11d630 into master Oct 25, 2024
4 checks passed
@qtomlinson qtomlinson deleted the add-licenseref-support branch October 25, 2024 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants