Add LicenseRef support #1148

Merged on Oct 25, 2024 (11 commits)
45 changes: 35 additions & 10 deletions lib/utils.js
@@ -416,24 +416,49 @@ function joinExpressions(expressions) {
   return SPDX.normalize(joinedExpressionString)
 }

-function normalizeLicenseExpression(licenseExpression, logger) {
-  if (!licenseExpression) return null
+function normalizeLicenseExpression(rawLicenseExpression, logger) {
+  if (!rawLicenseExpression) return null

-  const licenseVisitor = rawLicenseExpression => {
-    const mappedLicenseExpression = scancodeMap.get(rawLicenseExpression)
-    const licenseExpression = mappedLicenseExpression ? mappedLicenseExpression : rawLicenseExpression
-
-    return SPDX.normalizeSingle(licenseExpression)
+  const licenseVisitor = licenseExpression => {
+    return scancodeMap.get(licenseExpression) || SPDX.normalizeSingle(licenseExpression)
   }

-  const parsed = SPDX.parse(licenseExpression, licenseVisitor)
-  const result = SPDX.stringify(parsed)
+  // parse() checks for LicenseRef- and other special types of expressions before calling the visitor,
+  // therefore use the mapped license expression as an argument if it was found
+  const mappedLicenseExpression = scancodeMap.get(rawLicenseExpression)
+  const parsed = SPDX.parse(mappedLicenseExpression || rawLicenseExpression || '', licenseVisitor)
+
+  // recursively normalize the parsed license expression
+  const normalizedParsed = _normalizeParsedLicenseExpression(parsed, logger)

-  if (result === 'NOASSERTION') logger.info(`ScanCode NOASSERTION from ${licenseExpression}`)
+  const result = SPDX.stringify(normalizedParsed)
+  if (result === 'NOASSERTION') logger.info(`ScanCode NOASSERTION from ${rawLicenseExpression}`)

   return result
 }

+function _normalizeParsedLicenseExpression(parsedLicenseExpression, logger) {
+  if (parsedLicenseExpression.left) {
+    if (parsedLicenseExpression.left.hasOwnProperty('left')) {
+      parsedLicenseExpression.left = _normalizeParsedLicenseExpression(parsedLicenseExpression.left, logger)
+    } else if (parsedLicenseExpression.left.hasOwnProperty('noassertion')) {
+      const new_left = normalizeLicenseExpression(parsedLicenseExpression.left['noassertion'], logger)
+      if (new_left.toLowerCase() === 'noassertion') parsedLicenseExpression.left = { 'noassertion': new_left }
+      else parsedLicenseExpression.left = { license: new_left }
+    }
+  }
+  if (parsedLicenseExpression.right) {
+    if (parsedLicenseExpression.right.hasOwnProperty('left')) {
+      parsedLicenseExpression.right = _normalizeParsedLicenseExpression(parsedLicenseExpression.right, logger)
+    } else if (parsedLicenseExpression.right.hasOwnProperty('noassertion')) {
+      const new_right = normalizeLicenseExpression(parsedLicenseExpression.right['noassertion'], logger)
+      if (new_right.toLowerCase() === 'noassertion') parsedLicenseExpression.right = { 'noassertion': new_right }
+      else parsedLicenseExpression.right = { license: new_right }
+    }
+  }
+  return parsedLicenseExpression
+}
+
 function _normalizeVersion(version) {
   if (version == '1') return '1.0.0' // version '1' is not semver valid see https://github.com/clearlydefined/crawler/issues/124
   return semver.valid(version) ? version : null
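
The recursive pass added in `_normalizeParsedLicenseExpression` can be illustrated in isolation. The following is a minimal sketch, not the code from this PR. It assumes that compound nodes carry `left`, `right`, and a `conjunction` string, and that leaves look like `{ license }` or `{ noassertion }`, where `noassertion` holds the raw text that could not be parsed. The names `exampleMap`, `resolveLeaf`, `normalizeTree`, and `stringifyTree` are hypothetical, and `exampleMap` is a tiny stand-in for `scancodeMap`. Unlike the PR code, the sketch returns a new tree instead of mutating its input, and its stringifier ignores parentheses.

```javascript
// Hypothetical stand-in for the ScanCode-key-to-SPDX map (not the real scancodeMap).
const exampleMap = new Map([
  ['afpl-9.0', 'LicenseRef-scancode-afpl-9.0'],
  ['mit', 'MIT']
])

// Resolve one leaf: use the mapped identifier if the raw text is known,
// otherwise keep the leaf as NOASSERTION. Leaves that already hold a license pass through.
function resolveLeaf(leaf) {
  if (!('noassertion' in leaf)) return leaf
  const mapped = exampleMap.get(String(leaf.noassertion).toLowerCase())
  return mapped ? { license: mapped } : { noassertion: 'NOASSERTION' }
}

// Walk the tree recursively. Compound nodes are assumed to have both `left` and `right`.
function normalizeTree(node) {
  if (!node.left && !node.right) return resolveLeaf(node)
  return { ...node, left: normalizeTree(node.left), right: normalizeTree(node.right) }
}

// Flatten the tree back into a string (no parentheses handling in this sketch).
function stringifyTree(node) {
  if (node.license) return node.license
  if ('noassertion' in node) return 'NOASSERTION'
  return `${stringifyTree(node.left)} ${node.conjunction.toUpperCase()} ${stringifyTree(node.right)}`
}

const exampleTree = {
  left: { noassertion: 'afpl-9.0' },
  conjunction: 'and',
  right: {
    left: { license: 'MIT' },
    conjunction: 'or',
    right: { noassertion: 'unknown-key' }
  }
}

console.log(stringifyTree(normalizeTree(exampleTree)))
// prints "LicenseRef-scancode-afpl-9.0 AND MIT OR NOASSERTION"
```

Returning a fresh tree keeps each branch independent, so resolving the left operand cannot accidentally overwrite the right one.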
42 changes: 42 additions & 0 deletions test/lib/util.js
@@ -880,3 +880,45 @@ describe('Utils buildSourceUrl', () => {
     expect(result).to.eq('https://pypi.org/project/zuul/3.3.0/')
   })
 })
+
+describe('normalizeLicenseExpression', () => {
+  it('should normalize license', () => {
+    const expression = 'MIT AND GPL-3.0'
+    const result = utils.normalizeLicenseExpression(expression)
+    expect(result).to.eq('MIT AND GPL-3.0')
+  })
+  it('should normalize single licenseRef', () => {
+    const expression = 'afpl-9.0'
+    const result = utils.normalizeLicenseExpression(expression)
+    expect(result).to.eq('LicenseRef-scancode-afpl-9.0')
+  })
+  it('should normalize license and licenseRef', () => {
+    const expression = 'afl-1.1 AND afpl-9.0'
+    const result = utils.normalizeLicenseExpression(expression)
+    expect(result).to.eq('AFL-1.1 AND LicenseRef-scancode-afpl-9.0')
+  })
+  it('should normalize licenseRef and license', () => {
+    const expression = 'afpl-9.0 AND MIT'
+    const result = utils.normalizeLicenseExpression(expression)
+    expect(result).to.eq('LicenseRef-scancode-afpl-9.0 AND MIT')
+  })
+  it('should normalize licenseRef and licenseRef', () => {
+    const expression = 'afpl-9.0 AND activestate-community'
+    const result = utils.normalizeLicenseExpression(expression)
+    expect(result).to.eq('LicenseRef-scancode-afpl-9.0 AND LicenseRef-scancode-activestate-community')
+  })
+  it('should normalize licenseRef and licenseRef or licenseRef', () => {
+    const expression = 'afpl-9.0 AND activestate-community OR ac3filter'
+    const result = utils.normalizeLicenseExpression(expression)
+    expect(result).to.eq('LicenseRef-scancode-afpl-9.0 AND LicenseRef-scancode-activestate-community OR LicenseRef-scancode-ac3filter')
+  })
+  it('should normalize INVALID to NOASSERTION', () => {
+    const mockLogger = {
+      info: (message) => {
+        console.log(message);
+      }
+    };
+    const expression = 'INVALID'
+    const result = utils.normalizeLicenseExpression(expression, mockLogger)
+    expect(result).to.eq('NOASSERTION')
+  })
+})
2 changes: 1 addition & 1 deletion test/providers/summary/scancode/new-summarizer.js
@@ -58,7 +58,7 @@ describe('ScanCodeNewSummarizer basic compatability', () => {
   const coordinates = { type: 'pypi', provider: 'pypi' }
   const harvestData = getHarvestData(scancodeVersion, 'pypi-complex-declared-license')
   const result = summarizer.summarize(coordinates, harvestData)
-  assert.equal(result.licensed.declared, 'HPND')
+  assert.equal(result.licensed.declared, 'LicenseRef-scancode-secret-labs-2011')
Collaborator

The test originally expected the license HPND, and now it expects just LicenseRef-scancode-secret-labs-2011. Where did the original license come from? I would have thought the change would end up with something like HPND AND LicenseRef-scancode-secret-labs-2011.

Contributor Author

secret-labs-2011 is the declared license according to the raw ScanCode results. Before this change, our logic fell back to the first package's declared license, which is HPND.

I'm not sure which is the ultimately correct one but we need this change to surface the ScanCode result.
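
To make the two candidate sources concrete, here is a minimal sketch of the kind of precedence being discussed: take the summary-level declared expression when it is present, and only fall back to the first package's declared license when it is not. This is not the summarizer's code. The function name `pickDeclaredExpression` and its return shape are hypothetical, and the field names `summary.declared_license_expression` and `packages[0].declared_license_expression_spdx` are assumed to sit at the top level of the object passed in.

```javascript
// Hypothetical helper: choose which declared license expression to surface.
// Prefers the summary-level value and falls back to the first package entry.
function pickDeclaredExpression(harvest) {
  const fromSummary = harvest.summary?.declared_license_expression
  if (fromSummary) return { source: 'summary', expression: fromSummary }

  const fromPackage = harvest.packages?.[0]?.declared_license_expression_spdx
  if (fromPackage) return { source: 'packages[0]', expression: fromPackage }

  return { source: 'none', expression: null }
}

// Example input shaped after the values mentioned in this thread.
const exampleHarvest = {
  summary: { declared_license_expression: 'secret-labs-2011' },
  packages: [{ name: 'Pillow', declared_license_expression_spdx: 'HPND' }]
}

console.log(pickDeclaredExpression(exampleHarvest).expression)
// prints "secret-labs-2011"
console.log(pickDeclaredExpression({ packages: exampleHarvest.packages }).expression)
// prints "HPND"
```

Returning the source alongside the expression makes it easier to see in a test failure which branch produced the value.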

Collaborator

I want to be sure I am understanding "logic fell back to the first package's declared license" correctly. Looking at the fixture, I see a license under summary, which seems like the correct license...

    "summary": {
      "declared_license_expression": "secret-labs-2011",

and farther down, I see that the first package (a transitive dependency) has HPND as its declared license...

    "packages": [
      {
        "type": "pypi",
        "namespace": null,
        "name": "Pillow",
        "version": "9.5.0",
        ...
        "declared_license_expression": "historical",
        "declared_license_expression_spdx": "HPND",

It would be interesting to understand if that is a correct interpretation of how HPND was identified as the license and why that approach was chosen. To me, that doesn't seem correct as that is the license for Pillow 9.5.0.

@qtomlinson any insights into this?

Collaborator

qtomlinson, Jun 27, 2024

The v30 result (lines 760-766) shows content.packages[0].declared_license as HPND:

          "license_expression": "historical",
          "declared_license": {
            "license": "HPND",
            "classifiers": [
              "License :: OSI Approved :: Historical Permission Notice and Disclaimer (HPND)"
            ]
          },

For v30 ScanCode results, reading from content.packages[0].declared_license was preferred over deriving the license from files. So with v30 ScanCode, the license would be HPND.

Collaborator

PyPI also shows the license as "OSI Approved :: Historical Permission Notice and Disclaimer (HPND)".

Collaborator

qtomlinson, Jun 28, 2024

Just noticed Pillow 9.5 was curated as HPND

Collaborator

qtomlinson, Jul 11, 2024

> secret-labs-2011 is the declared license according to the raw ScanCode results.

Just noticed another case where declared_license_expression (v32) seems to differ from what the package declares. Added here for documentation purposes.
32.3.0.json:

    "summary": {
      "declared_license_expression": "cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0",
      ...
    package[0]
      "declared_license_expression": "gpl-2.0-plus AND gpl-2.0",
      "declared_license_expression_spdx": "GPL-2.0-or-later AND GPL-2.0-only",
      ...
    files:
    {
      "path": "pylint-3.2.3/LICENSE",
      "detected_license_expression": "gpl-2.0",
      "detected_license_expression_spdx": "GPL-2.0-only",

30.3.0.json:

    package[0]
      "license_expression": "gpl-2.0-plus AND gpl-2.0",
      "declared_license": {
        "license": "GPL-2.0-or-later",
        "classifiers": [
          "License :: OSI Approved :: GNU General Public License v2 (GPLv2)"
        ]
      },
      ...
    files:
    {
      "path": "pylint-3.2.3/LICENSE",
      "key": "gpl-2.0",

"cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0" in v32 is different from "gpl-2.0-plus AND gpl-2.0" in v30

Contributor Author

Either way, I think all of these cases are bugs/regressions in ScanCode, right? Meaning, our code is behaving as expected here, just producing unexpected/wrong results based on the underlying raw data 🤔
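
For anyone who wants to check a discrepancy like the pylint one quoted earlier in this thread, the two expressions can be compared key by key. This is a rough sketch under stated assumptions: the helper names `licenseKeys` and `compareExpressions` are hypothetical, and the splitting only handles `AND`/`OR` operators and parentheses, not `WITH` exceptions or full SPDX syntax.

```javascript
// Split a simple license expression into its individual license keys.
// Assumes only AND/OR operators and optional parentheses.
function licenseKeys(expression) {
  return expression
    .replace(/[()]/g, ' ')
    .split(/\s+(?:AND|OR)\s+/i)
    .map(key => key.trim())
    .filter(Boolean)
}

// Report which keys appear in only one of the two expressions and which are shared.
function compareExpressions(first, second) {
  const keysFirst = new Set(licenseKeys(first))
  const keysSecond = new Set(licenseKeys(second))
  return {
    onlyInFirst: [...keysFirst].filter(key => !keysSecond.has(key)),
    onlyInSecond: [...keysSecond].filter(key => !keysFirst.has(key)),
    shared: [...keysFirst].filter(key => keysSecond.has(key))
  }
}

const v32Expression = 'cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0'
const v30Expression = 'gpl-2.0-plus AND gpl-2.0'

console.log(compareExpressions(v32Expression, v30Expression))
// onlyInFirst: cc-by-4.0, cc-by-sa-4.0; onlyInSecond: gpl-2.0-plus; shared: gpl-2.0
```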

}
})
