Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🪪 Improvements to spdx license validation #1634

Merged
merged 2 commits into from
Nov 13, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changeset/curly-readers-compete.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
'myst-frontmatter': patch
---

Improvements to spdx license validation
61 changes: 59 additions & 2 deletions packages/myst-frontmatter/src/licenses/licenses.yml
Original file line number Diff line number Diff line change
Expand Up @@ -280,6 +280,53 @@ cases:
content:
url: https://example.com
warnings: 1
- title: url string coerces to license from spdx list
raw:
license: https://creativecommons.org/licenses/by/4.0/
Copy link
Collaborator Author

@fwkoch fwkoch Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously (and currently), this test case is passing - i.e. setting license to a known SPDX url would validate to the correct SPDX license object.

normalized:
license:
content:
id: CC-BY-4.0
name: Creative Commons Attribution 4.0 International
url: https://creativecommons.org/licenses/by/4.0/
CC: true
free: true
- title: url coerces to license from spdx list
raw:
license:
url: https://creativecommons.org/licenses/by/4.0/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this would fail - i.e. if the url was explicitly provided as license.url, it would not validate to the SPDX license.

Now, this coerces correctly to SPDX.

normalized:
license:
content:
id: CC-BY-4.0
name: Creative Commons Attribution 4.0 International
url: https://creativecommons.org/licenses/by/4.0/
CC: true
free: true
- title: url coerces to license from spdx list with http
raw:
license:
url: http://creativecommons.org/licenses/by/4.0/
normalized:
license:
content:
id: CC-BY-4.0
name: Creative Commons Attribution 4.0 International
url: https://creativecommons.org/licenses/by/4.0/
CC: true
free: true
- title: url coerces to license from spdx list without trailing slash
raw:
license:
url: https://creativecommons.org/licenses/by/4.0
normalized:
license:
content:
id: CC-BY-4.0
name: Creative Commons Attribution 4.0 International
url: https://creativecommons.org/licenses/by/4.0/
CC: true
free: true
- title: string with no spaces coerces to unknown id (and warns)
raw:
license: my-custom-license
Expand All @@ -288,7 +335,7 @@ cases:
content:
id: my-custom-license
warnings: 1
- title: Other short string coerces to note (and warns)
- title: Other long string coerces to note (and warns)
raw:
license: |
my custom license but this is really long, like, more than one hundred characters long.
Expand All @@ -300,11 +347,21 @@ cases:
my custom license but this is really long, like, more than one hundred characters long.
clearly if it is this long it cannot be a name but must be a note.
warnings: 1
- title: Other long string coerces to name (and warns)
- title: Other short string coerces to name (and warns)
raw:
license: my custom license
normalized:
license:
content:
name: my custom license
warnings: 1
- title: Long string with "mit" buried in it is not coerced to MIT license
raw:
license: |
This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The utility MyST uses to match license IDs with SPDX is way too generous. It would pick up on the mit in the middle of "permitted" here, and coerce this to the MIT license. 😱

normalized:
license:
content:
note: |
This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited
warnings: 1
21 changes: 16 additions & 5 deletions packages/myst-frontmatter/src/licenses/validators.ts
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,8 @@ export function validateLicense(input: any, opts: ValidationOptions): License |
if (typeof input === 'string') {
const value = validateString(input, opts);
if (value === undefined) return undefined;
const valueSpdx = correctLicense(value);
// Do not try to coerce long values into SPDX licenses
const valueSpdx = value.length < 15 ? correctLicense(value) : undefined;
if (URL_ID_LOOKUP[cleanUrl(value)]) {
input = { id: URL_ID_LOOKUP[cleanUrl(value)] };
} else if (isUrl(value)) {
Expand Down Expand Up @@ -144,10 +145,20 @@ export function validateLicense(input: any, opts: ValidationOptions): License |
}
output.id = idSpdx ?? id;
} else {
validationWarning(
`no license ID - using a SPDX license ID is recommended, see https://spdx.org/licenses/`,
opts,
);
if (value.url) {
const url = validateUrl(value.url, { property: '', messages: {} }) ?? '';
const idFromUrl = URL_ID_LOOKUP[cleanUrl(url)];
if (idFromUrl) {
output.id = idFromUrl;
value.url = ID_LICENSE_LOOKUP[idFromUrl].url;
}
}
if (!output.id) {
validationWarning(
`no license ID - using a SPDX license ID is recommended, see https://spdx.org/licenses/`,
opts,
);
}
}
const expected = output.id ? ID_LICENSE_LOOKUP[output.id] : undefined;
if (value.url != null) {
Expand Down
Loading