Add structure for licensing, possibly validated using Reuse or a similar standard? #21

Open
ferdnyc opened this issue Oct 15, 2021 · 18 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@ferdnyc

ferdnyc commented Oct 15, 2021

Interesting project, I think you've got a pretty good start here, but one thing that strikes me as lacking is a repo-wide view of license management.

Clearly establishing license terms is so important in open-source projects (both for the author(s) of the code, and for anyone who's considering making use of it in their own projects), yet it's often handled haphazardly as an afterthought — or, just as often, not handled at all until someone pesters the maintainer into acting out of frustration. Typically that kind of thing causes problems down the line for someone (probably not the originator of the... "bespoke" license in question).

Adopting a standard for licensing right out of the gate is a simpler, saner solution. And it can be as simple as committing to standard practices defined by something like the Reuse project. Just like most projects will commit to following semantic versioning practices, adopting reuse-verified license compliance requires nothing more than a commitment to put in the work.

@jenstroeger
Owner

Thank you, @ferdnyc, that makes a lot of sense!

I wasn’t aware of the Reuse Project or its compliance tools, and after browsing through their website I’d be more than happy to integrate their tools (e.g. the pre-commit hook).

Would you like to open a PR?

@jenstroeger added the enhancement and help wanted labels on Oct 15, 2021
@ferdnyc
Author

ferdnyc commented Oct 16, 2021

> Thank you, @ferdnyc, that makes a lot of sense!
>
> I wasn’t aware of the Reuse Project or its compliance tools,

Yeah, I just learned of it a month or two ago. But they seem to have a good and level-headed take on this stuff.

It can definitely seem a bit off and pointless at first, having to deal with adding license tags to things like a .gitignore file or a workflow for a GitHub bot. Occasionally the added text for licensing purposes is longer than the functional content of the file! For someone who just doesn't see the point in any of it, I'm sure the initial setup work to bring a repo into compliance would feel like a bunch of tedious busywork.

(Though it's certainly a lot easier to start off that way, than to have to retroactively figure all this stuff out for existing code. ...Speaking most definitely from experience.)

But as I noted earlier, there are a lot of parallels to semantic versioning/releasing: It's not a thing being imposed on a project, it's an ideal that the project's developers may choose to commit to, if they see the benefits as outweighing any additional restrictions or effort involved.

> and after browsing through their website I’d be more than happy to integrate their tools (e.g. the pre-commit hook).

A pre-commit hook is certainly an option, for developers who want to keep themselves honest that way. I've always been a bit wary of Git hooks because they don't always play well with everyone's preferred mode of interacting with git. (Some UIs and tools are fully integrated with all aspects of the git model; they gracefully recover from rejected commit attempts and notify the user in a clear and actionable way. Others... not so much.)

More interesting, in my view, are the various CI tools they offer, their Marketplace GitHub Action being the most obviously useful in this context. A CI check that flags any licensing issues and blocks PR merges is a no-brainer for a project that wants to stay Reuse-compliant.

(Won't prevent project committers from potentially pushing code with issues, if they have direct commit access to a branch... but it would mean that PR checks start failing after they do, through no fault of the PR submitter. 🙄 )

> Would you like to open a PR?

Sure, I can do that. What do I need to know about the current contents of the repo, license-wise? Are there any files from external sources that have to be treated specially, or is it all original work covered by the blanket MIT license?

@ferdnyc
Author

ferdnyc commented Oct 16, 2021

(I know I saw sphinx-quickstart output in one file, from a quick scan. AIUI files generated that way, especially as it's an interactive tool, are considered fully original works; you're free to license them however you choose, so no issues there.)

@behnazh
Collaborator

behnazh commented Oct 17, 2021

Adding a license checker to this template repo is indeed a great idea. I agree with @ferdnyc that adding the checker as a GitHub Actions workflow would be preferable. While the Reuse Project seems interesting, another action that could be considered is License Eye, which checks for valid license headers.

Another interesting aspect to consider is third-party license analysis for open source compliance and risk assessment. One tool in this space is Black Duck, which enforces a predefined policy, but I couldn't find any open source version of it in the GitHub Actions Marketplace.

We could alternatively consider this action for Python packages that goes through the dependencies transitively to check their licenses. Here is an example output for some of the dependencies (may not be direct) for the current repo:

astroid:2.8.3                       GNU Lesser General Public License v2 (LGPLv2)           WeakCopyleft                  
attrs:21.2.0                        MIT License                                             Permissive                    
bleach:4.1.0                        Apache Software License                                 Permissive                    
certifi:2021.10.8                   Mozilla Public License 2.0 (MPL 2.0)                    WeakCopyleft                  
cffi:1.15.0                         MIT License                                             Permissive                    
cfgv:3.3.1                          MIT License                                             Permissive                    
charset-normalizer:2.0.7            MIT License                                             Permissive                    
click:8.0.3                         BSD License                                             Permissive                    
flake8:3.9.2                        MIT License                                             Permissive                    
flake8-builtins:1.5.3               GNU General Public License v2 (GPLv2)                   StrongCopyleft         
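
As a rough illustration of how a listing like the one above could be post-processed, here is a minimal Python sketch. It assumes the three columns are separated by runs of two or more spaces, as in the sample; `DepLicense`, `parse_report`, and `flag` are hypothetical names made up for this example and are not part of pip-license-checker.

```python
import re
from typing import FrozenSet, List, NamedTuple


class DepLicense(NamedTuple):
    name: str
    version: str
    license_name: str
    license_type: str


# Assumption: columns are separated by two or more whitespace characters.
_COLUMN_GAP = re.compile(r"\s{2,}")


def parse_report(text: str) -> List[DepLicense]:
    """Turn 'package:version   license name   license type' lines into records."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = _COLUMN_GAP.split(line)
        if len(parts) != 3:
            continue  # skip anything that does not look like a data row
        package, license_name, license_type = parts
        name, _, version = package.partition(":")
        rows.append(DepLicense(name, version, license_name, license_type))
    return rows


def flag(
    rows: List[DepLicense],
    blocked_types: FrozenSet[str] = frozenset({"StrongCopyleft"}),
) -> List[DepLicense]:
    """Return the rows whose license type is in the (assumed) blocked set."""
    return [row for row in rows if row.license_type in blocked_types]


SAMPLE = """\
astroid:2.8.3                       GNU Lesser General Public License v2 (LGPLv2)           WeakCopyleft
attrs:21.2.0                        MIT License                                             Permissive
flake8-builtins:1.5.3               GNU General Public License v2 (GPLv2)                   StrongCopyleft
"""

if __name__ == "__main__":
    for row in flag(parse_report(SAMPLE)):
        print(f"{row.name} {row.version}: {row.license_name}")
```

Which license types count as blocked is a project policy decision; the default used here is only a placeholder.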

@jenstroeger
Owner

jenstroeger commented Oct 17, 2021

Thank you @behnazh for the great feedback!

Looking at @pilosus’s pip license checker you mentioned, I wonder if it would make sense to map to the SPDX License List, at which point the output would become machine-readable again, in line with the Reuse Project’s intentions.
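
To make the mapping idea concrete, here is a small Python sketch. The lookup tables are hand-written for this example (they are an assumption, not data taken from pip-license-checker or from the SPDX project), and `to_spdx` and `candidates` are hypothetical helper names. The sketch also shows why the mapping is not trivial: several of the license names in the listing above do not pin down a single SPDX identifier.

```python
from typing import Optional, Tuple

# Names that identify exactly one SPDX license (hand-picked, not exhaustive).
NAME_TO_SPDX = {
    "MIT License": "MIT",
    "Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
}

# Names that are ambiguous; the candidate lists are illustrative, not exhaustive.
AMBIGUOUS = {
    "BSD License": ("BSD-2-Clause", "BSD-3-Clause"),
    "Apache Software License": ("Apache-1.1", "Apache-2.0"),
    "GNU General Public License v2 (GPLv2)": ("GPL-2.0-only", "GPL-2.0-or-later"),
    "GNU Lesser General Public License v2 (LGPLv2)": (
        "LGPL-2.0-only",
        "LGPL-2.0-or-later",
    ),
}


def to_spdx(license_name: str) -> Optional[str]:
    """Return a single SPDX identifier, or None when the name is unknown or ambiguous."""
    return NAME_TO_SPDX.get(license_name)


def candidates(license_name: str) -> Tuple[str, ...]:
    """Return every SPDX identifier the name could plausibly refer to."""
    exact = NAME_TO_SPDX.get(license_name)
    if exact is not None:
        return (exact,)
    return AMBIGUOUS.get(license_name, ())


if __name__ == "__main__":
    for name in ("MIT License", "BSD License", "Some Custom License"):
        print(name, "->", to_spdx(name), candidates(name))
```

Returning `None` for ambiguous names, rather than guessing, keeps the output trustworthy for downstream tooling; a real mapping would likely need to inspect the package's license text to resolve those cases.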

@ferdnyc
Author

ferdnyc commented Oct 17, 2021

@behnazh Interesting! Reuse and the license checkers you mentioned actually have completely different aims, so I would think that they're broadly compatible (though, really, they wouldn't be touching any of the same things, so maybe "complementary" is a better word).

The focus with Reuse is on making your own code ready to be reused, by making sure all local files are tagged with a valid license identifier. It doesn't do anything with dependencies at all.

(Reuse also avoids trying to boil an entire package down to a single license, because as code accumulates and files are pulled in from outside sources, the fact is that a single repo can contain files under a mix of 2, 3, 5, who-knows-how-many different licenses. If a font is bundled, that's often a different license. Bundled icons, same thing. If there are data files included with the unit tests, those very often don't have any clear licensing at all. To assume they're licensed under the same terms as the overall project is often just wishful thinking.)
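
The per-file idea can be sketched in a few lines of Python. This is not how the Reuse tool itself is implemented; it is a simplified stand-in that only looks for an `SPDX-License-Identifier:` tag near the top of each file and groups files by the tag it finds. The 20-line limit, the skipped directories, and the names `find_license_tag` and `scan` are all assumptions made for this example.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional, Tuple

_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Assumption: these top-level directories are not expected to carry tags.
SKIPPED_DIRS = {".git", "LICENSES"}


def find_license_tag(path: Path, max_lines: int = 20) -> Optional[str]:
    """Return the license expression tagged in the first lines of a file, if any."""
    try:
        with path.open(encoding="utf-8", errors="replace") as handle:
            for _, line in zip(range(max_lines), handle):
                match = _TAG.search(line)
                if match:
                    # Drop trailing comment closers such as '*/' or '-->'.
                    return match.group(1).strip().rstrip("*/->").strip()
    except OSError:
        return None
    return None


def scan(root: Path) -> Tuple[Dict[str, List[Path]], List[Path]]:
    """Group files under root by license tag and list the untagged ones."""
    by_license: Dict[str, List[Path]] = {}
    untagged: List[Path] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.relative_to(root).parts[0] in SKIPPED_DIRS:
            continue
        tag = find_license_tag(path)
        if tag is None:
            untagged.append(path)
        else:
            by_license.setdefault(tag, []).append(path)
    return by_license, untagged


if __name__ == "__main__":
    licenses, missing = scan(Path("."))
    for tag, files in licenses.items():
        print(f"{tag}: {len(files)} file(s)")
    print(f"untagged: {len(missing)} file(s)")
```

Grouping by tag rather than reducing to one answer mirrors the point above: a repo can legitimately end up with several licenses, and the untagged list is where the unclear cases surface.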

That being said, it would be an interesting extension to reuse, if there were a tool that could pull in dependency licenses and add that info to the local license metadata. Any additional license texts could even be added to the local LICENSES/ directory alongside the local ones.

...BTW, am I the only one who finds it odd that pip-license-checker is written in Clojure?

@ferdnyc
Author

ferdnyc commented Oct 17, 2021

The one sticking point for me, with Reuse, is that by default it would want every repo file to contain a license tag and a copyright statement (just "Year Author", by default). But since this is a template project, it seems odd to pre-fill that data, since really the user creating a project from the template should be doing that with their info.

Obviously, the template can start in a failing state and that'll be a reminder to the user that they need to update the files to insert their copyright... but that's kind of a pain given the diversity of file types, some of which Reuse can't automatically recognize to insert the appropriate type of comment.
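
To illustrate the file-type problem, here is a minimal Python sketch that renders a copyright and license header using a comment style chosen from the file name. The `COMMENT_STYLES` table and the `render_header` helper are hypothetical and deliberately incomplete; they are not the Reuse tool's own logic. Files with no known comment style get `None`, which is the case that needs manual handling.

```python
from pathlib import Path
from typing import Optional

# Hypothetical table: file suffix (or full name for dotfiles) -> (opener, closer).
COMMENT_STYLES = {
    ".py": ("# ", ""),
    ".toml": ("# ", ""),
    ".cfg": ("# ", ""),
    ".yaml": ("# ", ""),
    ".gitignore": ("# ", ""),
    ".md": ("<!-- ", " -->"),
    ".html": ("<!-- ", " -->"),
    ".c": ("// ", ""),
}


def render_header(
    filename: str, holder: str, year: int, spdx_id: str
) -> Optional[str]:
    """Return header text for the file, or None if its comment style is unknown."""
    path = Path(filename)
    # Dotfiles such as '.gitignore' have an empty suffix, so fall back to the name.
    style = COMMENT_STYLES.get(path.suffix or path.name)
    if style is None:
        return None
    opener, closer = style
    lines = [
        f"SPDX-FileCopyrightText: {year} {holder}",
        f"SPDX-License-Identifier: {spdx_id}",
    ]
    return "\n".join(f"{opener}{line}{closer}" for line in lines) + "\n"


if __name__ == "__main__":
    for name in ("setup.py", "README.md", ".gitignore", "logo.png"):
        print(name, "->", repr(render_header(name, "Jane Doe", 2021, "MIT")))
```

In a template project, `holder` and `year` are exactly the values the user of the template would need to supply, which is why pre-filling them is awkward.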

@pilosus

pilosus commented Oct 18, 2021

> ...BTW, am I the only one who finds it odd that pip-license-checker is written in Clojure?

Sorry to interject, pip-license-checker author here. It all started as a tiny personal project over the Xmas holidays to get my hands dirty with Clojure. It turned out to be of great value for the commercial projects I worked on later. Now it can be used not only to check the license names of Python deps (e.g. MIT, GPL, etc.), but also to check license types (e.g. Permissive, Copyleft, etc.) for basically any list of dependencies along with their license names. So it's not limited to Python anymore. I personally use it for JS / Android / iOS project deps, just like for Python deps. So, yes, I know that the name may sound confusing, especially for a Clojure project. But I don't want to change it, at least until release v1.0. I plan to use SPDX license ids internally to make license name detection more precise (esp. for exceptions like GPL classpath or multi-licensing cases) as well as for better matching of license name -> license type.

@jenstroeger
Owner

jenstroeger commented Oct 19, 2021

Thank you all for your comments, I really appreciate that! It looks to me like the original thread has split into three somewhat independent facets:

  1. Formalize and clearly state/implement license information for this repository, e.g. using the Reuse Project like @ferdnyc proposed initially.
  2. Check the licenses of all dependent packages, e.g. using @pilosus’s pip-license-checker or Black Duck like @behnazh suggests. In both cases, it might be useful if dependent licenses were output as SPDX identifiers (see issue “Use SPDX license identifiers”, pilosus/pip-license-checker#85) because… next point.
  3. Cross-reference the licenses of all dependent packages with the license of this package to check for conflicting uses.

I think point 1. was discussed above:

> @ferdnyc Sure, I can do that. What do I need to know about the current contents of the repo, license-wise? Are there any files from external sources that have to be treated specially, or is it all original work covered by the blanket MIT license?

I think currently all files in the repo can be covered by the example blanket MIT license. But I also think it’d make sense to add a section to the README on how to move forward managing license information — wrt. all three points.

As for point 2., we could consider adding the license checker action, and it looks like the checker’s fail option would address point 3. Although I wonder if the two could be knitted closer together 🤔
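
As a sketch of what point 3 could look like, the following Python snippet cross-references dependency license types against the project's own license type. The `ALLOWED_DEP_TYPES` table is an assumed example policy made up for this illustration, not a statement about actual license compatibility, and `find_conflicts` is a hypothetical helper. The type names reuse the categories from the listing earlier in the thread.

```python
from typing import Dict, List, Set

# Assumed example policy: project license type -> acceptable dependency types.
# Real compatibility depends on how a dependency is used and distributed.
ALLOWED_DEP_TYPES: Dict[str, Set[str]] = {
    "Permissive": {"Permissive", "WeakCopyleft"},
    "WeakCopyleft": {"Permissive", "WeakCopyleft"},
    "StrongCopyleft": {"Permissive", "WeakCopyleft", "StrongCopyleft"},
}


def find_conflicts(project_type: str, deps: Dict[str, str]) -> List[str]:
    """Return the names of dependencies whose license type the policy rejects.

    Unknown project types allow nothing, so every dependency is reported
    for manual review.
    """
    allowed = ALLOWED_DEP_TYPES.get(project_type, set())
    return sorted(name for name, dep_type in deps.items() if dep_type not in allowed)


if __name__ == "__main__":
    dependencies = {
        "attrs": "Permissive",
        "astroid": "WeakCopyleft",
        "flake8-builtins": "StrongCopyleft",
    }
    # The template's example license is MIT, i.e. a permissive license.
    print(find_conflicts("Permissive", dependencies))
```

A flagged package is a prompt for review rather than a verdict: a development-only tool that is never redistributed may be acceptable even where the policy table rejects it.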

@jenstroeger
Owner

In this context, PEP 639 is currently being drafted and discussed here: python/peps#2164 (see this initial forum thread that spawned the PEP).

I haven’t read all of it yet, though, but probably worth taking into consideration here.

@CAM-Gerlach

CAM-Gerlach commented Dec 1, 2021

Thanks! Let us know if you have any questions or feedback. The scope of that PEP has been limited to project-wide license expressions rather than formally specifying a per-file convention (though it does refer to the standard SPDX-License-Identifier header), but hopefully it will still be useful here.

As a general bit of feedback, for a template whose stated focus is on PEP 518 (which would tend to imply PEP 517), it's a little surprising to see no build-system section in the pyproject.toml, and to see all the build tool configuration and boilerplate code in setup.py instead of simpler, declarative configuration in setup.cfg.

Also, maybe worth considering cookiecutter (or a similar tool) to save a large amount of user effort and potential errors manually replacing everything, which would be particularly important when it comes to standardized license headers, copyright lines, license metadata, etc.

@CAM-Gerlach

FWIW, I hope to finally do what should be the last big PEP update in the next couple of days, after being busy with my research, work and other FOSS things.

@jenstroeger
Owner

I opened PR #377, which uses the dependency-review-action, which in turn has a license-check: true option.

@jenstroeger
Owner

@acrobat888 suggested looking at LicenseCheck and pip-licenses as well.

@jenstroeger
Owner

Somewhat related is the flake8-copyright plugin.

@jenstroeger
Owner

And see also PEP 639 for related discussion.

@jenstroeger
Owner

Another interesting package is license-expression.

5 participants