Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

License correction and on-the-fly codegen #1942

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

LecrisUT
Copy link

Motivation

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files, and as such the Cargo.toml file should reflect that. This is not a issue because prost is already licensed as Apache-2.0 and as such any dependency would have to comply with MIT AND Apache-2.0 implicitly.

A more nuanced file are the .bin files which further includes BSD-3-Clause metadata. To me it is unclear what the license under Cargo.toml is meant to reflect and if it would include the generated artifacts at the end as well, and it is also ambiguous who should be carrying the additional BSD-3-Clause license information (tonic, prost or end-user?). At the very least including the .bin files in VCS and crates is problematic.

Solution

  • Add the Apache-2.0 license file and metadata
  • Migrate the codegen to build.rs files

Unfortunately I am quite a novice on writing rust code, so there is quite a duplication of the build.rs files, but I hope you can alter it to be more appropriate.

Closes #1939

PS: the Apache-2.0 copyright should be updated to include tonic as the copyright holder, unless it is not meant to be altered from upstream.

@djc
Copy link
Contributor

djc commented Sep 16, 2024

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files,

How did you come to this conclusion? Also it's not clear to me why the Apache-2.0 licensing needs to affect all the crates that you've added here -- presumably that's not just because they depend on another crate that has Apache-2.0-licensed contents?

I think we should keep the generated files versioned -- build scripts impose a slowdown/extra compile time on all downstream users, and in this case all the inputs are known at development time anyway.

@LecrisUT
Copy link
Author

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files,

How did you come to this conclusion? Also it's not clear to me why the Apache-2.0 licensing needs to affect all the crates that you've added here -- presumably that's not just because they depend on another crate that has Apache-2.0-licensed contents?

Because the source files themselves have a Apache-2.0 copyright notice:

// Copyright 2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

And we have investigated the .bin files where we found both BSD-3-Clause license metadata and 2 Apache-2.0 license metadata that seem to be from the .proto files.

I think we should keep the generated files versioned -- build scripts impose a slowdown/extra compile time on all downstream users, and in this case all the inputs are known at development time anyway.

At the same time, when I run the original codegen with tonic 0.12.2 I have unstaged diffs compared, so there are potential differences that people can encounter. If you would like to have it under VCS as a cached version, that could also work as long as the build script is available in the crate. But then you might need to review the .bin files if newly licensed files are included or if non-free files are being used. When Fedora regenerates the code it is on our duty to do this review instead.

@djc
Copy link
Contributor

djc commented Sep 16, 2024

At the same time, when I run the original codegen with tonic 0.12.2 I have unstaged diffs compared, so there are potential differences that people can encounter. If you would like to have it under VCS as a cached version, that could also work as long as the build script is available in the crate. But then you might need to review the .bin files if newly licensed files are included or if non-free files are being used. When Fedora regenerates the code it is on our duty to do this review instead.

I think the unstaged diffs happen because you have are using a different version of prost. If you want to make changes to include the codegen code in the crate, that seems okay.

@LecrisUT
Copy link
Author

I think the unstaged diffs happen because you have are using a different version of prost

It seems more related to protoc version and the libraries it picks up from there

$ protoc --version
libprotoc 3.19.6

When I run codegen against master the same unstaged diffs appear and checking the CI, prost versions are identical. To me this seems dubious, but I am not familiar with these projects to know if using artifacts from other protoc is supported or not.

@decathorpe
Copy link

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files,

How did you come to this conclusion? Also it's not clear to me why the Apache-2.0 licensing needs to affect all the crates that you've added here -- presumably that's not just because they depend on another crate that has Apache-2.0-licensed contents?

The tonic-health, tonic-reflection, and tonic-types crates all contain .proto files that are explicitly Apache-2.0 licensed, but they currently don't contain the Apache-2.0 license text (which is a requirement to satisfy the terms of this license), and don't mention the Apache-2.0 license in their crate metadata.

The fact that this change was applied to tonic-interop too is probably a mistake.

I think we should keep the generated files versioned -- build scripts impose a slowdown/extra compile time on all downstream users, and in this case all the inputs are known at development time anyway.

The problem is that we cannot ship binary blobs like this in packages in Fedora Linux, especially if they're not reproducible - both because it violates rules we have for packaging binary artifacts, and because it's a possible xz-utils-like security risk.

It would be OK for us if the regeneration of the *.bin files was introduced behind and off-by-default feature flag, but I don't think build.rs can be made entirely optional, depending on a feature flag. If this is not something that you want to do in tonic upstream, then we'll have to carry this as a downstream patch in Fedora.

@LecrisUT
Copy link
Author

LecrisUT commented Sep 16, 2024

The fact that this change was applied to tonic-interop too is probably a mistake.

interop, examples and codegen are non-public, so it doesn't matter much. Thought I would just be completionist since it does have .proto files with Apache-2.0 header in the test files. Similarly with tonic crate and benches-disabled/proto/helloworld/helloworld.proto. I am still confused on what license is supposed to represent, source or artifact? Would be nice if they had the new Fedora distinction of SourceLicense vs License.

If this is not something that you want to do in tonic upstream, then we'll have to carry this as a downstream patch in Fedora.

Would we have to "sanitize" the crate before uploading in this case, or would deleting in the spec file be sufficient?

@decathorpe
Copy link

If this is not something that you want to do in tonic upstream, then we'll have to carry this as a downstream patch in Fedora.

Would we have to "sanitize" the crate before uploading in this case, or would deleting in the spec file be sufficient?

Doing deletion of pre-compiled artifacts is OK to be done in packaging, no need to make "cleaned" sources out-of-band.

@LecrisUT
Copy link
Author

Hmm, one design I've seen in blosc2-sys is to use features e.g.:

    #[cfg(feature = "regenerate-bindings")]

Maybe something similar is ok here?

@decathorpe
Copy link

If the approach with building the .bin files in build.rs is not an acceptable solution for you, can we at least get the license corrected to include AND Apache-2.0 and include the Apache-2.0 license texts?

I think that part should be uncontroversial, given that these crates obviously do ship files that are covered by the Apache-2.0 license.

@djc
Copy link
Contributor

djc commented Sep 19, 2024

If the approach with building the .bin files in build.rs is not an acceptable solution for you, can we at least get the license corrected to include AND Apache-2.0 and include the Apache-2.0 license texts?

I think that part should be uncontroversial, given that these crates obviously do ship files that are covered by the Apache-2.0 license.

Sure, happy to take a PR for that. We could probably also figure out a way to make the pre-generated code optional? Not sure how it would work with optional binary artifacts.

@LecrisUT
Copy link
Author

If the approach with building the .bin files in build.rs is not an acceptable solution for you, can we at least get the license corrected to include AND Apache-2.0 and include the Apache-2.0 license texts?
I think that part should be uncontroversial, given that these crates obviously do ship files that are covered by the Apache-2.0 license.

Sure, happy to take a PR for that.

That's why I had it as separate commits ;). Cherry-picked it out in #1946

We could probably also figure out a way to make the pre-generated code optional? Not sure how it would work with optional binary artifacts.

What do you think about: https://github.com/milesgranger/blosc2-rs/blob/97a31e5a3b6161bae0468558913f346399999de8/blosc2-sys/build.rs#L132-L133

If the logic is inversed, as #[cfg(not(feature = "pre-built-binaries"))] than that would be even greater for us. In either case the artifacts would be written in the same location.

@djc
Copy link
Contributor

djc commented Sep 19, 2024

What do you think about: https://github.com/milesgranger/blosc2-rs/blob/97a31e5a3b6161bae0468558913f346399999de8/blosc2-sys/build.rs#L132-L133

This doesn't help because downstream still needs to block builds on build.rs compilation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.proto files are licensed under Apache-2.0
3 participants