Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

verify-copyright disallows an update that makes years match file history for a new file #48

Closed
jameslamb opened this issue Aug 23, 2024 · 5 comments · Fixed by #52
Closed
Assignees
Labels
question Further information is requested

Comments

@jameslamb
Copy link
Member

jameslamb commented Aug 23, 2024

Description

See rapidsai/cudf#15483 (comment).

There, I observed a case where a totally new file (not moved, not modified) was initially committed in April 2024 with a copyright header like this:

# Copyright (c) 2023-2024, NVIDIA CORPORATION.

When I tried to change it to just 2024, verify-copyright refused to let me.

In file python/libcudf/libcudf/_version.py:1:17:
 # Copyright (c) 2024, NVIDIA CORPORATION.
warning: copyright is not out of date and should not be updated

In file python/libcudf/libcudf/_version.py:1:3:
-# Copyright (c) 2024, NVIDIA CORPORATION.
+# Copyright (c) 2023-2024, NVIDIA CORPORATION.
note: suggested fix applied

Relevant details:

Reproducible Example

This came up on a PR, but I can reproduce it without any forks or PRs involved, now that rapidsai/cudf#15483 has been merged.

git clone git@github.com:rapidsai/cudf.git
cd cudf
sed -i .bak 's/2023-2024/2024/g' ./python/libcudf/libcudf/_version.py
git add ./python/libcudf/libcudf/_version.py
pre-commit run verify-copyright

History from https://github.com/rapidsai/cudf/commits/branch-24.10/python/libcudf/libcudf/_version.py

image

Notes

I (@jameslamb) am actively investigating this. Just putting this up to track the work.

@jameslamb jameslamb added the bug Something isn't working label Aug 23, 2024
@jameslamb jameslamb self-assigned this Aug 23, 2024
@jameslamb jameslamb changed the title verify-copyright chooses the wrong starting year verify-copyright disallows an update that makes years match file history for a new file Aug 23, 2024
@KyleFromNVIDIA
Copy link
Contributor

It comes from here:

find_copies=True,
find_copies_harder=True,
find_renames=True,

We enable --find-copies, --find-copies-harder, and --find-renames when examining the diff. It must be taking _version.py from python/cudf/cudf/_version.py. This is generally intended behavior.

@KyleFromNVIDIA
Copy link
Contributor

I'd be open to making verify-copyright more explicit about why it wants 2023-2024 in an oddball situation like this. It could explicitly state that python/libcudf/libcudf/_version.py is being copied from python/cudf/cudf/_version.py.

@jameslamb
Copy link
Member Author

Oh wow, thank you! Ok yeah find_copies does seem like the mechanism at play here.

If I add any content to that file to make it arbitrarily different from the others in the repo, like a comment that says # libcudf...

sed -i .bak 's/2023-2024/2024/g' ./python/libcudf/libcudf/_version.py
echo "# libcudf" >> ./python/libcudf/libcudf/_version.py
git add ./python/libcudf/libcudf/_version.py
git diff branch-24.10
diff --git a/python/libcudf/libcudf/_version.py b/python/libcudf/libcudf/_version.py
index 7dd732b490..8e8df732b3 100644
--- a/python/libcudf/libcudf/_version.py
+++ b/python/libcudf/libcudf/_version.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2023-2024, NVIDIA CORPORATION.
+# Copyright (c) 2024, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -31,3 +31,4 @@ except FileNotFoundError:
     __git_commit__ = ""

 __all__ = ["__git_commit__", "__version__"]
+# libcudf

... , the hook lets me change the year to just 2024.

pre-commit run verify-copyright
verify-copyright.........................................................Passed

However... it will also let me change the file and leave the years unchanged.

echo "# libcudf" >> ./python/libcudf/libcudf/_version.py
git add ./python/libcudf/libcudf/_version.py
git diff branch-24.10
diff --git a/python/libcudf/libcudf/_version.py b/python/libcudf/libcudf/_version.py
index 7dd732b490..3a29041009 100644
--- a/python/libcudf/libcudf/_version.py
+++ b/python/libcudf/libcudf/_version.py
@@ -31,3 +31,4 @@ except FileNotFoundError:
     __git_commit__ = ""

 __all__ = ["__git_commit__", "__version__"]
+# libcudf
pre-commit run verify-copyright
verify-copyright.........................................................Passed

That's a kind of interesting quirk of this. It means that for the same file and commit history, there are multiple different sets of years that the hook will accept as valid (in this case, both 2023-2024 and just 2024).

@KyleFromNVIDIA
Copy link
Contributor

there are multiple different sets of years that the hook will accept as valid (in this case, both 2023-2024 and just 2024).

This is also intentional. The hook doesn't know when the file was created (it may have history outside of the repo), only that it was introduced in 2024 and should have its latest year updated.

@jameslamb jameslamb added question Further information is requested and removed bug Something isn't working labels Aug 23, 2024
@jameslamb
Copy link
Member Author

jameslamb commented Aug 23, 2024

Ok, got it. I understand now how both could be justifiable year ranges for this file.

I'm convinced that this isn't a bug but really just subtle behavior that surprised me because I didn't know about the find_copies mechanism. I've changed the label on this issue.

I'd be open to making verify-copyright more explicit about why it wants 2023-2024

I think this would be helpful.

In this example, this file's content was intentionally identical to the content of other files in the repo, so sharing a range of copyright dates could make sense.

In other cases, two files might end up just being accidentally identical (I'm thinking like a dot file configuring some CI tool, or an __init__.py that just has an empty __all__ list). In those situations, it'd be helpful to know that I can force the hook to respect my determination that the file should have a particular copyright year.

With an error message similar to this:

warning: Copyright is not out of date and should not be updated. This may be because this file is identical to others in the repo, and therefore is assumed to share copyright history with them. If that is not the case, change this file's content in a way that makes it unique, and its copyright dates will be determined only by its own edit history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants