-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scripts: zephyr_module: Add CPE support #66495
scripts: zephyr_module: Add CPE support #66495
Conversation
doc/develop/modules.rst
Outdated
|
||
.. code-block:: yaml | ||
|
||
vulnerabilities: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure vulnerabilities
is the best key at this level.
It can give the impression that it will contain information regarding actual vulnerabilities, which is not the case.
You already propose a cpe
entry, so why can we not provide the remote
as a sub-field under there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An evolution to vulnerabilities
would be to add the field vex
which would point to a vex file allowing to filter CVE.
Do you think it would fit in cpe
field or should we find an other name (not sure what it would be though) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for sharing some thoughts for future direction of this.
With a vex
(or vdr) field, then vulnerabilities starts to make more sense as name.
I don't have a better suggestion for another name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about simplysecurity:
? security_info:
?
I do agree with @tejlmand that "vulnerabilities" makes it sound like it lists vulnerabilities or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like security_info more.
Broad enough to cover the vex info, while still specific enough to be useful for readers to understand the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this addition.
Notice some minor parts to adjust before +1, namely version
and first time use of abbreviations.
doc/develop/modules.rst
Outdated
@@ -569,6 +569,36 @@ Build files located in a ``MODULE_EXT_ROOT`` can be described as: | |||
This allows control of the build inclusion to be described externally to the | |||
Zephyr module. | |||
|
|||
SBOM integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we turn this around.
Instead of having SBOM integration
, then describe what content is (example: Vulnerability monitoring
), and then instead describe that this information will be included in the sbom.
The vulnerability monitoring / information can be of interest, even without an SBOM.
(Just a thought, not a request for a change)
doc/develop/modules.rst
Outdated
The module description file :file:`zephyr/module.yml` can be used to improve the | ||
generated SBOM and vulnerability monitoring. | ||
|
||
The version and the CPE associated to the module can be provided using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when using abbreviations like: CPE
and CVE
(and soon VEX
), we should write them in full once, so that even users unfamiliar with the terms know what they mean.
doc/develop/modules.rst
Outdated
|
||
.. code-block:: yaml | ||
|
||
version: <module-version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the version
out of SBOM integration
.
version
has value on its own, and should be considered first class citizen in the module yaml.
Just like name
is independent: https://docs.zephyrproject.org/latest/develop/modules.html#module-name
c8f98bd
to
b1ad1b6
Compare
@tejlmand I've taken into account your review. I also added the integration of CPE and version in the SBOM since documentation talk about the SBOM. I wonder if we should have some testing on the output SBOM ? For example we could generate it for each project built in the pipeline and then run pyspdxtools to validate them. That would be a start. What do you think ? I'm not familiar with zephyr pipeline, could you point me to the correct files if you think testing is a requirement for this. Thanks |
39be3c1
to
8fdd25f
Compare
I wonder if I should add Maybe I could rename I would keep What do you think @tejlmand ? |
d35e433
to
9512f31
Compare
I've added a new commit with the modification from my previous comment. If it's okay with you, I can squash them at the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
So minor doc suggestions but nothing blocking.
doc/develop/modules.rst
Outdated
Module version | ||
============== | ||
|
||
Each Zephyr module should have a version number to improve tracking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should
seems a bit strong in this case, especially as it would indicate that modules are having some form of release cycles. This may be true for some modules, but other modules revision might be considered to be identified (thus indirectly follow the Zephyr version) through the SHA in the manifest, which again is identified by the Zephyr revision / version.
Each Zephyr module should have a version number to improve tracking. | |
Each Zephyr module may have a version number to improve tracking. |
|
||
The module description file :file:`zephyr/module.yml` can be used for vulnerability monitoring | ||
and improve the generated SBOM output. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good.
Maybe at this location specify the version:
field should be used.
For example:
When a module monitors vulnerabilities then the module should also define the `version` field, :ref:`module_version`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comment makes me think this version
field creates confusion. There was a misunderstanding from my part in the original issue.
The goal was to provide a version for the module and help complete the SBOM for the module (and not searching vulnerabilities from the forked repository for example). The external-references
field must be used for that (and CPE and PURL specifications allow to include the version)
I think we should remove the version
field from this PR and I will clarify this for a next PR. Are you okay with this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that there are some clarification (raised by your comment) to be done on this PR with the security group. In the meantime, I converted it into draft and I'll update it when I have all the information I need.
I'm sorry for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you believe it clarifies the usage and intentions of the field, then it's fine.
The clearer the intention of a PR, the better.
I tried
in mbedTLS module and I couldn't see the cpe in the output, if I do:
I can see the cpe, but my understanding is that we should see cpe for downstream and upstream packages right ? Though we will probably be using (at least, in beginning) the upstream CPE. |
9512f31
to
391ab83
Compare
@ceolin, the PR was not updated yet, I was waiting for your approval before updating it. You can now try it. To clarify, @tejlmand I think you can use #53479 (comment) for a better understanding of the changes. |
0e5326c
to
b25a00c
Compare
@tgagneret-embedded Thank you for your patience. It's "unfortunate" you just pushed a new commit as this dismissed the existing reviews and will only contribute to merge taking even longer, as this now needs another (hopefully much faster though) round of reviews. That being said, it doesn't justify why things have been taking so long so I hope we'll get this merged soon. @tejlmand, please do try to have a look at this PR again as it's been several months since it's been waiting for your approval as the assignee. |
if len(revision) == 0 or url.startswith("git://"): | ||
return url | ||
|
||
url = url.replace("https://", "git+https://") | ||
url = url.replace("http://", "git+http://") | ||
url = "@".join([url, revision]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure this handles properly the case where url already starts with "git+http"? Shouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URL si extracted from meta file. The URL is the one for cloning the repository. I don't think it can have the url already starting with "git+".
This function is not generic and takes into account that this script is run in zephyr: only git is supported, and url is parsed from git remote
command (and maybe using west list
values later)
So, do you think I should add a special case for the URL that starts with "git+" (even though I don't think it can happen in our case) ?
Maybe you see a special case I did not see.
@tgagneret-embedded #70581 was just merged causing the need for this PR to be rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better. Still a couple of observations.
As for the last two commits, can you please squash those into the appropriate parent commits ?
When reviewing commit-by-commit, then it's not nice to suddenly see code that one comment suddenly being changed in later commit in same PR.
def _normalize_module_name(self, module_name): | ||
# Replace "_" by "-" since it's not allowed in spdx ID | ||
return module_name.replace("_", "-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the normalized path goes into the cfgPackageZephyrModule.
I understand that if _
is not allowed directly in the spdx IP, then we must normalize, but then I as minimum think we should have a field which provides the module name without normalization, so that tools can process this information.
The problem with _
-> -
, is that you cannot go back once the normalization has been performed. (some modules has -
in their name).
@@ -216,39 +249,67 @@ def setupZephyrDocument(self, modules): | |||
cfgPackageZephyr.spdxID = "SPDXRef-zephyr-sources" | |||
cfgPackageZephyr.relativeBaseDir = relativeBaseDir | |||
|
|||
zephyr_url = zephyr.get("remote", "") | |||
if zephyr_url: | |||
cfgPackageZephyr.url = zephyr_url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if looking at the examples given in 7.7.3, then it states:
For Git:
SPDX supported schemes are: git, git+git, git+https, git+http, and git+ssh. git and git+git are equivalent.
Here are the supported forms:
PackageDownloadLocation: git://git.myproject.org/MyProject
PackageDownloadLocation: git+https://git.myproject.org/MyProject.git
PackageDownloadLocation: git+http://git.myproject.org/MyProject
...
so http://
or https://
without git, doesn't seem to be sufficient.
remotes_name = stdout.rstrip().split('\n') | ||
|
||
remote_url = None | ||
if len(remotes_name) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the approach of looking for a single remote is correct.
west
actually uses the url for fetching (although it does create a remote for user convenience if none exists).
This means that the url used by west update
may be different than the url described by a single remote.
Thus, a much safer approach is to use the url returned by west list
when the Zephyr module is also a west project.
Note, a Zephyr module doesn't have to be a west project, also it doesn't even need to be a git repo.
So those cases must also be properly handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so http:// or https:// without git, doesn't seem to be sufficient.
The URL is transformed in writer.py
because it's related to SPDX.
Just like you said that -
should be replaced in writer.py
and not in walker.py
.
Note, a Zephyr module doesn't have to be a west project, also it doesn't even need to be a git repo.
So those cases must also be properly handled.
I agree, but before my PR, only git repository was supported. The only way to get the revision was through the git_revision
function. So even if I agree with you, I would prefer to keep the current state of support and extend it later if possible.
I will have a look at how to get the same data returned by west list
, however I would need to know how I should handle the case where one or more remotes are added manually to a module.
It echoes a previous comment but:
What if a repository is checked out to a commit presents in a different remote than the one provided by west list
. What if the commit is in multiple remotes (and not necessarily in the one listed in west list
) ? Which remote should I choose ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a repository is checked out to a commit presents in a different remote than the one provided by west list.
That is why the build output meta appends -off
to the commit when the sha used is different from the sha defined in the manifest.
https://docs.zephyrproject.org/latest/kconfig.html#CONFIG_BUILD_OUTPUT_META
Even with your proposal of relying on a single remote will provide wrong information in case a user does a local commit, because in such a case the sha is not available at the remote.
Also users can do a git fetch <url>
followed by a git checkout FETCH_HEAD
.
In such a case there may be a single remote, but a commit from different url is checked out locally.
Such workflow is very common for testing PRs or branches.
What if the commit is in multiple remotes (and not necessarily in the one listed in west list) ? Which remote should I choose ?
When west
is used, then use the west remote
if the sha is identical to the sha defined by the current manifest.
Because in that case we know the sha can be reached by that remote.
If it's not identical, then consider the sha -off
. No reason to re-invent what CONFIG_BUILD_OUTPUT_META
already provides. (Have you looked at this feature at all ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will use the -off
. Thanks.
So, if I understand correctly, even though there is multiple remotes, I will use the remote returned by west list
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tejlmand I don't think there is any remote provided by west for the manifest repository.
If running west init -m https://github.com/zephyrproject-rtos/zephyr.git
, west list
does not provide any remote for the manifest. I didn't find anything in west code to have this information (command or function), you can only extract the manifest directory. The only solution I have is to run git remote
in the manifest directory to get it.
So if I have two remotes (for example, zephyr remotes and the forked zephyr to add my feature), I don't have any clue on which remotes I should use.
What should I do in this case ? No remote ? The first one ?
Thanks.
scripts/zephyr_module.py
Outdated
|
||
revision, dirty = git_revision(path) | ||
workspace_dirty |= dirty | ||
remote, dirty = git_remote(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that a revision can be dirty, but what does dirty mean in context of a remote ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have multiple remotes in one module, how you make sure the current commit is from one remote or an other ?
One solution could be to usegit branch -a --contains <commit id>
, but I'm not sure it's reliable to determine if it's one commit is from a specific remote.
What happens if the commit is present in multiple remote ? Which one should I choose ?
So if I detect multiple remote, I mark the repository has "dirty" and no remote is selected.
How do you think we should handle this case (where remotes are added manually) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have multiple remotes in one module, how you make sure the current commit is from one remote or an other ?
You can't. A given commit SHA is unique and can exists in multiple repositories, that's the nature of git.
But that doesn't make the remote "dirty".
In fact, you should not rely on the remote at all for this.
If you want to rely on something, then you should the remote from the west
project, if the module is also a west project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't.
I was thinking of a fork. So it's possible :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't.
I was thinking of a fork. So it's possible :)
Not necessarily, and especially not with github
.
If forking a repo using github
's fork feature, then you can actully fetch SHAs directly regardless of the fork to which it was pushed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I misread your first message. I completely agree with you.
doc/develop/modules.rst
Outdated
|
||
.. code-block:: yaml | ||
|
||
vulnerabilities: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like security_info more.
Broad enough to cover the vex info, while still specific enough to be useful for readers to understand the context.
re: @tejlmand's comment below (which I can't reply to as it's "outdated)
It's now |
On top of this question, should we use Maybe For example, when we use an external repository for the manifest, this manifest won't appear in any SPDX/SBOM. So where should it appears ? In a broader sense, if someone add new projects in manifest, it's not listed in any SPDX/SBOM. By the way it seems that all modules are listed as a project. I don't know if it's possible to have a module not listed as a project ? I think the creation of a new SPDX Thanks |
afc19fe
to
b145252
Compare
|
Improve the SPDX with the current values: - URL: extracted from `git remote`. If more than one remote, URL is not set. - Version: extracted from `git rev-parse` (commit id). - PURL and CPE for Zephyr: generated from URL and version. For zephyr, the tag is extracted, if present, and replace the commit id for the version field. Since official modules does not have tags, tags are not yet extracted for modules. To track vulnerabilities from modules dependencies, a new SBOM, `modules-deps.spdx` was created. It contains the `external-references` provided by the modules. It allows to easily track vulnerabilities from these external dependencies. Signed-off-by: Thomas Gagneret <thomas.gagneret@hexploy.com>
Since `writer.py` is the one writting the SPDX file, it should normalize the name field and not `walker.py` which generates the SBOM components. Signed-off-by: Thomas Gagneret <thomas.gagneret@hexploy.com>
b145252
to
9799c08
Compare
@tejlmand rebase done except for the last commit (after rebase: 9799c08). Let me know what I should do with it. |
@tejlmand Can you tell me if there anything missing from my part for you to review this PR ? If so, let me know, I would be happy to fix it. This PR is opened for a long time now 😄. Thanks |
ping @tejlmand |
@tgagneret-embedded although it's getting late in the release cycle for doing this, would it make sense to submit updates to some of the manifests of the modules where this new feature would be useful (i.e CPE/PURLs actually exist)? mbedTLS certainly comes to mind. Thanks! |
I was thinking of this but I wasn't sure it's the best time. I'm afraid it would break some build. If we add CPE information in mbedTLS module for example and it's used with older version of Zephyr, an error will occurs when parsing the I don't know how modules are handled, and if my use case is possible. If you think it's safe, I can update some |
|
This allows the possibility of:
cpe
andversion
fields are extracted from module.yml and included in generated kernel meta file (making it available for SBOM generation).This PR is related to issue #53479 and follows PR #66182 that aims to improve the SBOM output and better monitoring of vulnerabilities.