Track repository_url #495

Open
wants to merge 5 commits into
base: master

prabhu

@prabhu prabhu commented May 2, 2024

Supersedes #494 by also setting component evidence for the 1.5 spec.

With this PR, the purl for each component includes the repository_url qualifier whenever the repository is not https://repo.maven.apache.org/maven2, as per the purl spec.
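
For readers skimming the thread, here is a minimal sketch of the rule described above. It is not the plugin's code: the class `PurlSketch` and its methods are hypothetical names, and the qualifier order simply mirrors the example purl quoted later in this conversation. It assumes the qualifier is omitted only for the default repository URL named in the description.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not taken from the PR.
final class PurlSketch {
    // Default repository named in the PR description; no qualifier is emitted for it.
    static final String DEFAULT_REPOSITORY = "https://repo.maven.apache.org/maven2";

    // Builds a Maven purl and appends repository_url only for non-default repositories.
    static String mavenPurl(String group, String name, String version,
                            String type, String repositoryUrl) {
        StringBuilder purl = new StringBuilder("pkg:maven/")
                .append(group).append('/').append(name)
                .append('@').append(version)
                .append("?type=").append(type);
        if (repositoryUrl != null && !isDefault(repositoryUrl)) {
            // URLEncoder is form-encoding (a space would become '+'), which is
            // good enough for typical repository URLs in this sketch.
            purl.append("&repository_url=")
                .append(URLEncoder.encode(repositoryUrl, StandardCharsets.UTF_8));
        }
        return purl.toString();
    }

    // Compares against the default repository, ignoring one trailing slash.
    static boolean isDefault(String url) {
        String trimmed = url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
        return trimmed.equals(DEFAULT_REPOSITORY);
    }
}
```

Passing `https://repo1.maven.org/maven2/` as the last argument yields a purl with the encoded qualifier, while passing the default repository URL or `null` yields the plain purl.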

Example BOM generated for the repo https://github.com/eclipse-jkube/jkube is attached.

bom.xml.txt
bom.json.txt

Things to clarify

package-url/purl-spec#303

prabhu added 4 commits May 2, 2024 00:09
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
@prabhu prabhu marked this pull request as draft May 3, 2024 08:05
@prabhu
Author

prabhu commented May 3, 2024

Making this a draft to try another approach. Please feel free to use the PR for testing purposes.

Update: Ready for review.

… up confidence

Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
@prabhu prabhu marked this pull request as ready for review May 3, 2024 10:34
@prabhu
Author

prabhu commented May 11, 2024

@hboutemy @stevespringett Would appreciate a review.

@hboutemy
Contributor

IIUC, repository_url is the origin repository URL for a dependency?

From Maven experience with such a feature in dependency reports: if a repository manager is used, it's not possible to identify the origin repository of a dependency. That's why we removed the feature a few years ago.

I don't think this is feasible in a consistent way. That's a general Maven issue, shared by every plugin that has tried to do it.

@prabhu
Author

prabhu commented May 15, 2024

@hboutemy Thank you for the comments. If a repository manager is used and we get the URL of the manager instance, that is still useful information for enterprise customers. Currently, there is no visibility into either the repository manager or the origin.

@hboutemy
Contributor

I just looked more closely at the example SBOM given. Here is an example component with the new data that I see:

    {
      "publisher" : "Red Hat",
      "group" : "io.fabric8",
      "name" : "kubernetes-model-common",
      "version" : "6.12.1",
      "description" : "Java client for Kubernetes and OpenShift",
      "scope" : "required",
      "hashes" : [
... usual hashes ...
      ],
      "licenses" : [
... usual licenses ...
      ],
      "purl" : "pkg:maven/io.fabric8/kubernetes-model-common@6.12.1?type=jar&repository_url=https%3A%2F%2Frepo1.maven.org%2Fmaven2%2F",
      "externalReferences" : [
... usual external references ...
      ],
      "evidence" : {
        "identity" : {
          "field" : "purl",
          "confidence" : 1.0,
          "methods" : [
            {
              "technique" : "hash-comparison",
              "confidence" : 0.9,
              "value" : "7b3cb18e7a6d5c53be9f9e5ab1128409793e06fc"
            },
            {
              "technique" : "filename",
              "confidence" : 0.1,
              "value" : ".m2/repository/io/fabric8/kubernetes-model-common/6.12.1/kubernetes-model-common-6.12.1.jar"
            }
          ]
        }
      },
      "type" : "library",
      "bom-ref" : "pkg:maven/io.fabric8/kubernetes-model-common@6.12.1?type=jar"
    },

I see 2 parts:

  1. the new purl repository_url qualifier, &repository_url=https%3A%2F%2Frepo1.maven.org%2Fmaven2%2F, added for some dependencies but not all: I don't get why some dependencies have it and some don't.
  2. the evidence addition: IIUC, it is completely independent of this repository_url aspect. IMHO, every piece of data in it needs to be discussed separately (I won't start here, as it would make the discussion too complex).

Let's discuss the repository_url aspect: when is it added, and when is it not? Do we have examples of cases where the value is more useful than the Maven Central URL?

@prabhu
Author

prabhu commented May 16, 2024

Thanks @hboutemy.

The repository_url is added when Maven resolves and pulls the jar from a remote repository rather than from a local cache. An if condition ensures this is the case:

if (repository instanceof RemoteRepository) {
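
To make the condition concrete, here is a self-contained sketch of the idea. The nested types below are local stand-ins written for this example only (they are not the Maven resolver classes, and `RepositoryUrlSketch` is a hypothetical name); the assumption is simply that a remote repository carries a URL while a local one does not.

```java
import java.util.Optional;

// Illustration only: the nested types are simplified stand-ins, not the real
// resolver API used by the plugin.
final class RepositoryUrlSketch {
    interface ArtifactRepository {}

    // Stand-in for a repository backed by the local cache (no URL to report).
    static final class LocalRepository implements ArtifactRepository {}

    // Stand-in for a repository the artifact was downloaded from.
    static final class RemoteRepository implements ArtifactRepository {
        private final String url;
        RemoteRepository(String url) { this.url = url; }
        String getUrl() { return url; }
    }

    // Returns a URL only when the artifact came from a remote repository,
    // mirroring the instanceof check quoted above.
    static Optional<String> repositoryUrl(ArtifactRepository repository) {
        if (repository instanceof RemoteRepository) {
            return Optional.of(((RemoteRepository) repository).getUrl());
        }
        return Optional.empty();
    }
}
```

With this shape, an artifact resolved from the local cache yields an empty result, so no repository_url would be attached to its purl.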

When executing this PR branch with a custom repository proxy and a custom Maven repo directory set via MAVEN_OPTS, nearly all the components receive the repository_url. I say nearly because some dependencies are downloaded while building the plugin, and these get resolved from the local cache when testing with jkube (many Gradle libraries). The new bom files are attached.

export MAVEN_OPTS="-Dmaven.repo.local=/tmp/m2"

bom.json.txt
bom.xml.txt

@prabhu
Author

prabhu commented May 30, 2024

@hboutemy any feedback based on my last comment?

@hboutemy
Contributor

hboutemy commented Jun 3, 2024

As you can see from your second generated example SBOM, the resulting repository_url is the URL of a proxy, i.e. something unusable (I'm even surprised by the IP in the http://0.0.0.0:8080/releases value).

I understand the dream of this PR, but this is a dream: Maven cannot do that reliably.
I don't want to add code that will generate unreliable, noisy data, particularly in the highly visible purl field.

Thinking about it, if you want to work on this aspect, see #245. I'd propose focusing on the distribution-intake external reference: from the intake of a dependency, defining the distribution external reference of that dependency can make sense.

And please, when sharing examples, share a simple one first before sharing complex ones: a simple example defines the feature precisely, while a complex one is useful to see it at scale.

@prabhu
Author

prabhu commented Jun 3, 2024

@hboutemy I disagree with the word unusable. It shows all the packages that were downloaded from a private internal repository (which happens to run on localhost). It gives confidence that no local cache (which could be malicious) was used. From the administrative settings of the internal registry, we can find the upstream public repository that was used for caching, and it is absolutely fine if the SBOM tool doesn't have this information.
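
As a rough sketch of how a consumer might use this, the snippet below reads the repository_url qualifier back out of a purl and lists the purls that lack it. `PurlAuditSketch` and its methods are hypothetical names, not part of the plugin or of any purl library. Note that, under this PR's rules as described above, a missing qualifier could mean either the default repository or a local-cache resolution, so the list is only a starting point for review.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical consumer-side helper for illustration only.
final class PurlAuditSketch {
    // Extracts and decodes the repository_url qualifier, if present.
    static Optional<String> repositoryUrl(String purl) {
        int query = purl.indexOf('?');
        if (query < 0) {
            return Optional.empty();
        }
        String qualifiers = purl.substring(query + 1);
        int subpath = qualifiers.indexOf('#');
        if (subpath >= 0) {
            qualifiers = qualifiers.substring(0, subpath); // drop any subpath
        }
        for (String pair : qualifiers.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("repository_url")) {
                // URLDecoder is form-decoding ('+' becomes a space); fine for this sketch.
                return Optional.of(
                        URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    // Returns the purls that carry no repository_url qualifier.
    static List<String> withoutRepositoryUrl(List<String> purls) {
        List<String> missing = new ArrayList<>();
        for (String purl : purls) {
            if (!repositoryUrl(purl).isPresent()) {
                missing.add(purl);
            }
        }
        return missing;
    }
}
```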

The second part of the PR sets identity evidence, which IMHO, is quite important for any SBOM tool.
