Discuss Temurin SBOM format and content #3013

Open
Tracked by #3952
andrew-m-leonard opened this issue Jun 30, 2022 · 22 comments
Assignees
Labels
enhancement (Issues that enhance the code or documentation of the repo in any way), reproducible-build, Sbom (issue relate to work of sbom), secure-dev

Comments

@andrew-m-leonard
Contributor

Temurin builds are now producing SBOM artifacts, eg. https://github.com/adoptium/temurin18-binaries/releases/download/jdk18u-2022-06-30-09-20-beta/OpenJDK18U-sbom_x64_linux_hotspot_2022-06-29-23-30.json

These are based upon the CycloneDX schema : https://cyclonedx.org/capabilities/

This issue is to discuss the future format and content.

@andrew-m-leonard andrew-m-leonard added enhancement Issues that enhance the code or documentation of the repo in any way reproducible-build labels Jun 30, 2022
@zdtsw
Contributor

zdtsw commented Jun 30, 2022

Ref: #3011

@zdtsw
Contributor

zdtsw commented Jul 4, 2022

related #2984

@zdtsw
Contributor

zdtsw commented Jul 4, 2022

@steelhead31
Contributor

I've now produced an outline spreadsheet of some of the more common Linux-based distributions on Docker/VM, with the content of the relevant files, to help identify which details we'd like to capture in the SBOM (issue #3010).

https://docs.google.com/spreadsheets/d/1a07f2QqfpmWW0EMShscsnNNDaP4TEJHy61owUW5atW0/edit#gid=0

@smlambert
Contributor

re: #3013 (comment) - thanks @steelhead31, it is very helpful to see all of the variants laid out in the spreadsheet

@steelhead31
Contributor

Looks like the solution will be to just change the O/S full version to be a combination of the pretty name and the kernel version (which should be all that is relevant from a reproducibility and security perspective).
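
For illustration, a minimal Python sketch of that combination, assuming the pretty name comes from the PRETTY_NAME field of an os-release style file and the kernel version is passed in as a string. The function names and the output layout are hypothetical, not what the Temurin build scripts actually do.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def os_full_version(os_release_text, kernel_version):
    """Combine PRETTY_NAME and the kernel version into one string.

    The "<pretty name> (kernel <version>)" layout is an assumption
    made for this sketch.
    """
    pretty = parse_os_release(os_release_text).get("PRETTY_NAME", "unknown")
    return f"{pretty} (kernel {kernel_version})"
```

On a Linux build machine this could be fed with the content of /etc/os-release and the value of platform.release() from the Python standard library.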

@zdtsw
Contributor

zdtsw commented Jul 8, 2022

@zdtsw zdtsw added the Sbom issue relate to work of sbom label Jul 8, 2022
@zdtsw zdtsw self-assigned this Jul 11, 2022
@zdtsw
Contributor

zdtsw commented Aug 17, 2022

I looked a bit at how others generate their SBOMs. It feels like the strace information we are gathering could become part of components, with each line of result.txt as one entry:

{
      "bom-ref": "/bin/cat",
      "type": "application",
      "name": "cat",
      "scope": "required",
      "hashes": [
        {
          "alg": "SHA-256",
          "content": "51f665b9ef41b64340585164d5f75142a268f84f9a2537cd205c2b59bcbefbea"
        }
      ]
    },
...

Each entry in components then has a ref that can be used to define its dependencies:

 {
      "ref": "/bin/cat",
      "dependsOn": [
         "busybox-1.34.1-r7"
      ]
    },
...
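
As a rough sketch of how such entries could be generated, the Python below builds one component per traced file and one dependency entry per file with a known owning package. It assumes the trace has already been reduced to a list of file paths; the `owners` mapping (path to package name) is hypothetical, and typing every file as "application" simply mirrors the example above.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def component_for(path):
    """Build a component entry shaped like the /bin/cat example."""
    return {
        "bom-ref": path,
        "type": "application",
        "name": os.path.basename(path),
        "scope": "required",
        "hashes": [{"alg": "SHA-256", "content": sha256_of(path)}],
    }


def build_fragment(paths, owners):
    """Build the components and dependencies lists.

    paths:  file paths seen in the trace (one per result line).
    owners: hypothetical mapping of path -> owning package name.
    """
    components = [component_for(p) for p in paths]
    dependencies = [
        {"ref": p, "dependsOn": [owners[p]]} for p in paths if p in owners
    ]
    return {"components": components, "dependencies": dependencies}
```

For the dependency graph to resolve, each name listed under dependsOn would itself need to be the bom-ref of an entry in the same BOM.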

Regarding the current content (OpenJDK18U-sbom_x64_linux_hotspot_18.0.2_9.txt):
The entries we have under metadata.tools ("ALSA", "FreeType", "FreeMarker") should be in components if they are used, or not be shown in the SBOM at all.
metadata.tools."Docker image SHA1" should be in metadata.properties if the build is running inside a container.
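
A small Python sketch of that relocation, under the assumption that metadata.tools is a list of objects with "name" and "version" keys and that the Docker image SHA is stored in the "version" field. The function name and the `used_libraries` argument are hypothetical; the real Temurin SBOM layout may differ.

```python
import copy

# Names taken from the comment above; treated here as libraries, not tools.
LIBRARY_NAMES = {"ALSA", "FreeType", "FreeMarker"}
# Entries that describe the build container rather than a tool.
CONTAINER_KEYS = {"Docker image SHA1"}


def relocate_tools(sbom, used_libraries):
    """Return a copy of sbom with misplaced metadata.tools entries moved.

    Libraries that were used become components, unused ones are dropped,
    and container details become metadata.properties.
    """
    result = copy.deepcopy(sbom)
    metadata = result.setdefault("metadata", {})
    kept = []
    for tool in metadata.get("tools", []):
        name = tool.get("name")
        version = tool.get("version", "")
        if name in CONTAINER_KEYS:
            metadata.setdefault("properties", []).append(
                {"name": name, "value": version}
            )
        elif name in LIBRARY_NAMES:
            if name in used_libraries:
                result.setdefault("components", []).append(
                    {"type": "library", "name": name, "version": version}
                )
            # Libraries that were not used are left out entirely.
        else:
            kept.append(tool)
    metadata["tools"] = kept
    return result
```

The input document is left unmodified, so the old and new layouts can be compared side by side.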

@zdtsw
Contributor

zdtsw commented Aug 18, 2022

Posted on Slack: https://cyclonedx.slack.com/archives/CV062H2GH/p1660811431368709

In short: CDX (short for CycloneDX) plans to include formulation, as defined by OWASP, in its 1.5 release, but the date has not been decided; it is more likely to be early/mid 2023.
Formulation describes how components were built often including build system invocation and properties, SDK and compiler versions, compiler flags, and a comprehensive list of parallel and sequential steps that were taken to build, test, and deliver a component. Formulation and pedigree are complimentary concepts and are often combined and referred simply as pedigree
This feels like exactly what we are after.
The uncertainty is whether any new tools that generate formulation for CDX 1.5 can be used directly by us.
Or can we not wait a year for 1.5 to include this, because the SBOM will be used as input for reproducible builds (in which case we might become a contributor to CDX)?

@andrew-m-leonard
Contributor Author

CDX formulation sounds interesting. I would guess, given very few others are formulating native C/C++ SBOMs, we would probably be the contributor to that work... So it is probably in our best interest to progress with what we're currently researching and see where it goes, with a view to CDX contribution.

@sxa
Member

sxa commented Jun 20, 2023

Noting that zlib is a potential source of differences and we should look at what we do with those options as they are currently inconsistent between platforms and versions in terms of whether it uses bundled or system zlib. Should we override to be consistent?

Summary:

  • AIX: system on jdk11, bundled elsewhere (!) (Yes, even on same machine - test-2)
  • Windows: bundled
  • Mac: bundled on aarch64, system on x64 except 11 and 17 where it's system on both
  • Linux (inc.alpine): bundled on jdk8 - system elsewhere

@sxa
Member

sxa commented Jun 21, 2023

The zlib topic was discussed by the PMC today, and there was general agreement to move to using bundled as a preference, although it was noted that there may have been a technical reason for the 11 and 17 releases on macos to be different from the others, so that is something to be aware of. We would likely switch over after the July set of releases (Although we could switch Linux to JDK21 now since that is not part of the July release).
Noting also that we would need to revisit dependencies in the docker and installer images to ensure we are not pulling in anything unnecessarily after this change

@sxa
Member

sxa commented Dec 6, 2023

Similar to the zlib issue we are moving over to using bundled freetype: #freetype - we have started with JDK21+: #3504

@andrew-m-leonard
Contributor Author

andrew-m-leonard commented Jul 8, 2024

The Temurin SBOM has grown over time in terms of format, properties and structure. It has got to the point where we need to consider "versioning" the current structure (#3848) and re-formatting it to fit with the more common and extended CycloneDX features: https://cyclonedx.org/guides/OWASP_CycloneDX-Authoritative-Guide-to-SBOM-en.pdf

Comments welcome please?

Current Temurin SBOM example: https://github.com/adoptium/temurin21-binaries/releases/download/jdk-21.0.3%2B9/OpenJDK21U-sbom_x64_linux_hotspot_21.0.3_9.json

@andrew-m-leonard
Contributor Author

Very interesting, reading this blog: https://fossa.com/blog/sbom-examples-explained/
It suggests an "SBOM" is sort of fixed, in that it changes only when the "bill of materials" changes, and at that point the "version" field is incremented, i.e. it states this "application" CONTAINS "components A, B, C".
It does not mean you have a different SBOM for every individual build you do of your application.

Temurin has leveraged SBOM in a slightly different way, in that every individual build has a unique SBOM. Although part of this is common for re-builds of the same "release/tag", we have leveraged a lot of "formulation" and "build tools" type dependencies for the purpose of supply chain definition. It is important we know "exactly" what tooling was used to build the given release, and the environment and dependency versions used. So, for example, one build of a given release could be compiled with gcc 10.2, and another build of the same release with gcc 11.3.
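
To make the gcc example concrete, here is a hedged Python sketch that compares the build tooling recorded in two SBOMs for the same release. It assumes each SBOM lists its tools under metadata.tools as objects with "name" and "version"; both helper names are hypothetical.

```python
def tool_versions(sbom):
    """Map tool name -> version from an SBOM's metadata.tools list."""
    tools = sbom.get("metadata", {}).get("tools", [])
    return {t["name"]: t.get("version") for t in tools if "name" in t}


def diff_tools(sbom_a, sbom_b):
    """Return {name: (version_in_a, version_in_b)} for tools that differ.

    A tool present in only one SBOM shows up with None on the other side.
    """
    a, b = tool_versions(sbom_a), tool_versions(sbom_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Two rebuilds of the same tag that differ only in compiler would then report a single entry such as gcc with 10.2 on one side and 11.3 on the other.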

@andrew-m-leonard
Contributor Author

andrew-m-leonard commented Jul 8, 2024

In terms of a true bill of materials, i.e. "what is it made up of?", a build of OpenJDK is essentially this:

  • Compiled openjdk/jdkNN native source
  • statically linked glibc code (some openjdk libraries statically link the glibc C runtime)
  • ALSA library compiled in
  • compiler debuginfo

The "openjdk source" might be the "material" that is built into the binary; however, the build tooling (e.g. gcc compiler version) and build environment sysroot headers & libraries make up what ACTUALLY ends up being generated as the binary content.

@andrew-m-leonard
Contributor Author

A great source of SBOM info by SonaType: https://github.com/awesomeSBOM/awesome-sbom?tab=readme-ov-file

@MikeLaptev

I've read through the conversation in the CycloneDX Slack, and I have a question about this comment in the thread:

But from first look, this appears like overly misusing properties, since many of the information could be moved to formulation section.

Does it mean that one thing that could be done in terms of format and content is to extract the part that describes the tools used to assemble the Temurin release into an MBOM? 🤔

Considering the quote from the CycloneDX reference guide (https://cyclonedx.org/guides/OWASP_CycloneDX-Authoritative-Guide-to-SBOM-en.pdf; page 62, Formulations):

Generally, the formulation is externalized from the SBOM into a dedicated Manufacturing Bill of Materials (MBOM). The SBOM references the MBOM that describes the environment, configuration, tools, and all other considerations necessary to replicate a build with utmost precision. This capability allows other parties to independently verify inputs and outputs from a build which can increase the software's assurance.

Formulation establishes relationships with components and services, each of which can be referenced in a given formula through a series of workflows, tasks, and steps. As of this writing, the "Authoritative Guide to MBOM" is being drafted. When complete, it will serve as a reference for effectively using formulation for a wide variety of use cases.

@andrew-m-leonard
Contributor Author

Yeah, we need to think if we want to split into SBOM->MBOM 🤔

@smlambert
Contributor

Hmm, originally MBOM was for hardware and while there is a use case for separating the data of how something was built versus a components list, it is only needed when the 'manufacturing of the thing' is private / sensitive.
https://cyclonedx.org/capabilities/mbom/. This is not our use case, we actually want people to easily see how Temurin was built.

I have been looking at SBOMs generated by some Tekton pipelines I have been running and they include formulation section directly in the SBOM itself, show-sbom.log. I will try to find some different / better examples.

@andrew-m-leonard
Contributor Author

Hmm, originally MBOM was for hardware and while there is a use case for separating the data of how something was built versus a components list, it is only needed when the 'manufacturing of the thing' is private / sensitive. https://cyclonedx.org/capabilities/mbom/. This is not our use case, we actually want people to easily see how Temurin was built.

I have been looking at SBOMs generated by some Tekton pipelines I have been running and they include formulation section directly in the SBOM itself, show-sbom.log. I will try to find some different / better examples.

This was making me nervous, so good spot on the stricter control design, which as you say is not our use case. So yes, let's keep our formulation within our SBOM.
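
As a rough idea of what an in-SBOM formulation section could look like, the Python sketch below assembles a minimal CycloneDX 1.5 style document with a single formula that records the compiler as a component and the build as a workflow. All bom-ref and uid values, the function name and the choice of fields are illustrative assumptions; the output should be validated against the CycloneDX 1.5 schema before relying on it.

```python
import json


def build_sbom_with_formulation(name, version, compiler, compiler_version):
    """Return a minimal SBOM dict that carries its own formulation."""
    compiler_ref = f"tool:{compiler}@{compiler_version}"  # hypothetical ref
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "metadata": {
            "component": {"type": "application", "name": name, "version": version}
        },
        # The formulation stays inside the SBOM rather than in a separate MBOM.
        "formulation": [
            {
                "bom-ref": "formula-build",
                "components": [
                    {
                        "bom-ref": compiler_ref,
                        "type": "application",
                        "name": compiler,
                        "version": compiler_version,
                    }
                ],
                "workflows": [
                    {
                        "bom-ref": "workflow-build",
                        "uid": "workflow-build-1",
                        "name": "Build JDK",
                        "taskTypes": ["build"],
                    }
                ],
            }
        ],
    }


def to_json(sbom):
    """Serialize the SBOM dict as indented JSON text."""
    return json.dumps(sbom, indent=2)
```

Calling build_sbom_with_formulation("Temurin", "21.0.3+9", "gcc", "11.3") and passing the result to to_json gives a document in which the build tooling sits next to the component list, which is the visibility being asked for here.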

@andrew-m-leonard andrew-m-leonard changed the title Discuss Temurin SBOM format and content EPIC: Discuss Temurin SBOM format and content Sep 24, 2024
@andrew-m-leonard andrew-m-leonard changed the title EPIC: Discuss Temurin SBOM format and content Discuss Temurin SBOM format and content Sep 24, 2024
@andrew-m-leonard andrew-m-leonard self-assigned this Sep 24, 2024
Projects
Status: Todo
Development

No branches or pull requests

6 participants