Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uuid: Add Universally Unique Identifier (UUID) utilities #77884

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

sorru94
Copy link
Contributor

@sorru94 sorru94 commented Sep 2, 2024

Add a basic implementation of UUID generating and manipulating functions.

Fixes: #77882

@zephyrbot zephyrbot added the area: Base OS Base OS Library (lib/os) label Sep 2, 2024
@jukkar
Copy link
Member

jukkar commented Sep 2, 2024

There are lot of uuid handling code in Bluetooth, should BT use this library?

@jukkar jukkar requested a review from jhedberg September 2, 2024 13:36
@sorru94
Copy link
Contributor Author

sorru94 commented Sep 2, 2024

There are lot of uuid handling code in Bluetooth, should BT use this library?

I'm unfamiliar regarding the Bluetooth use of the UUID. I see it uses 16 and 32 bits partial UUIDs, which are not defined in RFC 9562.
We could add a couple of defines similar to the ones in the bluetooth driver (BT_UUID_INIT_128 for example) to ease the transition.

@jhedberg
Copy link
Member

jhedberg commented Sep 2, 2024

There are lot of uuid handling code in Bluetooth, should BT use this library?

I'm unfamiliar regarding the Bluetooth use of the UUID. I see it uses 16 and 32 bits partial UUIDs, which are not defined in RFC 9562. We could add a couple of defines similar to the ones in the bluetooth driver (BT_UUID_INIT_128 for example) to ease the transition.

AFAIK at least the 16- and 32-bit Bluetooth UUIDs don't follow any non-Bluetooth standard. The way the Bluetooth core specification deals with these, is that in practice everything is a 128-bit UUID, however UUIDs which are derived from the so-called Bluetooth Base UUID can be compressed into 16- and 32-bit formats (and these again expanded back into 128 bit format). I have a feeling we'll need to keep the Bluetooth UUID API separate, but some proper research into their compatibility with these RFC specs needs to be done. We might at most e.g. provide conversion functions to convert back and forth between bt_uuid_t and uuid_t.

@jukkar
Copy link
Member

jukkar commented Sep 3, 2024

@sorru94 could you check if there would be any non-BT code that could be changed to use this library?
At least in dhcpv6

struct dhcpv6_duid_uuid {
there is uuid handling.

include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
lib/uuid/uuid.c Outdated Show resolved Hide resolved
tests/lib/uuid/src/main.c Outdated Show resolved Hide resolved
tests/lib/uuid/src/main.c Outdated Show resolved Hide resolved
tests/lib/uuid/src/main.c Outdated Show resolved Hide resolved
tests/lib/uuid/src/main.c Outdated Show resolved Hide resolved
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there's an endianness collision between the existing code and the new library? UUIDs basically made a huge mess of this with that text format, and almost nothing does it "correctly" (if there even is a "correct"). But 100% for sure we should be doing it consistently, and I don't think we are?

include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
uuid_t third_uuid_v4;
uuid_t first_uuid_v5;

uint8_t expected_first_uuid_v4_byte_array[16u] = {0x44, 0xb3, 0x5f, 0x73, 0xcf, 0xbd,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More drive-by: this is clearly byte-packing a big endian UUID. The first element is a 32 bit dword and should be stored "73, 5f, b3, 44" on little endian systems.

@sorru94
Copy link
Contributor Author

sorru94 commented Sep 6, 2024

Seems like there's an endianness collision between the existing code and the new library? UUIDs basically made a huge mess of this with that text format, and almost nothing does it "correctly" (if there even is a "correct"). But 100% for sure we should be doing it consistently, and I don't think we are?

I'm unsure about the endianness for the UUID used/expected by the BT module (or other modules).
However, I tried to follow as much as possible RFC 9562 for this library where they state:

In the absence of explicit application or presentation protocol specification to the contrary, each field is encoded with the most significant byte first (known as "network byte order").

And:

Saving UUIDs to binary format is done by sequencing all fields in big-endian format. However, there is a known caveat that Microsoft's Component Object Model (COM) GUIDs leverage little-endian when saving GUIDs. The discussion of this (see [MS_COM_GUID]) is outside the scope of this specification.

I'm guessing the UUID in the Bluetooth module uses the COM GUIDs format where the first 4 bytes are represented using little endian while the last 6 are represented using big-endian.

I don't know if including a non-standard representation of UUIDs would be a good choice for this library. We could always introduce helper functions for conversion to the "COM GUIDs" format. Here or directly in the BT module.

What do you think? @andyross

include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
include/zephyr/sys/uuid.h Show resolved Hide resolved
include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
lib/uuid/Kconfig Show resolved Hide resolved
lib/uuid/uuid.c Outdated Show resolved Hide resolved
lib/uuid/uuid.c Outdated Show resolved Hide resolved
lib/uuid/uuid.c Outdated Show resolved Hide resolved
lib/uuid/uuid.c Outdated Show resolved Hide resolved
lib/uuid/uuid.c Outdated Show resolved Hide resolved
lib/uuid/uuid.c Outdated Show resolved Hide resolved
@andyross
Copy link
Contributor

andyross commented Sep 6, 2024

However, I tried to follow as much as possible RFC 9562 for this library

Yeah... the problem is that "UUIDs" in practical code don't really and haven't ever followed that RFC, which is a post-hoc kind of cleanup thing. Windows GUIDs are the obvious example of something that is always converted LE, and the code @Thalley posted above certainly implies the BT inherited the convention.

I'm don't have enough of a stake here to demand one or the other, I'm just saying (1) whatever code we commit needs to be extremely clear about the endianness convention and (2) it should match the conventions used by any existing in-tree users (or else support both, I guess, but... yuck)

@sorru94 sorru94 requested a review from Thalley September 9, 2024 12:35
@sorru94 sorru94 force-pushed the add-uuid-utilities branch 3 times, most recently from 64065cc to 0ab6537 Compare September 9, 2024 12:54
Copy link
Collaborator

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks OK to me. Unsure how and if we can use this in BT due to endianess, so I suggest to make it experimental so that we can modify things until we figure that part out.

include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
Comment on lines 93 to 96
* An UUID in the standard big-endian RFC9562 representation `00112233-4455-6677-8899-AABBCCDDEEFF`
* has the equivalent little-endian COM GUID representation `33221100-5544-7766-8899-AABBCCDDEEFF`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? The last 2 values, 8899 and AABBCCDDEEFF, don't need to be swapped?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 8899 should be 9988 while the last one is correct. See the other related comment for more info.

What other related comment? Can you share the link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we follow the COM GUID representation the last two should not be swapped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I deleted the comment as It was my mistake. I replied on the other comment and will push the changes shortly.

lib/uuid/uuid.c Outdated Show resolved Hide resolved
lib/uuid/uuid.c Outdated
Comment on lines 177 to 171
memcpy_swap(&out[first_part_start], &data[first_part_start], first_part_size);
memcpy_swap(&out[second_part_start], &data[second_part_start], second_part_size);
memcpy_swap(&out[third_part_start], &data[third_part_start], third_part_size);
memcpy_swap(&out[fourth_part_start], &data[fourth_part_start], fourth_part_size);
memcpy(&out[fifth_part_start], &data[fifth_part_start], fifth_part_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the documentation for this function, only the first 3 parts are swapped, but this swaps the first 4. Why?

And why isn't the 5th swapped? (not familiar with the UUID spec)

Copy link
Contributor Author

@sorru94 sorru94 Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with the endianness is that the UUID spec defines UUID only as big-endian.
The interpretation of little-endian UUID is not universal.
From my understanding here are our options for the UUID:

  • Standard big-endian representation: 00112233-4455-6677-8899-AABBCCDDEEFF
  • little-endian representation inverting each block: 33221100-5544-7766-9988-FFEEDDCCBBAA
  • little-endian representation inverting the full UUID: FFEEDDCC-BBAA-9988-7766-554433221100
  • little-endian representation for Microsoft COM GUIs representation: 33221100-5544-7766-8899-AABBCCDDEEFF

I'm not sure which of those little-endian notations is used in the Bluetooth module.
However, from my very superficial research, I think the last one is the more widely adopted. It's also the only one referenced in the standard.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to read up on the UUID spec. Endianess is always a bit tricky, but it usually comes down the the language and terminology used in the specs and whether the blocks are values or arrays of values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blocks change based on the UUID version. For example, v5 is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           sha1_high                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         sha1_high             |  ver  |      sha1_mid         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                       sha1_low                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           sha1_low                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Where sha1_* are all components of the same SHA-1 hash.
While UUID v1 is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           time_low                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           time_mid            |  ver  |       time_high       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|         clock_seq         |             node              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              node                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

As a consequence, it might be difficult to find a generic conversion method.

lib/uuid/Kconfig Outdated Show resolved Hide resolved
@Thalley
Copy link
Collaborator

Thalley commented Sep 12, 2024

So after the RFC and looking more into this, I've noticed the following:

  1. 4122, which is obsoleted by 9562, does seemingly not define any specific endianess
  2. 9562 defines UUIDs to be stored as big-endian
  3. MS documentation seemingly only refers to 4122
  4. Haven't found any references to 4122 or 9562 from the BT specs
  5. If we want to be 9562 compliant and treat all UUIDs a big-endian, then it's pretty each to support
  6. If we want to support converting between little and big endian, then it's more tricky. Each version of UUID (version 1 to 8) have slightly different structures, and ideally each value in each version should be converted. As far as I can tell 9562 does not define how to represent those as strings, because it is big-endian.
  7. MS converts the first 3 values to little endian (https://devblogs.microsoft.com/oldnewthing/20220928-00/?p=107221) and then treats the 6-byte MAC address as a 6-octet array instead of a 6-octet value. MS seemingly also treats time_high and ver as a single value, instead of 12 and 4 bit values respectively. MS seemingly uses UUID version 1 (https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-version-1) for their GUIDs (UUIDs). The way that MS treats UUIDs is a complete mess 🙃

To continue with this PR, I think we need to ask what it is we want to achieve. Do we want to library that implements 9562? Do we want a library that is compatible with MS COM? Do we want a library that is compatible with Bluetooth?

The 9562 RFC has this nice little paragraph in the beginning that pretty much sums up the mess:

The use of UUIDs is extremely pervasive in computing. They comprise the core identifier infrastructure for many operating systems such as Microsoft Windows and applications such as the Mozilla Web browser; in many cases, they can become exposed in many non-standard ways.

The purpose of 9562 is to take RFC 4122 and properly standardize it.

@sorru94
Copy link
Contributor Author

sorru94 commented Sep 20, 2024

No additional comments from me.

It would have been nice to have a defined use case for this implementation (e.g. a sample or something), but not sure it's necessary. It's somewhat odd to add support for something without any uses, especially since this cannot be used with e.g. Bluetooth, but maybe the SOF module or something like that could use this in the future.

I agree that it's odd. However, the UUID is already in use in many places within the project. I hope this library will entice people to standardize its use.

@Thalley
Copy link
Collaborator

Thalley commented Sep 20, 2024

I agree that it's odd. However, the UUID is already in use in many places within the project. I hope this library will entice people to standardize its use.

Which project are you referring to here? The SOF?

@sorru94
Copy link
Contributor Author

sorru94 commented Sep 23, 2024

I agree that it's odd. However, the UUID is already in use in many places within the project. I hope this library will entice people to standardize its use.

Which project are you referring to here? The SOF?

I was referring to the Zephyr project.
As @jukkar mentioned above dhcpv6 uses some form of UUIDs. Furthermore, I could find some UUID in fs/ext2.h, usb/bos.h, and drivers/tee.h.
Hopefully, once this driver is available those modules could transition to using this one instead of ad hoc implementations.

@jhedberg
Copy link
Member

The one thing that concerns me a bit with the way you've defined the uuid_t type, is that it hides away the distinction of passing by value vs passing by reference. Maybe that's a good thing, but it's often nice to be able to see this very clearly in API usage by having pointer types vs value types, and using & to pass by reference, etc. Since the type that uuid_t is encapsulating is an array you end up in practice always passing by reference, which could come as a surprise, e.g. if a function expects to be able to modify an input parameter without it having any effects on the caller. I don't have a strong opinion either way, and there does seem to be benefits of the current solution, but I'm wondering if you did some careful comparison & analysis of defining the type as:

typedef struct {
	uint8_t uuid[UUID_SIZE]
} uuid_t;

vs the current:

typedef uint8_t uuid_t[UUID_SIZE];

@Thalley
Copy link
Collaborator

Thalley commented Sep 23, 2024

The one thing that concerns me a bit with the way you've defined the uuid_t type, is that it hides away the distinction of passing by value vs passing by reference. Maybe that's a good thing, but it's often nice to be able to see this very clearly in API usage by having pointer types vs value types, and using & to pass by reference, etc. Since the type that uuid_t is encapsulating is an array you end up in practice always passing by reference, which could come as a surprise, e.g. if a function expects to be able to modify an input parameter without it having any effects on the caller. I don't have a strong opinion either way, and there does seem to be benefits of the current solution, but I'm wondering if you did some careful comparison & analysis of defining the type as:

typedef struct {
	uint8_t uuid[UUID_SIZE]
} uuid_t;

vs the current:

typedef uint8_t uuid_t[UUID_SIZE];

Doesn't our coding guidelines (or the Linux kernel guidelines) suggest not to typedef structs?
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs

Another alternative would be to do

struct uuid {
    uint8_t uuid[UUID_SIZE]
};

and then pass around struct uuids which can be passed by value, by mutable reference or const reference (which I believe is the point of your suggestion here as well)

@jhedberg
Copy link
Member

@Thalley my impression is that Zephyr is much more relaxed to the usage of typedefs, but there should always be a some good justification for their usage.

@jukkar
Copy link
Member

jukkar commented Sep 24, 2024

I was referring to the Zephyr project.
As @jukkar mentioned above dhcpv6 uses some form of UUIDs. Furthermore, I could find some UUID in fs/ext2.h, usb/bos.h, and drivers/tee.h.
Hopefully, once this driver is available those modules could transition to using this one instead of ad hoc implementations.

I would suggest that at least one user of this api could be found and converted to use it in this pr. It is a bit weird to have an api in upstream that nobody uses.

Add UUID generation and parsing utilities compliant
with RFC9562.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
Add some tests for the UUID utilities.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
Include the UUID utilities in the Miscellaneous
documentation page.
The UUID is being placed in a new Identifier
APIs section.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
@sorru94
Copy link
Contributor Author

sorru94 commented Oct 3, 2024

@Thalley and @jhedberg. As for your suggestion, I replaced the type definition with a struct. Let me know if it was what you intended.
@jukkar I tried to use this library in the ext2 library, just to have one that uses it.

@Thalley
Copy link
Collaborator

Thalley commented Oct 3, 2024

@sorru94 Looks like CI tests are failing, please have a look

@sorru94 sorru94 force-pushed the add-uuid-utilities branch 2 times, most recently from 2ccc2cd to 6e5b0fe Compare October 4, 2024 09:27
Replace the typedefined array uuid_t with a
struct named uuid.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
@@ -27,7 +29,7 @@ struct ext2_cfg {
uint32_t block_size;
uint32_t fs_size; /* Number of blocks that we want to take. */
uint32_t bytes_per_inode;
uint8_t uuid[16];
struct uuid uuid;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API change, isn't it?

If it is a stable API, then it needs to follow https://docs.zephyrproject.org/latest/develop/api/api_lifecycle.html#introducing-breaking-api-changes

An alternative which is a bit dirty would be to do

Suggested change
struct uuid uuid;
union {
uint8_t uuid[16];
struct uuid uuid;
};

So that it's not breaking. I'll leave this up to the maintainer(s) of the fs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true, I'm not sure how much the uuid field is used in the current version of the ext2 module. It would be nice if the maintainers of the fs could help.
Another option for this change to be non-breaking would be to keep the typedef uint8_t uuid_t[UUID_SIZE]; in uuid.h instead of the struct version.

* @retval 0 The UUID has been correctly copied in @p dst
* @retval -EINVAL @p dst is not acceptable
*/
int uuid_copy(struct uuid *dst, struct uuid src);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I've noticed now is that you more often than not pass by value (e.g. struct uuid src) rather than by reference (struct uuid *dst).

Is that something you have considered? Since the struct is 16 octets, that's quite a lot of data being passed around compared to a pointer (typically 4 octets).

Should the API primarily use pass by reference instead of value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually pass by reference only when changes to the parameter are needed, i.e. when it's used as an output.
Do you know if copying 16 Bytes on each function call is an issue? Are there guidelines for this in the Zephyr project?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if copying 16 Bytes on each function call is an issue? Are there guidelines for this in the Zephyr project?

I'm not aware of any guidelines, but pass by value would increase the stack usage, right?

Personally I (almost) always pass by reference for structs (unless they are smaller than the size of a pointer).

Copy link
Member

@jhedberg jhedberg Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sorru94 to make the distinction of "input" vs "output" parameter clear, a good practice is to use const struct uuid *uuid for input, which makes it clear that the function will not modify it.

include/zephyr/sys/uuid.h Outdated Show resolved Hide resolved
subsys/fs/ext2/ext2_format.c Outdated Show resolved Hide resolved
Use the sys/uuid.h library to generate and hold the
ext2 UUID.
This commit also includes some formatting for the
ext2 library.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
Uniform the API parameter names.
Using `data` for input data and `out` for the output structures.

Signed-off-by: Simone Orru <simone.orru@secomind.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Base OS Base OS Library (lib/os) area: File System
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Addition of Universally Unique Identifier (UUID) utilities
7 participants