[RFC] Integrated Boot Configuration System (build, provisioning, runtime) #76902
Comments
Just a comment in relation to "Detailed RFC" point 5.
W.r.t. Boot Configuration, there is currently a PR that adds a multiboot / efi "kernel command line" (CMDLINE in Linux). I had some concerns about it being limited to just a couple of architectures. In that ballpark though, it would be helpful to specify non-security-sensitive parameters that can vary at build, provision, and init-time (aka runtime) - e.g. logging uart(s), framebuffer details, runtime-queryable software parameters, and so on. It's not ideal to have to parse a string, but it could be encoded a number of different ways that would make it parseable (and maybe even directly addressable). I think, at least at one point, the Linux kernel command line had been added to Devicetree. Some Zephyr community members might prefer more structured formats as well (e.g. protobuf, thrift).
Absolutely. Such an input at runtime could be considered just another source that is layered on top of the others with high priority. That's why it is so much more important to agree on a common data model rather than discussing serializations.
Sad to see this closed. I really think @fgrandel has some good points here about what Zephyr should be and actually is (application development platform vs pure RTOS) and how the use cases which derive from this design decision can be solved best. I am also convinced that the strict separation between hardware and software configuration is not beneficial, as the software is written for this specific hardware, although vendor agnostic, and will therefore reference it and use it. Even more so if we consider the recurring discussion of whether something is software or hardware, which shows that this separation is not as clear as one would think at first.
@benediktibk Thanks for your support. Maybe s/o else can continue the work on this PR then? I seem to have started it off on the wrong foot, so I'm somewhat burnt wrt this topic: the discussion of this RFC has been too much hassle for me with too little result. I'm sure this proposal is far from optimal but the feedback given so far has not been constructive either. Anyone who likes to take over, feel free to assign yourself. WRT CT (DT syntax only) as a first serialization format: Please note that I chose it for purely pragmatic reasons to save the community the hassle of having to re-invent the wheel. It's a stepping stone on the migration path, otherwise mostly irrelevant to the proposed architecture. I never understood how discussions of syntax could be so much more relevant to the arch WG than semantics. This honestly does not fit my own understanding of what matters in architecture. But this seems to be a minority opinion in the arch WG, so I won't insist. |
Wasn't this discussed in the arch WG this week? why is there not a summary post? @carlescufi I am missing context for why the issue is closed now, I missed the meeting |
I am curious, do you really think there is not a technical benefit at least from a maintenance aspect for both users and tree developers, to keep the DT focused only on hardware? I also am skeptical of the "philosophical" statements about it which always seem dogmatic... but it still seems like maybe it would be good to have a more stable devicetree definition for hardware that doesn't change as easily as the whims of software changes could. Maybe I'm misunderstanding what you said here though, and you were only saying the inter-OS compatibility doesn't really have a technical reason? In which case, maybe I agree about that, it seems like a highly suspect aspiration that is also thrown around a lot as a hand wavy justification for the zephyr DT "philosophy". (it seems like the whole point of this RFC is to say, to separate hardware and software configuration, only this sentence confused me about what you see as the benefits of having DT) |
Your questions have been addressed in the RFC. I'll cite it for your convenience where appropriate. Like several others in the Zephyr community, you seem to take it for granted that the distinction between hardware and software properties is precise and objective. However, our own experience and decades of research have consistently shown that ontological definitions are subjective and based on "group think" (i.e. self-stabilizing systemic discourse) and therefore truly "hand wavy". Unfortunately humans tend to ignore this fact; see e.g. Kahneman, whose Nobel Prize-winning work includes a number of self-tests that everyone can reproduce at home. From this RFC:
This means that we probably have as many hw/sw distinctions in practice as we have maintainers. ;-) Due to confirmation bias we tend to describe distinctions made by others as "errors" or even "deviations from the pristine beauty of DT". The only axiomatic definition I heard so far is "every property 1-to-1 (i.e. functionally bijective) to a driver instance is DT". But then why do we fight over network configuration as it is almost exclusively 1-to-1 to driver instances with mathematical precision? And what about all those existing DT properties that are not 1-to-1 to driver instances (eg cpus, ram, flash, interrupts, etc.)? (Note: I don't back this approach as it breaks encapsulation, deviates from established Linux practice and doesn't scale.) The RFC calls this out:
You and I agree that maintainability means that files that change at different frequencies should be kept apart. Our current dogmatic use of DT breaks this rule due to coupling of almost constant (and mostly Linux-compatible) board/soc related properties with driver- and subsystem-related properties. Further down in my RFC I'm therefore introducing well-established axiomatic concepts and practices that are all but "hand wavy" to replace the software/hardware distinction and reduce DT to the de facto objectively Linux-compatible part, all the rest is per definition abstractly modeled CT (which can be expressed in any syntax including - but not limited to - YAML and DT syntax), even this basic fact has been ignored by almost all commenters:
Note: Just do a search in our code base and that of Linux to find the "least visible" file system folder precisely.
"Conventional wisdom" (as I call the kind of biased assumptions that we all unavoidably share as human beings) can be overcome by argument but this is hard especially if held by trusted entities like the Linux community, the DTSpec, TSC members or an initial majority of the arch WG that is unaware of the arguments because they have not read the RFC but unfortunately still opinionate - and do so highly emotionally as we've seen. @nashif, @carlescufi As you seem to be back from holiday: As others have pointed out: Not rational arguments have so far been important in discussing this RFC but what the majority of devs subjectively believes to be true (even consciously and proudly so without giving "rational" arguments). My RFC goes beyond what can be digested in five minutes by arch WG members. I already can see eyes rolling because I'm writing another long argument that cannot be subsumed in 140 characters. But the many misunderstandings show that this unfortunately seems to be required. I hope we all still read books, datasheets and specs for a reason. Not so RFCs and comments it seems? I believe that we need a different architecture WG decision procedure and debating culture if we want to address problems like this systematically in a more productive way:
Such behaviors should be actively encouraged by whoever currently chairs the discussion or stumbles upon them in RFCs tagged with "Architecture Review", and contributions not respecting these basic rules should be disregarded in decisions. This is what our CoC requests anyway and is required to level the playing field for the OP. IMO it is not the task of the OP alone to defend herself against non-productive behavior. The burden should be shared by all community members, especially maintainers and of course TSC members. RFCs and PRs systematically land in arch review because they have caused conflict that needs to be actively moderated. IMO a structured architecture decision process also requires establishing a shared list of requirements (the problem space) before accepting statements about implementation details (the solution space). Except for a few examples this RFC has been exclusively discussed in the solution space right away w/o giving me the time and the benefit of the doubt to calmly lay out my findings. Worse: Even after the problem space was laid out in written form, it was largely ignored and it seemed to be considered the OP's task to deal with that. Carles, I did recognize your attempts to ensure that we define requirements first and I really appreciated this. Thinking and reading before commenting takes time but it spares people like me a lot of trouble as our only way to consent is argument and debate plus it spares the community having to work around lack of decision or deadlock for years as we obviously do in the config area where "irrational" opinions seem to be allowed to block "rational" arguments. If casual arch WG participants do not have time to consider complex arguments routinely then we need a dedicated arch WG team to support OPs that contributes bandwidth to prepare discussions, establish internal consensus about architecture and design principles and enforce them when challenged by participants unaware of the problem space.
Should I open an RFC so we discuss this in more detail in the next arch WG? ;-) I also request that we do not lightly accuse someone of breaching the CoC as has happened to me in this context. It should be acknowledged that to me as a non-native speaker it is hard to distinguish between an argument that is "nil" and an argument that is "simply not true" (your own words). I'd say both dismiss a previously given argument equally which IMO is an integral part of ongoing debate and should be acceptable. I also request that TSC members calmly refer to the CoC in private to give people unaware of the emotions they caused a chance to apologize or rectify misunderstandings. I also believe that the CoC was nowhere breached, it is especially hard to accept that I was accused of bringing in religion to the discussion after citing a well-known metaphor by R. Feynman (which I even immediately replaced with s/th less easy to misinterpret once pointed to it). I firmly believe that this could even be considered abuse of the CoC as clearly religion is meant differently in that context. Such moderated behavior is of course desirable by all community members including myself but I find it much more forgivable if those who have less visibility and responsibility in the community now and then become emotional, this is so human. The only thing I'd have expected was a prompt clarification and correction once the facts were on the table. Unfortunately this never happened and IMO largely contributed to damaging the discussion and leading it in the wrong direction. Additionally statements of the kind I described as non-productive above were repeatedly endorsed after I had given criteria for what IMO is broadly accepted as "good debating culture" to re-focus the discussion on argument and community requirement rather than personal taste. I consider this an example of really outstandingly bad behavior by a trusted community leader. 
I did mention so in private before calling it out here in public but unfortunately to no avail - much to the contrary. Plus I was encouraged to place my critique in public which I hereby do. I hope this sufficiently explains why under such circumstances I considered it a waste of time to continue supporting this RFC w/o leading the meta-discussion first. |
Actually, I would say as the only active person on the DT collaborator list for the last couple months, it's probably more apparent to me than most people that the gray area is an issue. From being on the (recently removed, by me) DT binding maintainer area, I saw many things being added to DT that aroused this type of argument over HW/SW distinction. I am still fairly green to the area, but what I have been trying to do by asking around to some active community members is to try to characterize the community wisdom (maybe what you called "group think") about what is HW vs SW as well as what benefit we are deriving from the automatically accepted belief that we need to separate configuration of them into separate schemas and languages. Because I am loath to disregard the well established conventions but also skeptical about what I keep hearing repeated all the time without much consideration, especially given the frequency of disagreements as you have pointed out. This is why I was trying to clarify your opinion about it in my question above, since you have clearly put so much systematic thought into this, I thought it would be valuable to understand your perspective. So far, the most convincing thing I have heard in the past is what I mentioned, about frequency and reason for changes to the configurations. I don't know if you would call it axiomatic, but it seems to me that a good rule of thumb might be that you should not have to change DT unless your physical hardware configuration has changed. Of course this is not how Zephyr DT works right now.
I am not sure if I agree or not about what you're describing as the original purpose of Kconfig. Feature selection is a big part yes, but there is clearly built into the language support for configs that are strings, ints, hex, etc, do you really think these types are semantically meant just for feature selection? |
I didn't know this, I misunderstood you there. Sorry for pointing out what you already knew. It's reassuring, though, that we made the same observation.
Same on my side. I'm very much interested in yours! I can see that you bring in constructive argument.
Yes, I agree. The argument is good, but we don't really follow it because we mostly don't add new properties because the hardware changed. Most of our new properties are added at the same rate as new driver features. So they are actually driver properties from an "axiomatic" rate-of-change perspective although they of course describe some kind of hardware/driver interface. But this doesn't matter for encapsulation purposes as these properties are used nowhere other than in the drivers. As soon as they actually are used elsewhere we need to move them of course, but only then.
Yes, we again agree. See above. To make it clearer what I mean by "axiomatic" as opposed to "ontological":
You're right, I formulated this too sloppily. As Linux doesn't have another build-time kernel configuration mechanism they partly use it for configuration as we do. The separation between feature selection and software instance configuration is my own "invention" and would be specific to Zephyr to overcome the artificial distinction between "single instance" and "multi instance" configuration. Thanks for pointing that out. |
I am a bit out of context regarding the reactions to this RFC: did all this happen during arch WG meeting(s)? Is the recording and/or transcript available somewhere? On the actual RFC I do agree with it. Unfortunately I am not skilled enough to implement it myself nor to even imagine what the implementation would look like. I do know that there are situations where DT is the current way-to-go because it provides the necessary technical implementation (UAC2 with its extreme macrobatics is actually about translating "user-readable syntax describing UAC2 device" to "a bunch of blobs for external host use and a bunch of easy-to-use lookup arrays for class implementation"). The alternative would be coming up with a subsystem (or even worse - a problem) specific custom language with its own tooling.
Thanks for coming back to this RFC, the discussion continued here: #77638
TLDR; Please see #76903 which exemplifies most of the concepts laid out in this RFC in a PR that is much easier to review than this detailed RFC. This continues the discussion started in #68127 in more detail.
Introduction
Currently, we have no satisfactorily integrated solution to configure in-memory software component instances (as opposed to software features) at build and provisioning time.
Existing approaches like Zephyr's current flavor of Devicetree (DT), Kconfig or the settings subsystem provide partial solutions but, as generic software component instance configuration systems, they lack in overall structure, flexibility, scope or scalability.
Examples:
Apart from these pending functional requirements, we have created a rather artificial and complicated (from a user perspective) distinction of the relative domains of applicability of Kconfig and DT due to an insufficiently precise ontological hardware/software divide and structural deficiencies of Kconfig.
The Kconfig/DT distinction SHALL be made more precise, practically useful and enforceable and the resulting software component instance configuration SHALL be represented in a more maintainable and scalable unified format with currently required Zephyr-specific Kconfig/DT "quirks" removed.
Problem description
Note: See motivating use cases for the following requirements in "Exemplary Use Cases" below.
A. This RFC addresses the following specific problems:
Of course such a transition requires time and effort. Therefore this RFC proposes a gradual migration path from the current state to the target state, maintaining long-term backwards compatibility at every step forward w/o introducing further inconsistencies. No one SHALL be distracted from their actual goals or be forced to invest extra effort in the migration of configuration unless they choose to do so to satisfy their own needs and requirements.
Proposed Change
This RFC proposes an abstract conceptual data model, serialized - in a first step - to a backwards compatible, semantic extension of the DT format. This DT superset is called "configtree" (CT) in the following.
The solution will be exemplified for network interface settings but will be extensible to all subsystems and applications as laid out in the problem description.
The proposed architecture allows for later addition of alternative source or target serializations, e.g. settings subsystem key/value pairs, property, protobuf IDL or Thrift files, integration with externally managed databases or secure key stores, JSON or YAML files provided locally or retrieved from a network location.
Note: See System Device Tree's simplified YAML serialization of DT (and CT) as one option to represent CT as YAML.
CT is proposed as a first serialization format for pragmatic reasons of usability, simplicity, initial effort and long-term maintainability. It will be shown that it is entirely capable of representing the proposed abstract data model in an - as we find - rather intuitive way. It satisfies all technical, logical and business requirements of a serialization source and intermediate unified format within the proposed overall configuration approach.
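To illustrate the separation between the abstract data model and its serializations, here is a minimal Python sketch. It is not part of the RFC or of any Zephyr tooling: the functions `to_dts` and `to_settings`, the node name `iface0` and its properties are hypothetical. It renders one in-memory model both in DT syntax and as settings-style key/value pairs.

```python
# Sketch only: one abstract model, two serializations.
# All names below (iface0, mtu, hostname, dhcp) are hypothetical.

def to_dts(name, node, indent=0):
    """Render a node (a nested dict) in devicetree source syntax."""
    pad = "\t" * indent
    lines = [f"{pad}{name} {{"]
    for key, value in node.items():
        if isinstance(value, dict):
            # Child node: recurse one level deeper.
            lines.append(to_dts(key, value, indent + 1))
        elif isinstance(value, bool):
            # Boolean properties are present (true) or absent (false).
            if value:
                lines.append(f"{pad}\t{key};")
        elif isinstance(value, int):
            lines.append(f"{pad}\t{key} = <{value}>;")
        else:
            lines.append(f'{pad}\t{key} = "{value}";')
    lines.append(f"{pad}}};")
    return "\n".join(lines)


def to_settings(node, prefix=""):
    """Flatten the same model into path-like key/value pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(to_settings(value, path))
        else:
            flat[path] = value
    return flat


model = {"iface0": {"mtu": 1280, "hostname": "node-a", "dhcp": True}}
dts_text = to_dts("iface0", model["iface0"])
settings = to_settings(model)
```

Both outputs derive from the same `model` object, which is why the choice of serialization can be deferred or extended later without touching the model itself.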
The proposed migration path consists of the following steps (not necessarily in this order):
Note: Splitting and merging CT could be achieved with the Lopper tool from the System Device Tree project. It allows manipulating DT (and CT) files based on a syntax similar to XPath.
Detailed RFC
This RFC specifies an improved overall hardware, software feature and software component configuration for Zephyr as existing configuration approaches are lacking:
zephyr,... and <vendor>,... extensions. It was designed to represent hardware independently of any specific operating system. Its current tree structure and usage rules in Zephyr do not represent a normalized graph of distributed configuration object instances and break encapsulation rules.
A few solutions for specific application/subsystem configuration problems exist:
The /chosen node (DTSpec v0.4, section 3.6) allows referring to other DT nodes to configure global switches related to/referring to hardware/driver configuration. In Zephyr these are mostly used to configure samples, basic OS features or choose hardware for specific use cases (e.g. the console target or the settings partition). This approach only allows setting <phandle>s or aliases and therefore does not scale.
The /zephyr,user node allows application developers to define simple key/value pairs. It is conceived as an ad-hoc configuration mechanism, though, that does not scale to the required structures.
(CONFIG_SOMETHING_0/1/2/...) "hacks" that work around Kconfig's lack of object instance support. This approach does not scale and it can only be applied to fixed multiplicities.
None of the existing approaches scales to the levels required in Zephyr today. In the absence of a proper configuration system they tend to be (ab)used for properties that should better be represented in a well-defined application/subsystem configuration framework. This RFC tries to lay out the requirements of such a system and proposes a specific implementation and migration approach.
Exemplary Use Cases
The following use cases illustrate and motivate detailed requirements.
Note: These use cases don't necessarily cover all features of the proposed configuration approach. If some requirement is neither self-evident nor covered by a corresponding use case, please comment and let me know.
Scalable, Resource-Optimized Build and Provisioning Time Boot Configuration
As an embedded application developer I want to configure immutable boot defaults across all enabled subsystems consistently at build-time w/o incurring avoidable resource usage (e.g. CPU cycles, RAM or ROM). I want only such boot configuration to consume non-volatile memory that needs to be injected at provisioning time and/or changed at runtime. I also want to scale effortlessly from a single instance to a multi instance software component configuration or promote build-time to a provisioning-time configuration or vice-versa w/o having to migrate properties between independent configuration approaches (e.g. from Kconfig to DT to the Settings Subsystem and back).
Extensible and Re-Usable Configuration of Samples
As a maintainer or contributor I want to create driver- or subsystem-specific samples that can as effortlessly as possible be combined and extended by embedded application developers into fully-functional customized solutions. The sample build boot configuration should therefore use the same format and tools required for single instance and multi-instance build-time or provisioning-time software component boot configuration as a scaled custom application.
Build Time Injection of Boot Configuration
As a large-scale application developer I want to be able to define large amounts of build time configuration variants externally to Zephyr. I want to use my own custom configuration format (e.g. Thrift or protobuf), possibly editable and sourced dynamically from a database or network location independently from Zephyr and application code repositories.
Provisioning Time (e.g. End-of-Line) Boot Configuration
As a production engineer I want to be able to provision device specific settings as fast as possible to target devices w/o having to re-compile the device's firmware, e.g. as a separate settings image via JTAG or a SPI flash tool to an EEPROM, flash partition or dedicated flash storage. To develop or debug end-of-line configuration, as a firmware application developer, I want to be able to simulate end-of-line configurations at build time w/o having to use complex production-specific tooling or migrate configuration properties between separate configuration approaches.
Runtime Boot Configuration
As an end user of a device, I want to be able to change provisioned boot-time defaults of my device persistently at runtime (e.g. to configure custom network details if Zephyr is powering a typical home router device). As an application firmware developer, I do not want to incur extra effort to provide provisioning and runtime configuration through separate configuration approaches.
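The build, provisioning and runtime use cases above can be read as layers over one key space, with later layers overriding earlier ones. The following Python sketch illustrates that reading only. The keys and values are hypothetical, and treating "needs non-volatile memory" as "appears in the provisioning or runtime layer" is an assumption made for illustration.

```python
from collections import ChainMap

# Hypothetical configuration layers; keys and values are illustrative only.
build_defaults = {"net/hostname": "zephyr", "net/mtu": 1500, "net/vlan": 0}
provisioned = {"net/hostname": "device-0042"}   # injected at end-of-line
runtime = {"net/mtu": 1280}                     # changed by the end user


def effective_config(build, provisioned, runtime):
    """Layer the sources: runtime overrides provisioning overrides build."""
    # ChainMap looks keys up left to right, so the first mapping wins.
    return ChainMap(runtime, provisioned, build)


def nvm_keys(provisioned, runtime):
    """Keys that must be stored in non-volatile memory.

    Assumption: build-time-only values can be compiled in as constants,
    so only provisioned or runtime-changed keys need NVM.
    """
    return set(provisioned) | set(runtime)


config = effective_config(build_defaults, provisioned, runtime)
```

In this sketch `net/vlan` never leaves the build layer, so it would cost no non-volatile storage, while promoting it to a provisioned value only means adding it to another layer, not moving it to a different configuration system.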
Declare initialization and reverse dependencies between software component instances
As a maintainer or contributor, I want to declare default initialization dependencies and sequences of related software component instances ("services"). As a firmware application developer, I want to be able to override default initialization dependencies and sequences. As a maintainer, contributor or firmware application developer, I want to specify and configure arbitrary lifetime hooks in addition to the default initialization callback. These hooks should respect the inversion-of-control principle w/o the software component instance having to "know" (i.e. depend on) the caller.
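As a rough illustration of declared initialization dependencies with application-level overrides, the following Python sketch derives an initialization sequence by topological sorting. It uses the standard library `graphlib` module (Python 3.9+). The component names and the `init_order` helper are hypothetical, and this is not how Zephyr currently orders initialization.

```python
from graphlib import CycleError, TopologicalSorter


def init_order(default_deps, overrides=None):
    """Return one valid initialization sequence.

    default_deps / overrides map a component instance to the set of
    instances that must be initialized before it. An override replaces
    the default dependency set of that instance.
    """
    deps = {name: set(before) for name, before in default_deps.items()}
    for name, before in (overrides or {}).items():
        deps[name] = set(before)
    # static_order() yields each node after all of its predecessors and
    # raises CycleError if the declared dependencies are circular.
    return list(TopologicalSorter(deps).static_order())


# Hypothetical defaults as a maintainer might declare them.
defaults = {
    "net_if": {"radio"},
    "ipv6": {"net_if"},
    "coap": {"ipv6"},
}
order = init_order(defaults)

# A contradictory application override is detected instead of deadlocking.
try:
    init_order(defaults, overrides={"net_if": {"coap"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Because the sequence is computed from declarations, the component instances themselves do not need to reference whoever initializes them.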
Supply security material from secure sources to secure targets
As a production engineer I want to be able to inject confidential security material directly from a secure key vault to a secure embedded storage at the end of my production line.
Detailed Requirements
This section describes detailed requirements in addition to the main functional requirements A.1 through 6.
B. Scope:
C. Source and Target Serializations:
<stdint.h>, structs and pointers. Additional target type systems as used in Rust or C++ SHOULD additionally be supported, at least in principle.
D. Maintainability:
E. Documentation:
F. Machine-readable metadata describing configuration data and schemas:
G. CT-specific requirements:
Note: CT as default source and intermediate serialization format together with existing DT macros, tooling and corresponding documentation satisfy almost all of these requirements out-of-the-box with minimal initial implementation effort.
Note: Initialization dependency properties MAY be modeled as just another kind of composable binding schema that MAY be applied to certain CT nodes according to CT normalization rules. Nodes representing initializable software component instances declare initialization dependencies to other initializable software component instances via hierarchy or <phandle>. All default initializations may be accumulated in a single file or distributed over subsystems according to CT encapsulation rules.
Proposed change (Detailed)
Configtree (CT) Specification
CT is a natural semantic superset of DTSpec (and the upcoming System DT). CT SHALL use the same syntax as DTSpec without hardware specific properties in non-device/non-hardware nodes. Allowed standard properties in non-device/non-hardware nodes are "status" and "compatible". CT MAY introduce additional <prop-encoded-array> if required. Currently no such requirement is known, though.
CT SHALL be backed by a well-defined Zephyr-specific abstract conceptual configuration data model (the "Zephyr configuration space") that includes existing DT entities and attributes as well as CT-specific extensions. The abstract data model SHOULD be documented in the Zephyr user documentation using adequate textual graphing techniques (e.g. based on mermaid) for easy review. Alternatively the model MAY be generated automatically from improved binding sources that not only specify properties but also relations.
CT introduces additional nodes and properties into the device tree (called "configtree" for CT) that structurally relate 1-to-n or n-to-m to existing driver or hardware nodes. Software component instance related configuration properties SHALL be introduced into existing DT nodes if they structurally relate 1-to-1 (bijectively) to existing nodes.
Nodes that structurally relate 1-to-n to existing nodes SHALL be n-side subnodes of the 1-side node. A collection of 1-to-1 related properties inside the same node MAY be grouped in their own subnode for improved encapsulation (e.g. for separate subsystems or larger semantically related properties), similarly to the structures currently generated in Kconfig. It SHALL at all times be clearly specified, though, how nodes map to the abstract unified configuration space in order to prove normalization of the CT model representation.
Nodes that structurally relate n-to-m to device/peripheral-related nodes require an additional top-level sub-space to be introduced. The structure of CT-specific top-level subspaces SHALL follow the file structure of the drivers or subsystems that require the additional node. DT or CT SHALL NOT introduce additional top-level nodes based on other custom encapsulation criteria. Existing non-standard top-level nodes other than those explicitly defined in DT or CT SHALL be regarded as "modeling bugs" and corresponding issues SHALL be opened to document and fix them.
References between n-to-m related nodes and nodes inside DT or CT-specific DT extensions SHALL be made explicitly using a DTSpec <phandle>. References using alternative custom primary keys (e.g. driver or interface names) or logic ("the first matching interface") SHALL NOT be used. 1-to-1 references SHALL NOT be allowed as they obviously breach CT normalization requirements.
CT SHALL use the same Zephyr-specific .yaml binding files and macro targets as DT. CT-specific macro targets MAY be added. They are prefixed with "CT_" unless they can also be applied to DT.
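The rule that references between nodes are explicit phandles, rather than names or matching logic, makes dangling references mechanically detectable. The Python sketch below models this under simplifying assumptions: a `Ref` object stands in for a phandle and points to a node path, the tree is a nested dict, and the node names (`soc`, `radio0`, `net`, `route0`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Stand-in for a <phandle>: an explicit reference to a node path."""
    path: str


def node_paths(tree, prefix=""):
    """Collect the paths of all nodes (nested dicts) in the tree."""
    paths = set()
    for name, value in tree.items():
        if isinstance(value, dict):
            path = f"{prefix}/{name}"
            paths.add(path)
            paths |= node_paths(value, path)
    return paths


def dangling_refs(tree):
    """Return (property path, target path) for every unresolved Ref."""
    known = node_paths(tree)
    missing = []

    def walk(node, path):
        for name, value in node.items():
            if isinstance(value, dict):
                walk(value, f"{path}/{name}")
            elif isinstance(value, Ref) and value.path not in known:
                missing.append((f"{path}/{name}", value.path))

    walk(tree, "")
    return missing


# n-to-m related nodes live in their own top-level sub-space ("net" here)
# and point at hardware nodes by explicit reference only.
tree = {
    "soc": {"radio0": {"compatible": "vendor,radio"}},
    "net": {
        "route0": {"interface": Ref("/soc/radio0")},
        "route1": {"interface": Ref("/soc/radio9")},  # no such node
    },
}
problems = dangling_refs(tree)
```

A name-based lookup such as "the first matching interface" could not be checked this way, because its result depends on evaluation order rather than on the tree alone.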
CT introduces strict encapsulation rules. Files representing CT (including DT) and corresponding binding files SHALL be modularized into files according to the following rules:
The Zephyr Configuration Space
The following diagram proposes an initial abstract conceptual data model of the Zephyr Configuration Space. See #76903 which demos the model in CT serialization.
Also see https://drive.google.com/file/d/1sQuen1Y0bAIS5PX_kKRmSTNA_g4gd-gT/view?usp=sharing (requires the Google Drive draw.io plugin) for a possibly updated version.
This model SHALL be updated based on rules of normalization whenever additional entities need to be added. Property documentation MAY be added to illustrate normalization. Any serialization SHALL be validated against this conceptual data model and SHALL be rejected if not matched. Binding files SHALL document the (collection of) entities to which they can be applied. These binding file restrictions SHALL be verified during build.
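A build-time check of the kind described above might look like the following Python sketch. It is an assumption-laden illustration: the `applies_to` key, the entity names and the `zephyr,net-if` binding are hypothetical and do not exist in Zephyr's binding format today.

```python
# Hypothetical bindings: each lists the model entities it may be applied
# to and its properties as name -> (type, required).
BINDINGS = {
    "zephyr,net-if": {
        "applies_to": {"net-interface"},
        "properties": {"mtu": (int, True), "hostname": (str, False)},
    },
}


def validate(node, entity, bindings=BINDINGS):
    """Return a list of error strings; an empty list means the node passes."""
    compatible = node.get("compatible")
    binding = bindings.get(compatible)
    if binding is None:
        return [f"no binding for compatible {compatible!r}"]

    errors = []
    # Restriction documented by the binding: which entities it may describe.
    if entity not in binding["applies_to"]:
        errors.append(
            f"binding {compatible!r} cannot be applied to entity {entity!r}")
    # Required and typed properties.
    for prop, (typ, required) in binding["properties"].items():
        if prop not in node:
            if required:
                errors.append(f"missing required property {prop!r}")
        elif not isinstance(node[prop], typ):
            errors.append(f"property {prop!r} must be {typ.__name__}")
    # Anything not described by the binding is rejected.
    for prop in node:
        if prop != "compatible" and prop not in binding["properties"]:
            errors.append(f"unknown property {prop!r}")
    return errors


good = {"compatible": "zephyr,net-if", "mtu": 1280}
bad = {"compatible": "zephyr,net-if", "mtu": "big", "color": 1}
```

Rejecting a serialization then amounts to failing the build whenever `validate` returns a non-empty list for any node.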
Additional/Improved Binding File Semantics
Composition over Inheritance
Currently binding files only allow for inheritance of types. Composition of types (mix-ins) cannot be defined. This makes it unnecessarily hard (and sometimes impossible) to properly design a well encapsulated design hierarchy.
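To make the idea of composing bindings concrete, here is a small Python sketch of mix-in style composition. It is not a description of the existing binding loader. The `compose` helper and the split into a hardware-only schema and a driver programming model schema are assumptions for illustration, and the property names are only examples.

```python
def compose(*bindings):
    """Merge property schemas from several independent bindings.

    Each binding maps a property name to its specification. Identical
    re-definitions are tolerated; conflicting ones are an error, so a
    mix-in cannot silently redefine another schema's property.
    """
    merged = {}
    for binding in bindings:
        for prop, spec in binding.items():
            if prop in merged and merged[prop] != spec:
                raise ValueError(f"conflicting definitions for {prop!r}")
            merged[prop] = spec
    return merged


# Hardware description, independent of any driver programming model.
hardware_adc = {"reg": "array", "interrupts": "array"}
# Driver programming model, mixed in rather than inherited from.
adc_programming_model = {"#io-channel-cells": "int"}

node_schema = compose(hardware_adc, adc_programming_model)

# An application-level mix-in that disagrees on a type is rejected.
try:
    compose(node_schema, {"reg": "int"})
    conflict = False
except ValueError:
    conflict = True
```

With composition, the hardware schema stays free of driver internals, and an application can swap the programming model mix-in without touching the hardware description.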
The following example shows current binding file design practice in Zephyr:
Example:
This binds a specific device to a driver-specific software programming model in practice. We justified this in the past by asserting that "adc-controller-yaml" would be exclusively determined by an objectively correct hardware-only ontology of an abstract ADC hardware model sufficient to all imaginable driver implementations.
This promise was of course rarely kept in practice. Driver internals regularly leak into supposedly hardware only "base types" which breaks encapsulation. This is not surprising as DT properties are largely determined by their usage inside Zephyr, not by an independent commonly accepted shared industry standard outside Zephyr except for properties introduced by Linux for which it may be argued that they represent a de-facto standard.
In practice our inheritance tree forces a client programming model onto the peripheral which is what DT was originally conceived for but may not always be compatible with Zephyr's claim to be "vendor agnostic" and "customizable". OTOH Zephyr has no requirement to be OS agnostic. So adding Zephyr-specific additions to CT is not a problem as they can be easily ignored by custom or vendor-specific driver or subsystem implementations. But if doing so, they need to be composable as not to pollute the inheritance hierarchy and they need to be properly encapsulated.
From a data model perspective, drivers are related to (i.e. chosen for) a combined hardware instance + application key. The correct combined, abstract, normalized "key" to such a compositional configuration class would therefore be the (app-id, peripheral-id) tuple. This means that in the above example there should be some zephyr,adc-controller compatible mixed into the peripheral's node as the default driver programming model, which could be overridden on application level. The zephyr,adc-controller compatible then matches a zephyr,adc-controller.yaml file which is placed near the corresponding adc.h or the adc driver folder where all the drivers reside that follow its programming model. The application would provide a partial DTS fragment that overrides or extends the driver node's "compatible" with its own custom driver client programming model if required, possibly in a directory hierarchy that again closely couples with the driver hierarchy. Given a proper naming convention, such rules could be verified automatically during build with resolution-based error messages.
The composition of bindings for typing should not be confused with the actual driver selection at build time. Driver selection follows the logic described in DTSpec: From left to right, drivers matching one of the compatible strings are located in the build (as configured by Kconfig and CMake). If none or more than one matches per peripheral, a warning or error message is generated and the build possibly stops. This means that the app-id part of the conceptual driver implementation key above will be provided by app-specific Kconfig feature-inclusion mechanisms, while the peripheral-id part will be specified by the first matching compatible of the corresponding CT node.
This rationale results in the following additional requirements for the Zephyr binding system:
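The left-to-right driver selection described above can be sketched roughly as follows. This is only an illustration in Python, not Zephyr build code: the enabled_drivers map, the driver names and the DriverSelectionError type are hypothetical, and treating "more than one match" as a hard error is one possible reading of the rule.

```python
from typing import Dict, List, Set


class DriverSelectionError(Exception):
    """Raised when no driver, or more than one driver, matches a node."""


def select_driver(compatibles: List[str],
                  enabled_drivers: Dict[str, Set[str]]) -> str:
    """Pick the driver for one node.

    compatibles:     the node's "compatible" list, most specific first.
    enabled_drivers: hypothetical map from compatible string to the names of
                     the drivers that Kconfig/CMake included in this build.
    """
    for compatible in compatibles:  # walk the list left to right
        candidates = enabled_drivers.get(compatible, set())
        if len(candidates) > 1:
            raise DriverSelectionError(
                f"ambiguous drivers for '{compatible}': {sorted(candidates)}")
        if len(candidates) == 1:
            return next(iter(candidates))  # first matching compatible wins
    raise DriverSelectionError(f"no enabled driver matches {compatibles}")


# The app's Kconfig selection decides which drivers are in the build
# (the app-id part); the node's compatible list supplies the peripheral-id.
build = {"zephyr,adc-controller": {"adc_generic"}}
driver = select_driver(["vendor,adc-x", "zephyr,adc-controller"], build)
```

In this sketch the vendor-specific compatible has no enabled driver, so the mixed-in zephyr,adc-controller default is the first match and its driver is selected.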
Nomenclature and Directory Structure
Currently we require binding files to reside in separate top-level directories. This places binding files far from corresponding default DT source files and from usage sites and thereby breaks the above formulated CT encapsulation requirements.
This RFC therefore proposes an alternative naming schema <[vendor,]programming-model>.binding.ya[m]l. Files following this nomenclature MAY be placed anywhere in Zephyr's directory tree. They SHALL be placed as close to their usage sites as possible, see CT encapsulation naming rules. The vendor part is optional when it is clear from context. Namely, inside the Zephyr source tree the zephyr, prefix SHALL be left out to ensure that files can be placed near other similarly named files based on the programming model (i.e. API).
Additional Restrictions placed on Tree Structures
Similarly to JSON Schema and YAML Schema, we SHOULD be able not only to validate properties based on compatible strings but also to restrict node names for certain bindings (e.g. channel in ADC or iface, ipv6, etc. for well-defined network configuration nodes).
We SHOULD be able to restrict subnodes to certain parents, e.g. iface SHALL have to be explicitly whitelisted as an allowable child node by network peripherals, or channel as a child node of ADC peripherals, possibly including multiplicity in both cases. Therefore the iface and channel bindings SHALL be marked as "whitelist-only" and corresponding network driver nodes will have to include them explicitly in their "subnode-whitelist".
Similarly it SHOULD be possible to place restrictions on allowable parent nodes, e.g. to let ipv6 or ieee802154 declare that they SHALL only be subnodes of iface. This time encapsulation requirements are opposite, therefore it suffices to include a "parent-whitelist" property in such bindings, possibly including allowed multiplicity ranges.
Configtree vs. Devicetree vs. Kconfig
Single instance vs. multi instance software component configuration
All software component instance configuration properties SHALL be deprecated in Kconfig and migrated to CT under the above rule sets.
Kconfig SHALL be exclusively responsible for selecting features, while all software component instance configuration SHALL be reserved to CT (including DT).
To make this more precise, the following rules SHALL apply:
Kconfig SHOULD thereby be re-focused on its original intent to describe, compose and configure software features (in terms of included source code or logic) and software feature dependencies. This is not so much required as an end in itself (Kconfig "conformance") but has the following practical advantages:
Configuring runtime software components, be it "in-memory" or as global runtime parameters, SHOULD be migrated to CT, especially such parameters that strictly belong to one of the CT abstract modeling concepts by normalization rules.
Backwards compatibility to deprecated Kconfig MAY be maintained as long as required as laid out in the requirements section.
Hardware vs. Software Configuration
The hardware vs. software distinction SHALL be dropped in favor of the following, more precise and easier-to-enforce rules:
Dependencies
Direct dependencies exist on Kconfig, DT and the settings subsystem. Indirect dependencies exist on all configurable drivers or subsystems.
Concerns and Unresolved Questions
This section answers questions and evaluates concerns brought forward while discussing the aptitude of DT as a configuration source.
Concerns are responded to based on Zephyr-specific requirements and pragmatic engineering approaches, namely the concepts of data model normalization (similarly to 3NF for relational data models) and encapsulation/modularization.
Work-in-progress - please comment, I'll collect all concerns and questions here.
Is DT syntax capable of addressing all our software configuration requirements?
Yes. DT is just a tree of nodes with key/property values and references (phandles) that can easily be mapped via bindings to any primitive C type, <stdint.h> type, struct and pointer. Any normalized data model can obviously be mapped to DT. This should be good enough under all reasonable circumstances and is theoretically very well founded. We have a semantic modeling challenge before us, not a syntax or serialization challenge.
Also compare the DTSpec archeology section below.
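To make the mapping claim a bit more tangible, here is a small sketch of how a binding could turn untyped node properties into a typed record with a resolved reference. It is written in Python for brevity and does not use Zephyr's actual tooling: the tree contents, the SENSOR_BINDING table, the property names and the SensorConfig type are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical in-memory tree: node path -> raw properties.
TREE: Dict[str, Dict[str, Any]] = {
    "/soc/adc@0": {"phandle": 1, "resolution": 12},
    "/app/sensor": {"label": "battery", "io-channel": 1, "sample-ms": 500},
}

# Hypothetical binding: property name -> declared type.
SENSOR_BINDING = {"label": "string", "io-channel": "phandle", "sample-ms": "int"}


@dataclass
class SensorConfig:
    label: str                  # would map to a C string
    io_channel: Dict[str, Any]  # resolved reference, comparable to a C pointer
    sample_ms: int              # would map to a <stdint.h> integer type


def resolve_phandle(tree: Dict[str, Dict[str, Any]], handle: int) -> Dict[str, Any]:
    """Return the node whose "phandle" property equals handle."""
    for props in tree.values():
        if props.get("phandle") == handle:
            return props
    raise KeyError(f"unknown phandle {handle}")


def bind_sensor(tree: Dict[str, Dict[str, Any]], path: str) -> SensorConfig:
    """Validate a node against SENSOR_BINDING and build the typed record."""
    node = tree[path]
    values = {}
    for name, kind in SENSOR_BINDING.items():
        raw = node[name]  # KeyError if a required property is missing
        if kind == "string" and not isinstance(raw, str):
            raise TypeError(f"{path}: '{name}' must be a string")
        if kind in ("int", "phandle") and not isinstance(raw, int):
            raise TypeError(f"{path}: '{name}' must be an integer")
        if kind == "phandle":
            raw = resolve_phandle(tree, raw)
        values[name.replace("-", "_")] = raw
    return SensorConfig(**values)


config = bind_sensor(TREE, "/app/sensor")
```

The point of the sketch is that typing and reference resolution live in the binding, not in the serialization syntax, so a pure software node such as /app/sensor is handled exactly like a hardware node.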
People don't like DT or cannot understand DT, DT is awkward.
As we will not dispose of DT to use YAML everywhere, no matter how bad DT is, everyone who uses Zephyr has to know and work with it anyway. From a usability pov it doesn't matter what serialization we choose as long as we choose a single one, fix the quirks and document it well.
On a Linux box you have to deal with many different config files, too.
"Because Linux does it" is not a requirement or an engineering argument as such. We have no Zephyr-specific requirement that forces us to use many distinct config formats. There are good usability arguments that favor an integrated approach. Note that this RFC favors distribution of configuration over many files (see the encapsulation/modularization argument), just not many distinct semantics and syntax variants.
We should probably start solving domain-specific problems.
We have an obvious requirement to design something that can be extended to other subsystems, integrated with the settings subsys, used for provisioning and serialized to other formats like protobuf IDL or Thrift, which we should not ignore. Above all we have to be able to serialize to any syntax based on some abstract conceptual data model.
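As a rough illustration of "one conceptual model, many serializations", the following Python sketch renders the same in-memory node tree once as DTS-like text and once as a plain dictionary that can be dumped as JSON. The Node type, the two helper functions and the example values are hypothetical; real DTS output would need more value types (arrays, phandles, booleans) than shown here.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Union

Value = Union[int, str]


@dataclass
class Node:
    """Syntax-independent model: a named node with properties and children."""
    name: str
    props: Dict[str, Value] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def to_dts(node: Node, indent: int = 0) -> str:
    """Render the model as DTS-like text (integers and strings only)."""
    pad = "\t" * indent
    lines = [f"{pad}{node.name} {{"]
    for key, value in node.props.items():
        rendered = f"<{value}>" if isinstance(value, int) else f'"{value}"'
        lines.append(f"{pad}\t{key} = {rendered};")
    for child in node.children:
        lines.append(to_dts(child, indent + 1))
    lines.append(f"{pad}}};")
    return "\n".join(lines)


def to_plain(node: Node) -> dict:
    """Render the same model as nested dicts, e.g. for JSON or YAML output.

    Simplification: a child whose name equals a property key would
    overwrite that property.
    """
    out: dict = dict(node.props)
    for child in node.children:
        out[child.name] = to_plain(child)
    return out


iface = Node("iface", {"mtu": 1280}, [Node("ipv6", {"addr": "2001:db8::1"})])
dts_text = to_dts(iface)
json_text = json.dumps(to_plain(iface))
```

Both outputs are derived from the same Node instance, so validation rules only need to be written once against the model rather than once per syntax.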
A YAML-based solution is easier to understand and maintain.
As laid out in the "Alternatives" section, a YAML-based solution is going to be a huge maintenance and documentation nightmare. We have to re-invent every wheel that has been invented for DT: type binding, inline documentation, integration with the doc system, mappings to macros, overlay mechanisms, naming patterns, etc. Just matching and syncing with the existing DT macrobatics will be a huge effort initially and over time. The problem we're facing is not syntax but semantics and the surrounding infrastructure and tooling.
It is easy to distinguish between HW and SW properties, that's how we should separate configuration.
It is not. This is a perceptual bias: We tend to confuse our internal models and heuristics with what is out there in the world. The reality is: We fight over each and every addition to DT because some say "it's SW" others say "it's HW". If not even we are able to precisely define the line between SW and HW, how will our users? If we have to explain to our users that what they find intuitive is wrong then we are wrong.
Devicetree was derived from the [...] Open Firmware project.
Nope. See DTSpec, section 1.2:
But it's not entirely wrong either as ePAPR itself was derived from the Open Firmware spec (aka IEEE 1275-1994).
DTSpec was designed to describe hardware only.
It is important to insist on ePAPR rather than Open Firmware as the main DTSpec predecessor because only ePAPR removed user configuration from DT and restricted its applicability to hardware. This was due to its changed focus on backing the Power ISA boot firmware, a focus that DTSpec then re-generalized.
IEEE 1275-1994 specified allowable contents of the Device Tree in section 3.2 as:
Section 3.3.1 adds:
IEEE 1275-1994 had an /options root node specifically reserved to store such non-volatile user configuration, which received a default at build time and could be updated at provisioning or runtime by the end user. So exactly the use case I'm envisioning for DT.
Note: U-Boot uses DT for user configuration, too. They seem to have used the IEEE 1275-1994 /options node first but now introduced a custom /config node. Of course they are a bootloader, so they need less user config than an application development platform like Zephyr.
Saying that DTSpec was designed to describe "hardware" is therefore at least misleading. DTSpec was designed to back OS-independent bootloaders, see DTSpec, section 1.1:
In other words: DT is a simple HAL, but a HAL is of course as much influenced by its client as by the abstracted hardware itself. And Zephyr is not a bootloader, nor is Linux. So the "abuse" (or as I'd say "pragmatic re-interpretation") started when DTSpec was focused on describing OS-specific device abstractions in order to become vendor- and architecture-independent, which reversed the original intent of DTSpec, i.e. abstracting OS differences away.
This shows that the simplified conventional wisdom "DT is for hardware only" has never been as "pure" as one might have thought and there is no need to protect its "purity" either. Such an argument proves nothing and should be replaced by requirements analysis: Being OS-independent was their requirement but it was never ours which explains why we never truly enforced it (e.g. in the build infrastructure) except for improved knowledge transfer from Linux (see below). Our main requirement always was vendor-agnosticism.
DTSpec is careful to introduce HW specifics in a separate section after laying out a general hierarchical key/value store with generic typing. We can trivially keep all HW-specific parts out of nodes that don't need them by extracting status and compatible into a separate generic node.yaml binding file which will replace base.yaml for those nodes.
In the end it doesn't even matter that much anyway. Our discussion re software/hardware is mostly academic: Structurally (i.e. by normalization criteria) the large majority of our subsystem config requirements map to existing device tree structures naturally (1-to-1 or 1-to-n). The remaining m-to-n related nodes can be isolated into top-level namespaces as inspired by IEEE 1275-1994 and referred to from inside the actual device-specific tree, see https://github.com/fgrandel/zephyr/blob/rfc/76902-systree-config/samples/net/sockets/echo/app.overlay as an example.
Wherever it made sense, we've tried to be compliant with Linux devicetree bindings.
This rule continues to be applied and even fortified by this RFC as laid out in the CT specification section. Not to ensure OS independence of Zephyr's DT (which never was a sensible requirement) but because it helps people who know Linux. They will find it easier to learn Zephyr which again is a real requirement of ours. Still we have deviated far enough from Linux (for good reasons) that it can hardly be argued that we're still "compatible" in any sensible way. That's why the above CT specification re-establishes and distinguishes much more precisely between Linux-compatible and Zephyr-specific DT parts.
The HW/SW split is "cleaner" or at some time in the past was "cleaner" than mixing up hardware and software properties in the same DT nodes.
Our use of DT has broken basic data modeling practices from day one, namely normalization and encapsulation. Both are precisely defined design rules:
Our DTS and bindings are mostly kept far apart from usage sites instead. We have invented Zephyr-specific (but vendor agnostic) "hardware properties" that neither exist in datasheets nor in Linux and put them where the hardware lives based on imprecise ontological assumptions of what is "hardware". This is wrong: By DDD rules and Conway's law we should know that any context-agnostic ontology is doomed to fail. And by the encapsulation argument we should place Zephyr-driver-specific DT snippets near the drivers that use them exclusively while keeping shared concerns at as central a place as required but still as local as possible.
Further de-normalized and de-modularized configuration will inevitably lead to more modeling inconsistencies and less readability/maintainability in practice, as the model is not self-validating and consistency cannot be automatically enforced with sensible effort (examples of which abound in our own partially de-normalized DT variant today).
We have to distinguish between the global conceptual data model and its local physical representation instead. YAML doesn't determine a data model. But the model is much more relevant to usability and maintainability than the syntax. This shows how far our discussion has strayed from the real problem so far. Zephyr is an application development platform, as such application architecture concepts are to be applied.
While DT has been promoted as a great solution to many problems, to me, it has several drawbacks on the way it is implemented in Zephyr.
This is true. It is due to Zephyr-specific architectural and implementation deficiencies (many of which have been laid out in this RFC) that our use of DT feels awkward. Not due to its syntax. This can be fixed.
If I had to start Zephyr again, I'd probably stay away from DT.
Maybe, but that's not an option in practice.
If we start diverging from [DT], we either define our own spec, or it'll just organically grow into a mess.
True. This is why CT is specified much more precisely than our current use of DT while acknowledging additional practical requirements that had not been systematically covered by DT so far.
DT and DT bindings have come a long way. Let's focus our resources on making DT more intuitive by fixing a few "quirks" rather than starting from zero, because this will immediately benefit us doubly: on the hardware and on the software modeling side. As soon as the cracks in the YAML approach inevitably appear, everyone will wish that we had not opened another Pandora's box.
Alternatives
An alternative, separate YAML-based approach has been considered and rejected in this RFC for the following reasons:
The settings subsystem was considered as an exclusive configuration target but was then conceived as an optional part of this more general RFC because it would be lacking as a general configuration subsystem, as laid out in the "Detailed RFC" section.
Thrift and protobuf were proposed as exclusive configuration sources but were then conceived as an optional part of this more general RFC, as the community could hardly be expected to converge on a single binary source. Apart from that, all arguments listed under the YAML approach apply to these source serializations as well.
Kconfig-based approaches are not adequate due to Kconfig's structural limitations as laid out in the "Detailed RFC" section.