Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Add section with design choices #105

Merged
merged 4 commits into from
Jul 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
150 changes: 150 additions & 0 deletions docs/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
# Design choices

## Using lists instead of mappings

One of the most controversial choices we have made in QREF is the choice of
using lists instead of mappings (a.k.a. dictionaries) for objects that
should have an unique name. This choice affects:

- Children of a `Routine`.
- Ports of a `Routine`.

For instance, why did we chose to represent ports like this:

```yaml
ports:
- {"name": "in_0", "direction": "input", "size": 2},
- {"name": "out_0", "direction": "output", "size": "N"}
```

instead of this:

```yaml
ports:
in_0: {"direction": "input", "size": 2}
out_0: {"direction": "output", "size": "N"}
```
?

The answer is purely pragmatic. On the one hand, using mappings
instead of lists would guarantee uniqueness of names of ports
and children. But, on the other hand, it would give a false
sense of security. To see why, consider the following example
Python code, which loads an incorrect definition of ports:

```python
import yaml

data = """
ports:
in_0: {"direction": "input", "size": 2}
in_0: {"direction": "input", "size": "N"}
"""

print(yaml.safe_load(data))
```

If you are new to parsing YAML (or JSON) in Python you might be
surprised that the code runs at all - after all shouldn't keys
in YAML mappings be unique? Well they should, but most parsers
will just load the last key if the duplicates are present. The
code above prints:

```text
{'ports': {'in_0': {'direction': 'input', 'size': 'N'}}}
```

So, suppose you are editing a QREF file. You made a mistake and
used the same port name twice. If we used mapping, your file would
load fine, but you would lose one entry. Importantly, from your
code's perspective you wouldn't even know - you would just get
a dictionary with unique keys. Debugging problems that could arise
in this way would be a nightmare, and we want to save all of us
such hurdles.

Now, what happens if we use lists? Let's try the following code:

```python
import yaml

data = """
ports:
- {"name": "in_0", "direction": "input", "size": 2}
- {"name": "in_0", "direction": "input", "size": "N"}
"""

print(yaml.safe_load(data))
```

This time, we get the following output:
```text
{'ports': [{'name': 'in_0', 'direction': 'input', 'size': 2}, {'name': 'in_0', 'direction': 'input', 'size': 'N'}]}
```
Now, we have ports with duplicate names, which is not good, but we
are able to detect this and react e.g. by raising exception.
No information was lost.

The natural question to ask is: why won't we use dictionaries,
and handle duplicate keys by customizing YAML (or JSON) parsers?
We could do this, but keep in mind that QREF is mainly a
data format. Different users can use different parsers and we
want to make sure everyone gets consistent results no matter
what parsing library they use.


## Why isn't routine a top level object?

The actual program you are representing in QREF is stored
as `program` property of a top-level schema object, i.e.

```yaml
version: v1
program:
name: my_program
children:
# ...
```
You might wonder why wouldn't we just make the program a
top-level object, or, in other words, why don't QREF
documents look like this instead:

```yaml
version: v1
name: my_program
children:
# ...
```

There are essentially two reasons, but both of them
have to do with the fact that we wanted our structure
to be recursive.

To quickly explain what recursive in this context means,
let's think about `program` and its `children`. Both of
them can be viewed as objects of the same type. The
`program` is a *routine* which can contain other routines
as its `children`. Each of the children can have
other routines as their children etc. It's routines all
the way down.

Now, if we used a top-level object to represent a program,
we face a problem - we need to include version only in
this object and not in the children. Therefore we would
have to create a separate type for the main program
and separate one for subroutines. This would have the
following consequences, listed in the order of importance:

- The JSON schema would no longer be recursive, and thus
would contain more definitions, which simply means
it would be more complex and harder to read.
- The hierarchy of our Pydantic models would get slightly
more complex.

We don't mind putting effort into creating more complex
hierarchy of models if it would add usability. However, we
think that having the JSON schema as simple as possible
is beneficial to the users (especially considering that
QREF is mostly a data format!), and thus we made a choice
of using the recursive type hierarchy.


1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ nav:
- qref.schema_v1: library/reference/qref.schema_v1.md
- qref.experimental.rendering: library/reference/qref.experimental.rendering.md
- development.md
- design.md
theme:
name: material
palette:
Expand Down
Loading