String formaters #284

Open
captain-yoshi opened this issue Jul 27, 2022 · 1 comment
Comments

@captain-yoshi
Contributor

It seems that ryml encloses strings containing the reserved indicator - in single quotes, as recommended here.

# yaml-cpp
- dataset:
    name: Collision Check
    type: COLLISION CHECK
    date: 2022-Jul-21 12:22:31.228645
    uuid: 6ba693ee_78b2_4ea2_bbac_284fcc08d909
    hostname: captain-yoshi

# ryml
- dataset:
    name: Collision Check
    type: COLLISION CHECK
    date: '2022-Jul-27 04:28:14.908901'
    uuid: bc2aaae8_c539_40bc_854b_3058eda86502
    hostname: 'captain-yoshi'

This post suggests the following, with references from the v1.2.2 specification:

Double-quoted style:

The double-quoted style is specified by surrounding " indicators. This is the only style capable of expressing arbitrary strings, by using \ escape sequences. This comes at the cost of having to escape the \ and " characters.

Single-quoted style:

The single-quoted style is specified by surrounding ' indicators. Therefore, within a single-quoted scalar, such characters need to be repeated. This is the only form of escaping performed in single-quoted scalars. In particular, the \ and " characters may be freely used. This restricts single-quoted scalars to printable characters. In addition, it is only possible to break a long single-quoted line where a space character is surrounded by non-spaces.

Plain (unquoted) style:

The plain (unquoted) style has no identifying indicators and provides no form of escaping. It is therefore the most readable, most limited and most context sensitive style. In addition to a restricted character set, a plain scalar must not be empty or contain leading or trailing white space characters. It is only possible to break a long plain line where a space character is surrounded by non-spaces. Plain scalars must not begin with most indicators, as this would cause ambiguity with other YAML constructs. However, the :, ? and - indicators may be used as the first character if followed by a non-space “safe” character, as this causes no ambiguity.

TL;DR

With that being said, according to the official YAML specification one should:

Whenever applicable use the unquoted style since it is the most readable.
Use the single-quoted style (') if characters such as " and \ are being used inside the string, to avoid escaping them and therefore improve readability.
Use the double-quoted style (") when the first two options aren't sufficient, i.e. in scenarios where more complex line breaks are required or non-printable characters are needed.
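
To make the three recommendations concrete, here is a minimal sketch of a style chooser. It is not ryml's or yaml-cpp's code: `ScalarStyle`, `is_indicator`, `can_be_plain` and `choose_style` are hypothetical names, and the rules are a deliberately conservative simplification of the spec (block context only, tabs treated as needing escapes, no check for plain scalars such as `true` or `123` that would resolve to a non-string type).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names for illustration; this is not an existing library API.
enum class ScalarStyle { Plain, SingleQuoted, DoubleQuoted };

// YAML indicator characters that are suspicious at the start of a plain scalar.
constexpr bool is_indicator(char c) {
    for (char ind : std::string_view("-?:,[]{}#&*!|>'\"%@`")) {
        if (c == ind) return true;
    }
    return false;
}

// Control characters can only be written with double-quoted escape sequences.
// Simplification: tabs are lumped in with the other control characters.
constexpr bool needs_double_quotes(std::string_view s) {
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x20 || u == 0x7f) return true;
    }
    return false;
}

// Conservative check that a string is safe as a plain scalar in block context.
constexpr bool can_be_plain(std::string_view s) {
    if (s.empty()) return false;
    if (s.front() == ' ' || s.back() == ' ') return false;
    if (is_indicator(s.front())) {
        // '-', '?' and ':' are allowed first if followed by a non-space character.
        const bool lenient = s.front() == '-' || s.front() == '?' || s.front() == ':';
        if (!lenient || s.size() < 2 || s[1] == ' ') return false;
    }
    if (s.back() == ':') return false;  // would read as a mapping key
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] == ':' && s[i + 1] == ' ') return false;  // "key: value"
        if (s[i] == ' ' && s[i + 1] == '#') return false;  // starts a comment
    }
    return true;
}

// Prefer plain, then single-quoted, and fall back to double-quoted only when
// escape sequences are unavoidable.
constexpr ScalarStyle choose_style(std::string_view s) {
    if (needs_double_quotes(s)) return ScalarStyle::DoubleQuoted;
    if (can_be_plain(s)) return ScalarStyle::Plain;
    return ScalarStyle::SingleQuoted;
}

// The two values that ryml quoted above stay plain under these rules.
static_assert(choose_style("captain-yoshi") == ScalarStyle::Plain);
static_assert(choose_style("2022-Jul-27 04:28:14.908901") == ScalarStyle::Plain);
static_assert(choose_style("- item") == ScalarStyle::SingleQuoted);
static_assert(choose_style("line1\nline2") == ScalarStyle::DoubleQuoted);
```

Under these simplified rules a - inside the string, or a : that is not followed by a space, does not force quoting.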

Does single-quoting every string that contains a reserved indicator reduce complexity and improve performance for ryml, at the cost of some readability (not a huge loss)? I am curious about how strings are formatted in ryml.

@biojppm
Owner

biojppm commented Aug 1, 2022

The current emitter does scan the scalar strings and picks styles based on somewhat serendipitous heuristics. These heuristics could be improved; in this particular case it is likely that the use of - within the string is benign and has no special reason to be quoted. I'd have to look into it, and I'd be happy to accept a PR addressing this.

But in the end this boils down to efficiency of emitting vs responsibility for sanity of the emitted YAML. Should it be the user or the emitter that is responsible for the style choices? This is an issue that has plagued me from the start.

Ideally, I'd like to be able to emit without any emit-time scan of the string. That would require a feature mask for the scalar contents, somewhat like the current _WIP style flags. But while this would have no impact for trees obtained from parsing (because the parser can easily set up these flags), it would require programmatic trees to come with the proper flags, which the client would have to set up. The current situation works but is less than ideal.
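
A rough sketch of what such a feature mask could look like, to make the trade-off concrete. Everything here is hypothetical (`kNeedsQuotes`, `kNeedsEscapes`, `Scalar`, `scan_features`, `quote_for`) and unrelated to the actual _WIP flags; the scan is deliberately crude and only stands in for whatever the parser would compute.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical feature bits describing the scalar's contents.
enum : unsigned {
    kNone         = 0u,
    kNeedsQuotes  = 1u << 0,  // plain style would be ambiguous
    kNeedsEscapes = 1u << 1,  // only double quotes can express the contents
};

struct Scalar {
    std::string_view text;
    unsigned features;  // set once, when the scalar is created
};

// One-time scan. A parser would set these bits while tokenizing; programmatic
// code would either call this helper or supply the bits itself.
constexpr unsigned scan_features(std::string_view s) {
    unsigned f = kNone;
    if (s.empty() || s.front() == ' ' || s.back() == ' ') f |= kNeedsQuotes;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char u = static_cast<unsigned char>(s[i]);
        if (u < 0x20 || u == 0x7f) f |= kNeedsEscapes;
        const bool at_end_or_space = (i + 1 == s.size()) || s[i + 1] == ' ';
        if (s[i] == ':' && at_end_or_space) f |= kNeedsQuotes;  // "key: value", "key:"
        if (i == 0 && (s[i] == '-' || s[i] == '?') && at_end_or_space) f |= kNeedsQuotes;
        if (s[i] == '#' && (i == 0 || s[i - 1] == ' ')) f |= kNeedsQuotes;  // comment
    }
    return f;
}

constexpr Scalar make_scalar(std::string_view s) {
    return Scalar{s, scan_features(s)};
}

// Emit-time decision: reads only the mask, never the text.
// Returns '\0' for plain style, otherwise the quote character to use.
constexpr char quote_for(const Scalar& sc) {
    if (sc.features & kNeedsEscapes) return '"';
    if (sc.features & kNeedsQuotes) return '\'';
    return '\0';
}

static_assert(quote_for(make_scalar("captain-yoshi")) == '\0');
static_assert(quote_for(make_scalar("key: value")) == '\'');
// A client that knows its data can skip the scan and set the mask by hand.
static_assert(quote_for(Scalar{"captain-yoshi", kNone}) == '\0');
```

The cost moves from every emit to a single scan at creation time, and the last line shows the burden this places on programmatic trees: a wrong hand-set mask would produce invalid YAML.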

Maybe this could be as simple as adding a per-node off-switch for emit-time heuristics. Or emitter flags. Or both per-node and emitter flags. One thing is clear: much more work is needed in emitting styles.
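
For illustration only, the combination of both switches might reduce to something like the following. `EmitterOptions`, `NodeScalar` and `should_scan` are made-up names, not existing or planned ryml API.

```cpp
#include <string_view>

// Hypothetical emitter-wide options.
struct EmitterOptions {
    bool scan_scalars = true;  // global switch for the emit-time heuristics
};

// Hypothetical per-node data.
struct NodeScalar {
    std::string_view text;
    bool trust_style = false;  // per-node off-switch: the caller vouches for `quote`
    char quote = '\0';         // '\0' plain, '\'' single-quoted, '"' double-quoted
};

// The heuristics run only when neither the emitter nor the node opts out.
constexpr bool should_scan(const EmitterOptions& opts, const NodeScalar& node) {
    return opts.scan_scalars && !node.trust_style;
}

static_assert(should_scan(EmitterOptions{}, NodeScalar{"captain-yoshi"}));
static_assert(!should_scan(EmitterOptions{}, NodeScalar{"captain-yoshi", true, '\0'}));
static_assert(!should_scan(EmitterOptions{false}, NodeScalar{"captain-yoshi"}));
```

With the defaults nothing changes for existing users; opting out, per node or per emitter, shifts responsibility for valid output to the caller.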
