When parsing, is there a way to know the "real type" of a scalar ? #315

Open
JonathanGirardeau opened this issue Sep 28, 2022 · 6 comments

Comments

@JonathanGirardeau

Hello,
I have not started to use this library but it seems very interesting.
I would like to know if it is possible to check the "real type" of a scalar.
For example, in this YAML:

hello: 1234
world: "1234"

When I parse it, which API of the library can I call to find out that the value of the hello field is a number and the value of the world field is a string?

@biojppm
Owner

biojppm commented Sep 28, 2022

The "real type" of a scalar is a question whose answer must depend on the application. There is no final, context-free answer. E.g., is nan a number or a string? Should a representation of a 2D vector as (0.5,0.714) be treated as a number or as a string? Or consider an enum, or even a plain number that may actually be a string, as you show in the example above. All these questions are application-dependent.
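To illustrate the point that the caller's expectation decides the type, here is a minimal, library-independent sketch. It does not use ryml; `Vec2`, `as_double` and `as_vec2` are hypothetical names introduced only for this illustration, and the scalar text is assumed to be already available as a `std::string_view`. The same text is accepted or rejected depending on which type the application asks for.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical application type; not part of YAML or of any library.
struct Vec2 { double x; double y; };

// Try to read the whole scalar as a double. Returns nullopt unless
// strtod consumes every character. Note that strtod also accepts "nan".
inline std::optional<double> as_double(std::string_view s)
{
    if (s.empty()) return std::nullopt;
    const std::string buf(s);  // strtod needs a null-terminated buffer
    char* end = nullptr;
    const double v = std::strtod(buf.c_str(), &end);
    if (end != buf.c_str() + buf.size()) return std::nullopt;
    return v;
}

// Try to read the scalar as "(x,y)", an application-specific convention.
inline std::optional<Vec2> as_vec2(std::string_view s)
{
    if (s.size() < 5 || s.front() != '(' || s.back() != ')') return std::nullopt;
    s = s.substr(1, s.size() - 2);  // strip the parentheses
    const auto comma = s.find(',');
    if (comma == std::string_view::npos) return std::nullopt;
    const auto x = as_double(s.substr(0, comma));
    const auto y = as_double(s.substr(comma + 1));
    if (!x || !y) return std::nullopt;
    return Vec2{*x, *y};
}
```

With these helpers, "nan" is a valid double if the application asks for one, while "(0.5,0.714)" only has a meaning for an application that knows about `Vec2`; either text could also simply be kept as a string.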

Having said that, if you have no context and therefore don't know the type of a node before deserializing, ryml also gives you a toolbox in the tree, in the node and in csubstr, that can be used to figure out some information:

Tree t = parse_in_arena(R"(
hello: 1234
world: "1234"
)");
// does the node have a val?
assert(t["hello"].has_val());
assert(t["world"].has_val());
// does the val compare with a string?
assert(t["hello"].val() == "1234");
assert(t["world"].val() == "1234");
// is the val quoted?
assert(t["hello"].is_val_quoted() == false);
assert(t["world"].is_val_quoted() == true);
// does it look like a number (real or integer or unsigned)?
assert(t["hello"].val().is_number() == true);
assert(t["world"].val().is_number() == true);
// see also csubstr::is_integer(), csubstr::is_real(), etc

HTH.
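One possible application-level policy built on those two pieces of information (the text of the val and whether it was quoted) can be sketched as follows. This is only an assumption about what an application might want, not ryml's behavior: `ScalarKind`, `looks_like_integer`, `looks_like_real` and `classify` are hypothetical names, the number checks are simplified stand-ins written against the standard library rather than the csubstr helpers, and the inputs would come from `val()` and `is_val_quoted()` as in the snippet above.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical result type for this sketch.
enum class ScalarKind { String, Integer, Real };

// True if s is an optionally signed run of decimal digits.
inline bool looks_like_integer(std::string_view s)
{
    std::size_t i = (!s.empty() && (s[0] == '+' || s[0] == '-')) ? 1 : 0;
    if (i == s.size()) return false;  // empty, or a lone sign
    for (; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

// True if s is a decimal number such as "1.5", "-.5", "2." or "1e-3".
// Plain integers are accepted too, so test for integers first.
inline bool looks_like_real(std::string_view s)
{
    std::size_t i = (!s.empty() && (s[0] == '+' || s[0] == '-')) ? 1 : 0;
    std::size_t digits = 0;
    bool dot = false;
    for (; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isdigit(c)) ++digits;
        else if (c == '.' && !dot) dot = true;
        else break;
    }
    if (digits == 0) return false;
    if (i == s.size()) return true;
    if (s[i] != 'e' && s[i] != 'E') return false;
    return looks_like_integer(s.substr(i + 1));  // the exponent part
}

// Policy: quoting wins; otherwise fall back to what the text looks like.
inline ScalarKind classify(std::string_view text, bool quoted)
{
    if (quoted) return ScalarKind::String;
    if (looks_like_integer(text)) return ScalarKind::Integer;
    if (looks_like_real(text)) return ScalarKind::Real;
    return ScalarKind::String;
}
```

Under this policy the hello scalar (unquoted 1234) would be classified as an integer and the world scalar (quoted "1234") as a string. A different application could just as legitimately ignore the quoting.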

@biojppm
Owner

biojppm commented Sep 28, 2022

If you have a more concrete question or a problem you're trying to address, I'd be happy to help.

@biojppm
Owner

biojppm commented Sep 30, 2022

Closing now, feel free to reopen if there are more questions.

@biojppm biojppm closed this as completed Sep 30, 2022
@sergio-eld

It would be nice to have this mentioned in the quickstart. Users usually expect a built-in, straightforward way to get the type of the value within the node. YAML data types are documented.
In the case of this library, I had to read through all of quickstart.cpp, then look into the definitions (because I didn't find it in the quickstart), and in the end resort to GitHub issues...
It is also confusing, since val() returns a c4 basic_string_view, which is a utility used by the library. Hence no one would expect to look up YAML/JSON-related methods there. One would expect methods like is_bool, is_real, is_null, etc. as part of the YAML library itself.

@biojppm
Owner

biojppm commented Jul 7, 2024

Yaml data types are documented

Do you mean data types or do you mean tags?

If you mean tags instead of data types, then YAML does indeed have several basic, common tags such as !!str, etc. So does rapidyaml.

OTOH, if you do mean data types, there exist only three YAML data types: seqs, maps or scalars. Scalars are string-derived values, and the spec is clear that the meaning of an untagged node is application specific:

In YAML, untagged nodes are given a type depending on the application.

Specifically, for untagged nodes,

If a document contains unresolved tags, the YAML processor is unable to compose a complete representation graph. In such a case, the YAML processor may compose a partial representation, based on each node’s kind [...]

The node's kind is of course only one of seq, map, or scalar.

So if you want to infer a scalar's type from its string representation, that is purely a string operation, and the helpers are there for that reason; YAML does not and cannot specify how an untagged scalar should map to a type.

If OTOH you want to resolve tags, there are ample facilities in the library to achieve that.


Having said that, and to ensure that there is clear understanding, can you provide an example of your application code? What is it that you're trying to do, and how would you like to get it done?

@biojppm biojppm reopened this Jul 7, 2024
@sergio-eld

I forgot to mention that I'm using the library to parse JSON. Indeed, in the case of YAML, it is rather hard to distinguish between the types without tags.
I am successfully using the string functions. The point of my comment is that it would be nice to have those functions mentioned in the quickstart, with JSON examples.
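For the JSON case, where scalar types are unambiguous, the string-based approach might look like the sketch below. It is written against the standard library only and is not ryml code: `JsonScalar`, `is_json_number` and `classify_json` are hypothetical names, the number check follows the JSON number grammar (RFC 8259), and the `text` and `quoted` inputs are assumed to come from the node's val and its quoted flag.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type for this sketch.
enum class JsonScalar { String, Number, Bool, Null, Invalid };

// JSON number grammar: -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)?
inline bool is_json_number(std::string_view s)
{
    std::size_t i = 0;
    const auto digit = [&](std::size_t k) {
        return k < s.size() && s[k] >= '0' && s[k] <= '9';
    };
    if (i < s.size() && s[i] == '-') ++i;
    if (!digit(i)) return false;
    if (s[i] == '0') ++i;  // a leading zero must stand alone
    else while (digit(i)) ++i;
    if (i < s.size() && s[i] == '.') {
        ++i;
        if (!digit(i)) return false;  // at least one fraction digit
        while (digit(i)) ++i;
    }
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        ++i;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
        if (!digit(i)) return false;  // at least one exponent digit
        while (digit(i)) ++i;
    }
    return i == s.size();
}

// In JSON every string is quoted, so the quoted flag settles that case;
// the remaining scalars are the literals true, false, null, or a number.
inline JsonScalar classify_json(std::string_view text, bool quoted)
{
    if (quoted) return JsonScalar::String;
    if (text == "true" || text == "false") return JsonScalar::Bool;
    if (text == "null") return JsonScalar::Null;
    if (is_json_number(text)) return JsonScalar::Number;
    return JsonScalar::Invalid;
}
```

Objects and arrays are not covered here, since those are distinguished by the node's kind rather than by its text.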
