
Proposal for Sequence, Map, and Array Decomposition #8

Open: wants to merge 8 commits into master

rhdunn commented Oct 18, 2018

No description provided.

@ChristianGruen
Member

+1 for the proposal, looks good.

If tuple arrays are returned, I would be in favor of having the array syntax. This would make it easier to process sequences of arrays:

let ($array1, $array2) := ([1,2], [1,2])
return ...
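Christian's example binds each array in a two-item sequence. Because XQuery 3.1 never flattens arrays into the surrounding sequence, the same bindings can already be written positionally. This is only a sketch of a possible desugaring; `$pair` is an illustrative name, not part of the proposal:

```xquery
xquery version "3.1";

(: The sequence holds two items, each of which is a whole array. :)
let $pair := ([1, 2], [3, 4])
(: Positional predicates select whole arrays, since arrays are not
   flattened into the enclosing sequence. :)
let $array1 := $pair[1]
let $array2 := $pair[2]
return ($array1?2, $array2?2)   (: returns 2, 4 :)
```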

I guess it shouldn’t apply to context item declarations? I’m also not sure whether it makes sense for group by clauses.

Member

michaelhkay left a comment

I think it's necessary to go further and define tuples as a type: using sequences and arrays to represent tuples of values has many problems and this proposal only solves one of them. Given the introduction of maps, I think that it's better to represent tuples of values as maps (as the standard function library often does in the case of the "options" parameters of functions), and this proposal doesn't allow decomposing assignment in this case.

Note that in the introduction, the terms "fixed length sequence" and "fixed length array" are confusing, because they suggest that there are sequences and arrays whose length is not fixed. This is not the case; a sequence (and likewise an array) has a length which is an intrinsic property of the sequence and can never be changed, since sequences are immutable.

@rhdunn
Author

rhdunn commented Oct 19, 2018

I plan on defining a separate proposal for defining the type of a tuple sequence using the formal semantics style syntax -- as (xs:string, xs:string). This specific proposal is about sequence/array decomposition, which is separate from defining specific types for them -- i.e. either proposal can be accepted or rejected, and other additional proposals can be introduced like a proposal for defining a syntax for tuples based on maps.

I don't think the availability of maps is a reason not to propose syntax and extensions that support sequences and arrays. There are existing functions that make use of fixed length sequence values, and it may be easier to write functions that accept/return sequences/arrays than maps. I've listed some examples in my proposal (sincos, muldiv, points, and complex/rational numbers).
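To make the sincos case concrete, here is how a function returning a fixed-length pair has to be consumed in XQuery 3.1 today. `local:sincos` is a hypothetical helper named after the example above, not a function defined in the proposal:

```xquery
xquery version "3.1";

declare namespace math = "http://www.w3.org/2005/xpath-functions/math";

(: Hypothetical helper returning (sin(x), cos(x)). XQuery 3.1 cannot
   declare "exactly two doubles", so the closest type is xs:double+. :)
declare function local:sincos($x as xs:double) as xs:double+ {
  (math:sin($x), math:cos($x))
};

(: Without decomposition, the result is bound once and then indexed. :)
let $r := local:sincos(0)
let $sin := $r[1]
let $cos := $r[2]
return ($sin, $cos)   (: returns 0, 1 :)
```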

What I meant by "fixed length" sequence/array is one whose size does not change depending on context. For example, a 2D point will always have two items. A counterexample would be a function that doubles the values of a sequence: the length of the sequence here is variable (not fixed). I'm happy to use different terminology if the terms in this proposal are confusing.

Having said that, provided that the number of items in the sequence/array matches the number of variables being assigned, it should not matter.

The behaviour of assigning more variables than there are items in the sequence/array is defined in the proposal. The proposal should also define the behaviour of assigning fewer variables than there are items in the sequence/array.

Providing a proposal for decomposing map based tuples (named tuples?) is something I would be interested in, but should be a separate proposal. As a rough idea, using a map-like syntax analogous to declaring maps, something like:

let { x: $x, y: $y as xs:double } := { x: 2.0, y: 3.0 }
return ...
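For comparison, the same bindings written in XQuery 3.1 use a map constructor and the `?` lookup operator (`$m` is an illustrative name). One detail worth noting: `2.0` and `3.0` are `xs:decimal` literals, and a `let` type declaration in 3.1 uses SequenceType matching without numeric promotion, so this sketch uses double literals to satisfy `as xs:double`:

```xquery
xquery version "3.1";

let $m := map { 'x': 2.0e0, 'y': 3.0e0 }
(: Each component is looked up by key; a missing key would yield (). :)
let $x := $m?x
(: No promotion in let type declarations: the value must already be
   an xs:double, otherwise XPTY0004 is raised. :)
let $y as xs:double := $m?y
return $x * $y   (: returns 6 :)
```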

@adamretter
Member

adamretter commented Oct 24, 2018

I am watching this with interest.

From my perspective we don't necessarily need tuple sequence types or tuple array types, rather I see the decomposition as just syntactic sugar.

I do like the idea of distinct syntax for sequences and arrays, e.g.:

let ($x, $y) := (1.1, 2.2)
let [$x, $y] := [1.1, 2.2]

@ChristianGruen
Member

I agree with Adam’s point of view: I regard the extension of the syntax as a nice addition, but I could live without new tuple types.

@rhdunn
Author

rhdunn commented Oct 24, 2018

To be clear, this specific proposal is not intending to add any new types. It is just about the decomposition of sequence and array values. I will update the proposal to make this clearer, and to add a section for decomposition of map values. I'll also rename the file and pull request to reflect these changes.

@adamretter
Member

@michaelhkay how do you feel about this PR just being syntactic sugar for the time being?

@michaelhkay
Member

michaelhkay commented Oct 25, 2018

I think the let () := and let [] := syntax is fine in principle.

Need to see the detailed semantics, e.g. for the case where the sequence/array has a different number of items from the number of variables.

There are also some syntax details to sort out:

  • adding "let" to the list of reserved function names (so that let() works).
  • let[$x, $y] := is another case that requires infinite lookahead, because let[$x, $y] is a valid XPath expression in its own right.
  • And it seems odd to resolve let() by reserving the function name while relying on lookahead to resolve let[].
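The lookahead point can be seen in XQuery 3.1 itself: `let` is not a reserved word, so `let[...]` already parses as a path step (the name test `let` plus a predicate). A small sketch using an illustrative document with elements named `let`:

```xquery
xquery version "3.1";

(: Elements may legitimately be named "let"; the document is illustrative. :)
let $doc := <root><let>a</let><let>b</let></root>
(: Inside the parentheses a FLWOR expression would be allowed, yet
   "let[2]" is read as the relative path child::let[2], because a let
   clause must continue with "$". With a "let [$x, $y] :=" form, the
   parser could only tell the two apart on reaching ":=". :)
return $doc/(let[2])   (: returns <let>b</let> :)
```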

The extension to maps/tuples doesn't work for me. The proposed syntax offers no benefits over let $x := $m?x, $y := $m?y return .... And in any case, the lookup syntax is sufficiently terse that I don't think you often need to bind variables to each component of a map/tuple in this way.

@michaelhkay
Member

michaelhkay commented Oct 25, 2018

It's not very pretty, but the following would parse more cleanly:

  • let $(x, y, z) := 1 to 3 return ...
  • let $[x, y, z] := array{1 to 3} return ...

and then perhaps map/tuple assignment could be

  • let ${x, y, z} := $map (binding named components of the tuple to variables of the same name)

@adamretter
Member

@michaelhkay I actually prefer your new syntax, less $'s to type

@michaelhkay
Member

Here's a suggestion for the semantics:

  • let $(a, b, c, ...) := EXPR return EXPR2

Amend the existing text:

If a let clause contains multiple variables, it is semantically equivalent to multiple let clauses, each containing a single variable. In particular:

(a) the clause

let $x := $expr1, $y := $expr2

is semantically equivalent to the following sequence of clauses:

let $x := $expr1
let $y := $expr2

(b) a sequence-decomposition let $(x, y, z, ...) := expr is equivalent to the following sequence of clauses:

let $x := expr[1]
let $y := expr[2]
let $z := expr[3]
...

(but the expression expr is only evaluated once)

If the sequence contains more items than the number of variables being bound, excess items are ignored. If the sequence contains fewer items than the number of variables being bound, excess variables are bound to an empty sequence.
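Rule (b) can be approximated in XQuery 3.1 today. The sketch binds the expression to an intermediate variable (`$tmp`, an illustrative name, not part of the proposal), which both evaluates it once and avoids a precedence trap: written literally, `1 to 3[1]` parses as `1 to (3[1])`, not `(1 to 3)[1]`. The fourth variable shows the "excess variables are bound to an empty sequence" rule, since an out-of-range positional predicate selects nothing:

```xquery
xquery version "3.1";

(: Approximates: let $(x, y, z, w) := 1 to 3 :)
let $tmp := 1 to 3     (: evaluated exactly once :)
let $x := $tmp[1]
let $y := $tmp[2]
let $z := $tmp[3]
let $w := $tmp[4]      (: fewer items than variables: bound to () :)
return ($x, $y, $z, count($w))   (: returns 1, 2, 3, 0 :)
```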

(c) an array-decomposition let $[x, y, z, ...] := expr is equivalent to the following sequence of clauses:

let $x := expr?1
let $y := expr?2
let $z := expr?3
...

(again, the expression expr is only evaluated once)

A type error [XPTY0004] is raised if the result of evaluating expr is not an array. A dynamic error is raised [FOAY0001] if the array contains fewer members than the number of variables being bound. If the array contains more members than the number of variables being bound then excess members are ignored.
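Rule (c) can likewise be sketched in XQuery 3.1. Declaring the intermediate variable (again the illustrative `$tmp`) as `array(*)` gives the XPTY0004 check when the value is not an array, and the `?` lookup with an integer key already raises FOAY0001 for an index past the last member:

```xquery
xquery version "3.1";

declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Approximates: let $[x, y, z] := array { 1 to 3 }
   The type declaration raises XPTY0004 if the value is not an array. :)
let $tmp as array(*) := array { 1 to 3 }
let $x := $tmp?1
let $y := $tmp?2
let $z := $tmp?3
(: A fourth variable bound to $tmp?4 raises FOAY0001 (too few members);
   it is caught here only so the sketch can show the error code. :)
let $w := try { $tmp?4 } catch err:FOAY0001 { 'FOAY0001' }
return ($x, $y, $z, $w)   (: returns 1, 2, 3, "FOAY0001" :)
```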

(d) a map-decomposition let ${x, y, z} := expr is equivalent to the following sequence of clauses, in the case where x, y, and z are simple NCNames:

let $x := expr?x
let $y := expr?y
let $z := expr?z
...

(again, the expression expr is only evaluated once)

In the case where the variable name is a QName q, the equivalence is let $q := expr?(xs:QName("q")).

A type error [XPTY0004] is raised if the result of the expression is not a map. [[Assuming map-based tuples are introduced], a type error [XPTY0004] MAY be raised if the processor is able to establish that the static type of expr is a tuple type and that x (etc) is not one of the permitted key names for that tuple type.] In other cases, if the map does not contain an entry with the specified key, the corresponding variable is bound to an empty sequence. Unreferenced entries in the map are ignored.
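Rule (d) maps onto the existing `?` lookup. In this XQuery 3.1 sketch, `as map(*)` provides the XPTY0004 check, a missing key yields the empty sequence, and the QName case uses a parenthesized key expression. The namespace bound to the prefix `p` is hypothetical and only serves to show a QName-keyed entry:

```xquery
xquery version "3.1";

(: Hypothetical namespace, used only for the QName-keyed entry. :)
declare namespace p = "http://example.com/hypothetical";

(: Approximates: let ${x, y, p:q} := $tmp :)
let $tmp as map(*) := map {
  'x': 1,
  xs:QName('p:q'): 'Q'
}
let $x := $tmp?x                     (: NCName key, looked up as a string :)
let $y := $tmp?y                     (: no entry for "y": bound to () :)
let $p:q := $tmp?(xs:QName('p:q'))   (: QName key: parenthesized lookup :)
return ($x, count($y), $p:q)   (: returns 1, 0, "Q" :)
```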

@ChristianGruen
Member

This looks sound and solid.

Just one thing: Maybe we should not simply ignore returned values that cannot be bound but rather raise an error. Swallowed data may result in erroneous code.

Thinking more about this, maybe we should indeed find different solutions for sequences, arrays and maps, as the three data structures have different semantics anyway:

  • For sequences, it would feel more natural to me to bind all remaining items to the last variable, and never raise any errors.
  • For arrays, which have fairly strict boundary semantics in XQuery, I would expect an error if too few or too many items are returned.
  • For maps, unreferenced values could be ignored indeed, as the proposed solution resembles a map lookup.
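The stricter array rule suggested in the second bullet could be emulated in XQuery 3.1 with an explicit size check before the lookups. In this sketch the error code `ex:ARITY` and its namespace are hypothetical, not codes from the proposal or the specification:

```xquery
xquery version "3.1";

declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
(: Hypothetical namespace for a hypothetical error code. :)
declare namespace ex = "http://example.com/hypothetical-errors";

(: Sketch of a strict variant of: let $[x, y] := [3, 4]
   where the member count must equal the number of variables. :)
let $point := [3, 4]
let $checked :=
  if (array:size($point) eq 2) then $point
  else error(xs:QName('ex:ARITY'), 'expected exactly 2 array members')
let $x := $checked?1
let $y := $checked?2
return $x + $y   (: returns 7 :)
```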

@michaelhkay
Member

michaelhkay commented Oct 25, 2018

Yes, I toyed with allowing let $(head, tail) := sequence, which certainly has some nice use cases. This also means that let $(x) := expr means the same as let $x := expr, which is logical. But should arrays work the same way? That's tricky because you want let $[x, y] := [1,2] to set $x=1, $y=2, not $x=1, $y=[2]. So you end up with an asymmetry between sequences and arrays. (You suggested requiring the number of variables to exactly match the array size. That feels a bit severe to me.)
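The asymmetry is visible in existing XQuery 3.1 functions: `tail` on a two-item sequence yields a single item, while `array:tail` on a two-member array yields a one-member array. A sketch of what a head/tail-style rule would therefore bind in each case:

```xquery
xquery version "3.1";

declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

let $seq := (1, 2)
let $arr := [1, 2]
return (
  (: Sequence: the "tail" of a two-item sequence is just the item 2. :)
  deep-equal(tail($seq), 2),
  (: Array: array:tail() returns the array [2], not the member 2. :)
  deep-equal(array:tail($arr), [2]),
  (: Positional binding gives the member itself, as $y=2 requires. :)
  deep-equal($arr?2, 2)
)   (: returns true, true, true :)
```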

@ChristianGruen
Member

Maybe we can think about use cases for which ignoring returned results is a better solution than raising an error? In other words, when does a user create results and expect parts of it to be ignored?

I have some sympathy for the asymmetry between arrays and sequences, as the data structures are asymmetric one way or the other (mostly because of the decision in XQ31 that a supplied array index must be greater than 0 and must not exceed the array size). Moreover, my impression is that arrays and sequences are used quite differently in practice. As arrays are not implicitly flattened, it would possibly come as a surprise if we created something like a tail result for arrays.
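The boundary asymmetry described here is easy to observe in XQuery 3.1: an out-of-range positional predicate on a sequence selects nothing, whereas an out-of-range array index raises FOAY0001. A small check, with the `err` prefix declared explicitly:

```xquery
xquery version "3.1";

declare namespace err = "http://www.w3.org/2005/xqt-errors";

(
  (: Sequence: position 3 of a two-item sequence is simply (). :)
  count((1, 2)[3]),
  (: Array: index 3 of a two-member array is a dynamic error. :)
  try { [1, 2](3) } catch err:FOAY0001 { 'FOAY0001' }
)   (: returns 0, "FOAY0001" :)
```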

However, we could also provide explicit semantics for binding the tail of a sequence or even an array to the last variable. In Python, *var is used; in JavaScript, it is ...var. The function parameter syntax that we are discussing in parallel may be a better choice for us:

let $(head as xs:string, tail as xs:string*...) := ('head', 't', 'a', 'i', 'l')
return string-join($tail)
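Under the reading that the last variable collects the remaining items, this example could desugar to the following XQuery 3.1 using the existing `head` and `tail` functions. Both the desugaring and the `$tmp` name are assumptions for illustration; the thread has not settled these semantics:

```xquery
xquery version "3.1";

(: Approximates: let $(head as xs:string, tail as xs:string*...) := ... :)
let $tmp := ('head', 't', 'a', 'i', 'l')
let $head as xs:string := head($tmp)
let $tail as xs:string* := tail($tmp)   (: all remaining items :)
return string-join($tail)   (: returns "tail" :)
```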

rhdunn changed the title from "Proposal for Tuple Sequence and Array Decomposition" to "Proposal for Sequence, Map, and Array Decomposition" on Oct 28, 2018
@rhdunn
Author

rhdunn commented Oct 28, 2018

I have updated the proposal to address the above feedback, use the new syntax, and add a possible grammar. The revised text is viewable at https://github.com/expath/xpath-ng/blob/bc6cb1b579d688ba0088abfe0e73b7e633f964aa/sequence-map-array-decomposition.md (this includes the TypeDeclaration parsing fix pushed below).


4 participants