byrow (sum) on a column containing vectors of numbers #52

Open
sprmnt21 opened this issue Apr 3, 2022 · 3 comments

Comments

@sprmnt21

sprmnt21 commented Apr 3, 2022

I can't explain the reason for the following differences:

```julia
julia> modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Array…?
─────┼────────────────────────────────────────────
   1 │     true      true     false  [1, 1, 0]
   2 │     true     false      true  [1, 0, 1]
   3 │     true      true      true  [1, 1, 1]
   4 │    false      true     false  [0, 1, 0]
   5 │    false     false     false  [0, 0, 0]
   6 │     true     false     false  [1, 0, 0]

julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>byrow(x->sum(x)))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Int64?
─────┼────────────────────────────────────────────
   1 │     true      true     false             2
   2 │     true     false      true             2
   3 │     true      true      true             3
   4 │    false      true     false             1
   5 │    false     false     false             0
   6 │     true     false     false             1

julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>x->sum.(x))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Int64?
─────┼────────────────────────────────────────────
   1 │     true      true     false             2
   2 │     true     false      true             2
   3 │     true      true      true             3
   4 │    false      true     false             1
   5 │    false     false     false             0
   6 │     true     false     false             1

julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>byrow(sum))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Array…?
─────┼────────────────────────────────────────────
   1 │     true      true     false  [1, 1, 0]
   2 │     true     false      true  [1, 0, 1]
   3 │     true      true      true  [1, 1, 1]
   4 │    false      true     false  [0, 1, 0]
   5 │    false     false     false  [0, 0, 0]
   6 │     true     false     false  [1, 0, 0]
```

@sl-solution
Owner

`byrow` is fine-tuned for a set of functions and operations (see its docstring for more details). For generic functions, `byrow` assumes the passed function accepts the row as a vector of values, and `x->x .* 1` falls into this category; see `?byrow(x->x .* 1)`.
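To make this concrete, here is a plain-Julia sketch (no InMemoryDatasets involved) of what a generic function receives when a row is passed to it as a vector of values. The `row` literal is an assumption standing in for row 1 of the three `Bool?` comparison columns in the first output.

```julia
# Hypothetical stand-in for row 1 of the comparison columns (a_lim=>a, b_lim=>b, c_lim=>c).
row = Bool[true, true, false]

# The generic function from the issue: Bool .* Int promotes each element to Int.
f = x -> x .* 1

# The function returns a whole vector for the row, not a scalar.
result = f(row)
println(result)          # [1, 1, 0]
```

Since the function returns a vector for every row, that vector is stored as the single value of `row_function`, which is consistent with the `Array…?` column in the first output.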

@sprmnt21
Author

sprmnt21 commented Apr 3, 2022

Thanks.

Here, at https://docs.juliahub.com/InMemoryDatasets/cS87e/0.6.10/man/byrow/#User-defined-operations, I read that:

> For user defined functions which return a single value, byrow treats each row as a vector of values, thus the user defined function must accept a vector and returns a single value.

So, in the case of `byrow(x->x .* 1)`, I understand that the single value is a vector; that is, the vector returned by the function is treated as a single value.
This explains the result of applying `byrow(sum)`.
In fact, `sum([[1,2,3]]) == [1,2,3]`.
But then I can't explain why `byrow(x->sum(x))` seems to work instead.

The case of `x->sum.(x)`, on the other hand, is genuinely different.
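The arithmetic behind this observation can be checked in plain Julia: summing a one-element vector whose only element is itself a vector returns that inner vector, whereas broadcasting `sum` over the same container sums each inner vector. The variable names below are illustrative only.

```julia
# A vector holding one vector, like a row that has a single cell containing a vector.
v = [[1, 2, 3]]

# sum over a single element returns that element unchanged.
s = sum(v)
println(s)               # [1, 2, 3]

# Broadcasting applies sum to each inner vector instead.
b = sum.(v)
println(b)               # [6]
```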

@sl-solution
Owner

sl-solution commented Apr 3, 2022

> But so, I can't explain to myself why byrow (x-> sum(x)) seems to work instead.

This is something I should add to the documentation: "`byrow` with a generic function and a single column acts like `fun.(col)`." The docstrings are fixed in master.
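A minimal plain-Julia model of the two code paths, applied to the `row_function` values from the issue output, may help. Both helpers, `generic_single_column` and `row_as_vector`, are hypothetical and are not part of InMemoryDatasets. The first follows the documented rule (a generic function on a single column acts like `fun.(col)`). The second is an assumption: it models the fine-tuned `sum` as reducing across a row's cells, which with one column simply returns the cell. It is consistent with the outputs above but is not the package's implementation.

```julia
# Column 4 ("row_function") from the issue output: one vector per row.
col = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]]

# Hypothetical: documented rule for a generic function on a single column,
# so each cell (a vector) is passed to fun directly.
generic_single_column(fun, col) = fun.(col)

# Hypothetical (assumed model of the fine-tuned path): each row is treated as a
# vector of its cells; with one column that vector has one element, so summing
# it returns the cell unchanged.
row_as_vector(fun, col) = [fun([cell]) for cell in col]

println(generic_single_column(x -> sum(x), col))  # [2, 2, 3, 1, 0, 1]
println(row_as_vector(sum, col) == col)           # true
```

Under this model, `byrow(x->sum(x))` produces the `Int64?` column of the second output, while `byrow(sum)` hands back the vectors themselves, as in the fourth output.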
