Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unwrapping nested structs performance problem #122

Closed
MasonProtter opened this issue Mar 11, 2020 · 2 comments
Closed

unwrapping nested structs performance problem #122

MasonProtter opened this issue Mar 11, 2020 · 2 comments

Comments

@MasonProtter
Copy link
Contributor

Currently, if you're using the unwrap functionality #85, it can impose a quite heavy performance cost, e.g.

using StructArrays, ForwardDiff
structify(x) = StructArray(x, unwrap= T -> T<:Union{Tuple,ForwardDiff.Partials})
let
    n, m = 50, 51
    A = ForwardDiff.Dual.(randn(n, m), randn(n, m), randn(n, m))
    @btime StructArray($A) # 78.440 μs (4595 allocations: 171.56 KiB)
    @btime structify(  $A) # 316.057 μs (14807 allocations: 450.75 KiB)
end;

Naïvely, it seems that the unwrapping shouldn't add too much overhead if it's done intelligently, since it's the same amount of data to traverse and copy, but maybe that's incorrect. Any thoughts? Is there something that can be done to make this faster?

@piever
Copy link
Collaborator

piever commented Jul 14, 2020

Sorry it took me such a long time to look into this!

I'm not too sure what's going on, I suspect it may be somehow related to #86, where nested things can fail on the gpu in some cases. There is something fishy going on, as the structify version does not even infer correctly. In terms of "operations", it should be entirely analogous, as it is done in one pass, I think it's more an issue of julia failing to generate good code in the nested case.

This does not make any sense to me:

julia> n, m = 50, 51
(50, 51)

julia> A = ForwardDiff.Dual.(randn(n, m), randn(n, m), randn(n, m));

julia> S = structify(A);

julia> @btime for i in eachindex($A)
           $S[i] = $A[i]
       end
  91.066 μs (10200 allocations: 278.91 KiB)

Why would it allocate? The unnested case does not allocate and is much faster.

Hopefully I'll have time to investigate in the next few days. The main function used here is foreachfield (see https://github.com/JuliaArrays/StructArrays.jl/blob/master/src/utils.jl#L51), that is a generated function to access all fields of the struct recursively and apply a function with those as argumentsl. Maybe it can be tweaked a bit to improve in this scenario.

A thing that I'm curious to try is to have foreachfield work on integers rather than on symbols, and be a regular (not generated) function. I'm curious whether that can rescue the nested case.

@piever
Copy link
Collaborator

piever commented Jul 17, 2020

Yes, the problem is indeed something that was pointed out a while ago. foreachfield ends up doing some recursion at runtime, rather than code generation time, which is quite bad for performance. I'm changing that in #141. There, I get:

julia> let
           n, m = 50, 51
           A = ForwardDiff.Dual.(randn(n, m), randn(n, m), randn(n, m))
           @btime StructArray($A) # 56.916 μs (4594 allocations: 171.55 KiB) 
           @btime structify(  $A) # 62.633 μs (4604 allocations: 171.84 KiB) 
       end;

Before merging #141 I need to check if it also handles the other issue we had with GPU code generation, which should be related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants