Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large number of allocations in 0.6.8 #228

Open
aplavin opened this issue Jun 1, 2022 · 4 comments
Open

Large number of allocations in 0.6.8 #228

aplavin opened this issue Jun 1, 2022 · 4 comments

Comments

@aplavin
Copy link
Member

aplavin commented Jun 1, 2022

The eager automatic conversion introduced in #227 (cc @timholy @piever) leads to many allocations when this conversion isn't actually needed.
For example:
StructArrays@0.6.7 - 25 allocations:

julia> using StructArrays, ArraysOfArrays

julia> A = StructArray(x=VectorOfVectors{Int}())

julia> @time for _ in 1:10^6 push!(A, (x=1:10,)) end
  0.307580 seconds (25 allocations: 11.269 MiB, 7.45% gc time)

StructArrays@0.6.8 - 1 million allocations:

julia> using StructArrays, ArraysOfArrays

julia> A = StructArray(x=VectorOfVectors{Int}())

julia> @time for _ in 1:10^6 push!(A, (x=1:10,)) end
  0.809896 seconds (1.00 M allocations: 293.601 MiB, 55.02% gc time)

Of course, these allocations go away if automatic conversion (

maybe_convert_elt(::Type{T}, vals) where T = convert(T, vals)
maybe_convert_elt(::Type{T}, vals::Tuple) where T = T <: Tuple ? convert(T, vals) : vals # assignment of fields by position
maybe_convert_elt(::Type{T}, vals::NamedTuple) where T = T<:NamedTuple ? convert(T, vals) : vals # assignment of fields by name
) is disabled by StructArrays.maybe_convert_elt(::Type{T}, vals::Tuple) where {T} = vals.

This change in 0.6.8 is highly breaking performance-wise. Can the new automatic conversion approach be revised and made less eager somehow?

aplavin added a commit to aplavin/StructArrays.jl that referenced this issue Jun 1, 2022
@timholy
Copy link
Member

timholy commented Jun 26, 2022

ArraysOfArrays is fundamentally broken: JuliaArrays/ArraysOfArrays.jl#2. That can cause all sorts of trouble.

@aplavin
Copy link
Member Author

aplavin commented Jun 27, 2022

Interesting, you mean the x[1] <: eltype(x) not holding in ArraysOfArrays? Indeed, that's unfortunate, didn't notice before.

However, this issue is independent and will remain even when the ArraysOfArrays eltype is fixed so that x[1] <: eltype(x).
ArrayOfArrays can push! any iterable, such as a range, directly: it's significantly more efficient than materializing the iterable to an Array. Meanwhile, StructArrays eagerly attempts to convert the passed value to the eltype of component arrays.
I believe the latter is the actual issue, StructArrays shouldn't decide for all underlying arrays that an explicit conversion is needed.

@piever
Copy link
Collaborator

piever commented Jun 29, 2022

I am also realizing that things are problematic for array types where the result of indexing is some custom type that has a reference to data from the original array, For example the same happens with StringArray{WeakRefString} from WeakRefStrings:

julia> using WeakRefStrings

julia> sa = StringArray{WeakRefString}(["x", "y"])
2-element StringVector{WeakRefString}:
 "x"
 "y"

julia> push!(sa, "z")
3-element StringVector{WeakRefString}:
 "x"
 "y"
 "z"

julia> push!(sa, convert(eltype(sa), "z"))
ERROR: MethodError: no method matching WeakRefString(::String)

CategoricalArrays have the same issue: you can push! things that cannot be converted to the eltype of the array.

I imagine one would need to pick up between

  1. avoid the conversion in StructArrays, so pushing e.g. 1 to a StructArray{ComplexF64} goes back to erroring
  2. drop support for all array types where you can push! values that cannot be converted to the eltype

Neither choice seems great, but I don't really see a clean way out of this. Maybe something along the lines of #229 is a reasonable compromise, but even then things are only fixed when the entry is passed as a tuple or named tuple.

@aplavin
Copy link
Member Author

aplavin commented Jun 17, 2023

The reason is that push! does not recurrently call push! if some columns are StructArrays, but the whole thing is unrolled, for performance reasons.

Maybe at some point this unrolling wouldn't be required? Maybe even at current julia versions? :)
Not sure about exact benchmark, but this can be a way forward to fixing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants