Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MutableArithmetics #113

Open
nsajko opened this issue Jun 9, 2024 · 10 comments
Open

MutableArithmetics #113

nsajko opened this issue Jun 9, 2024 · 10 comments

Comments

@nsajko
Copy link

nsajko commented Jun 9, 2024

https://github.com/jump-dev/MutableArithmetics.jl

This (hopefully) will make it very easy to handle temporaries and increase speed

After a while of trying to learn the very not well documented API for this interface, I found this only works for simple arithmetic (+,-,*,/) and no other functions, therefore I think it we will have to write our own temporary management. One idea I have is to have a macro @usetemps which appends some symbol to each of the operators ( similar to @. ) and converts the product of the expression to heap allocated TPS or ComplexTPS. Those special operators with appended symbols return temporary types which have special behaviors / avoid heap allocation if possible. If a temporary is included in the evaluation, it can be reused or mutated.

Originally posted by @mattsignorelli in #46 (comment)

It's not true that MA "only works for simple arithmetic", MA works for arbitrary functions. If you have questions I'd be happy to help.

@mattsignorelli
Copy link
Contributor

mattsignorelli commented Jun 9, 2024

The function I'm looking for from MutableArithmetics is @rewrite, which I didn't think worked for arbitrary functions from the discussion here: jump-dev/MutableArithmetics.jl#253

From a developer in that discussion:

Does MutableArithmetics not work for general functions, only for the standard arithmetic operations (+,-,/,*)?

Correct the @rewrite macro works only for a limited subset of the standard arithmetic operations.

@nsajko
Copy link
Author

nsajko commented Jun 9, 2024

I usually avoid @rewrite anyway, because it's too high-level. For optimal performance it's necessary to use operate!! etc. directly.

@mattsignorelli
Copy link
Contributor

I usually avoid @rewrite anyway, because it's too high-level. For optimal performance it's necessary to use operate!! etc. directly.

I see. Yes the thought was to have a macro users could prepend to expressions containing the mutable struct, so that a pre-allocated temporary buffer could be used instead for the temporaries generated, while keeping the syntax simple and transparent to other types. The current @FastGTPSA macro I implemented does this, albeit with some weird tricks.

At least at the time, I couldn't see any major benefits to satisfying the MutableArithmetics interface other than for @rewrite, and had difficulty understanding how to do so. What other benefits would satisfying the interface offer?

@nsajko
Copy link
Author

nsajko commented Jun 9, 2024

Looking at the @FastGTPSA docs you linked (thanks!), it seems like it relies on a global TPS vector for avoiding allocations? I guess that makes thread-safety impossible or at least expensive, which seems like a pretty big drawback?

Another thing that seems like a drawback of the @FastGTPSA approach is that, as far as I understand, it can only work for hardcoded TPS types, because a global TPS vector is required for each TPS type supported by @FastGTPSA, right?

At least at the time, I couldn't see any major benefits to satisfying the MutableArithmetics interface other than for @rewrite, and had difficulty understanding how to do so. What other benefits would satisfying the interface offer?

It'd allow users to, e.g., move allocations outside of loops or outright eliminate them in a generic and thread-safe manner, but at the cost of uglier and less convenient code.

Take this loop as an example:

for x  vec
    s += f(x)
end

Assuming f needs to allocate its return value, and + perhaps also allocates, it might make sense to implement operate! for +, and operate_to! for f. Then these allocations could be eliminated in user code like so:

y = ... # preallocate
for x  vec
    y = MA.operate_to!!(y, f, x)
    s = MA.operate!!(+, s, y)
end

Furthermore, if f allocates more than just its return value, it'd make sense to implement buffer_for and buffered_operate_to!, which would be used like this to move allocations from f to outside the loop:

y = ... # preallocate
buf = MA.buffer_for(f, eltype(vec))
for x  vec
    y = MA.buffered_operate_to!!(buf, y, f, x)
    s = MA.operate!!(+, s, y)
end

Other MA functions to implement would be mutability and mutable_copy.

The purpose of using the !! functions instead of ! functions is making the code more generic: the !! functions are correct even when the MA API is not implemented.

@mattsignorelli
Copy link
Contributor

Looking at the @FastGTPSA docs you linked (thanks!), it seems like it relies on a global TPS vector for avoiding allocations? I guess that makes thread-safety impossible or at least expensive, which seems like a pretty big drawback?

You are correct, with this approach internally, the macro in its current state is definitely not thread safe. The current plan is to switch to a thread-safe memory pool of TPSs which the temporaries can draw from. This would make the macro thread safe, but is not implemented yet.

Another thing that seems like a drawback of the @FastGTPSA approach is that, as far as I understand, it can only work for hardcoded TPS types, because a global TPS vector is required for each TPS type supported by @FastGTPSA, right?

Kind of. Right now there are only 2 types of TPSs (TPS and ComplexTPS, both representing Float64s), both of which are supported by @FastGTPSA and have corresponding pre-allocated buffers. The macro basically changes the expression to use operators/functions that do special things when encountering a TPS or ComplexTPS, or default to the regular operator/function if any other type is encountered.

I do feel that in the long term, MutableArithmetics would be a good interface to implement here. However I would really like to see some kind of equivalent @FastGTPSA in MA that draws from a pre-allocated, thread-safe buffer of the mutable structs for expression evaluation. Take a look at some of the expressions in benchmark/track.jl. By simply prepending the macro to the expressions, the code stays very simple to read, is type-generic, and still gains all of the benefits of reusing temporaries. I fear rewriting these complicated expressions using operate!! and operate_to!! would make the code much less readable, and thus harder to maintain/debug.

@nsajko
Copy link
Author

nsajko commented Jun 9, 2024

A thread-safe memory pool will have its costs, but it seems like a reasonable solution. Implementing the MA interfaces would enable users to squeeze out every last ounce of performance, when necessary, using (hopefully) familiar APIs like MA.operate!!.

I guess it's true that @rewrite would be more useful if it supported more than just basic arithmetic, however I probably won't be tackling that issue as I'm not good with macros, and in fact I try to stay away from them.

@mattsignorelli
Copy link
Contributor

For now, I think I will wait until MA has better documentation and an equivalent @FastGTPSA/better @rewrite to implement the interface. Many months ago when I was making this decision, I read the README for MA and the paper it refers to there and still had a lot trouble understanding how to properly implement the interface. For this reason I fear MA.operate!! is not very familiar to outside users. I also feel having to rewrite simple expressions using operate!! would only make the code less-readable, and over-complicate things that are presently simple using @FastGTPSA (especially once the thread-safe pool is implemented).

I also am not the best with writing macros, @FastGTPSA was a major pain...

@mattsignorelli
Copy link
Contributor

@nsajko can you point me to any documentation, or describe how to actually implement MutableArithmetics with steps? In fact if you have the time and are familiar with it, a PR to MutableArithmetics updating the documentation with such a guide with be valuable

@nsajko
Copy link
Author

nsajko commented Jul 16, 2024

Does this example MA implementation help: kalmarek/Arblib.jl#178?

@mattsignorelli
Copy link
Contributor

Does this example MA implementation help: kalmarek/Arblib.jl#178?

Thanks!

I thought about this a lot, but decided it seems best to stick with and improve my current implementation:

I have made @FastGTPSA thread safe along with some significant improvements (including no more weird symbols exported, and it can act on a block of code). E.g.

julia> d = Descriptor(3, 7); x = vars(d);

julia> y = rand(3);

julia> @btime @FastGTPSA begin   # 2 allocations per TPS, no temporaries generated:
               t1 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
               t2 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
               z  = $y[1]^3*sin($y[2])/log(2+$y[3])-exp($y[1]*$y[2])*im;
              end;
  11.251 μs (4 allocations: 3.88 KiB)

julia> t3 = ComplexTPS64(); t4 = ComplexTPS64(); @gensym w; # pre-allocated TPSs and some scalar number

julia> @btime @FastGTPSA! begin # zero allocations
               $t3 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
               $t4 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
               $w  = $y[1]^3*sin($y[2])/log(2+$y[3])-exp($y[1]*$y[2])*im;
              end;
  10.930 μs (0 allocations: 0 bytes)

This basically does everything I could do with MA (and more actually, since @FastGTPSA works for any general function and not just the arithmetic operators), so at this point in time I don't see much benefit to satisfying the interface anymore.

On that note, I think my solution for @FastGTPSA might be of interest to others, since it supports all functions/operators, and by taking advantage of MD, it really is transparent to all non-TPS types while keeping the syntax completely simple

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants