Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nemo Galois fields decrease performance compared to AbstractAlgebra #1693

Open
Ktrompfl opened this issue Feb 29, 2024 · 11 comments
Open

Nemo Galois fields decrease performance compared to AbstractAlgebra #1693

Ktrompfl opened this issue Feb 29, 2024 · 11 comments

Comments

@Ktrompfl
Copy link

Using Galois fields from Nemo instead of AbstractAlgebra heavily impacts the performance of basic arithmetic operations due to allocations for every single operation:

julia> using AbstractAlgebra, BenchmarkTools, Nemo

Welcome to Nemo version 0.43.1

Nemo comes with absolutely no warranty whatsoever

julia> @btime begin
           K = AbstractAlgebra.GF(2)
           a = one(K)

           for _ = 1:1_000_000
               a *= a
           end
       end

  247.656 μs (2 allocations: 40 bytes)

julia> @btime begin
           K = Nemo.GF(2)
           a = one(K)

           for _ = 1:1_000_000
               a *= a
           end
       end


  66.673 ms (1000006 allocations: 76.29 MiB)

julia> @btime begin
           K = AbstractAlgebra.GF(2)
           a = one(K)

           for _ = 1:1_000_000
               a += a
           end
       end
  78.806 ns (2 allocations: 40 bytes)

julia> @btime begin
           K = Nemo.GF(2)
           a = one(K)

           for _ = 1:1_000_000
               a += a
           end
       end

  68.983 ms (1000006 allocations: 76.29 MiB)

I assume this is a byproduct of calling native libraries, but still very unfortunate.

@thofma
Copy link
Member

thofma commented Feb 29, 2024

Yes and no. Try using Native.GF(2) (once Nemo is loaded).

@Ktrompfl
Copy link
Author

Thanks for your response. Apart from being slightly faster on addition and not causing any allocations at all, Native.GF still appears to be a magnitude slower on multiplication when compared to AbstractAlgebra.GF:

julia> @btime begin
           K = Native.GF(2)
           a = one(K)

           for _ = 1:1_000_000
               a *= a
           end
       end

  5.737 ms (0 allocations: 0 bytes)

julia> @btime begin
           K = Native.GF(2)
           a = one(K)

           for _ = 1:1_000_000
               a += a
           end
       end

  44.278 ns (0 allocations: 0 bytes)

@thofma
Copy link
Member

thofma commented Feb 29, 2024

Hm, yes, it is quite bad, but a bit puzzling. @fieker: The AbstractAlgebra code is not that sophisticated in comparison to Nemo/flint, which goes through n_mulmod_preinv. Is this just the overhead of the function call or is there something else going on?

@fieker
Copy link
Contributor

fieker commented Feb 29, 2024

my guess is 2 things:

  • preinv (but that is minor)
  • memory management: GF(2) in AA will use machine stuff, Int64 for operations. This is immutable and stack allocated and such. In contrast, Native.GF(2) is expensive

My guess is that Native.GF(2) will outperform AA on tasks taking place in C (in flint), however "naive" native julia code will favour AA by a wide margin.
Furthermore, each "+" in Flint is a function call, while AA is handled directly by the compiler....
I guess we can/ should implement the Native.GF(p) arithmetic directly ourselves

@fieker
Copy link
Contributor

fieker commented Feb 29, 2024

I withdraw my comments. fpFieldElem is usng preinv and is immutable julia. * is calling into C however, so the call mihght be
much more expensive (at least it should!!) thay checking if a*b is even or odd. Has anyone done the timings on 30 bit or 60 bit primes as well? The Native code is not at all optimized for small p

@thofma
Copy link
Member

thofma commented Feb 29, 2024

I tried 30 bits and got the same timings. We could in fact just copy the code from AA to Nemo.

@thofma
Copy link
Member

thofma commented Mar 29, 2024

@albinahlback since you interested in flint being fast, do you have an idea what could be going on? How long does

julia> @btime begin
           K = Native.GF(1099511627791)
           a = one(K)

           for _ = 1:1_000_000
               a *= a
           end
       end

  5.737 ms (0 allocations: 0 bytes)

take for you and how long does that take for you with an equivalent C program using nmod? It is clear that the ccall on the julia side is not for free, but I wonder how expensive it is.

@albinahlback
Copy link
Contributor

Do you know if AbstractAlgebra-stuff is being inlined or not?

@albinahlback
Copy link
Contributor

If someone would be able to give me the generated assembly for each case, it would be pretty easy to see why the difference is so big.

@thofma
Copy link
Member

thofma commented Mar 29, 2024

Hm, I tried to produce the assembly code, but I noticed something strange. It seems that there is something dodgy going one due to some problems in the benchmark setup.

@Ktrompfl can you reproduce the following?

function f_with_return(K)
   a = one(K)
  for _ = 1:1_000_000
    a *= a
  end
  return a
end

function f_without_return(K)
   a = one(K)
  for _ = 1:1_000_000
    a *= a
  end
end

K_AA = AbstractAlgebra.GF(1099511627791)
K_Nemo = Nemo.Native.GF(1099511627791)

I then get

julia> t_AA = @belapsed(f_with_return(K_AA)), @belapsed(f_without_return(K_AA))
(0.005825292, 0.000312042)

julia> t_Nemo = @belapsed(f_with_return(K_Nemo)), @belapsed(f_without_return(K_Nemo))
(0.012017084, 0.012015667)

So clearly the compiler is doing something "clever" in the AA case, since the function always returns nothing.

Looks like it is 2x slower, which is not "too bad".

@Ktrompfl
Copy link
Author

With the setup above I get

julia> t_AA = @belapsed(f_with_return(K_AA)), @belapsed(f_without_return(K_AA))
(0.003119424, 0.000221444)

julia> t_Nemo = @belapsed(f_with_return(K_Nemo)), @belapsed(f_without_return(K_Nemo))
(0.005218629, 0.005237672)

which should be fine, whereas with K_Nemo2 = Nemo.GF(1099511627791) I get

julia> t_Nemo2 = @belapsed(f_with_return(K_Nemo2)), @belapsed(f_without_return(K_Nemo2))
(0.05128647, 0.051381936)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants