
Adjustments for Finch.jl#562 #57

Merged: 1 commit merged into main from wma-heuristic-fix on May 21, 2024

Conversation

@mtsokol (Collaborator) commented May 20, 2024

Hi @willow-ahrens,

Here's a PR with adjustments required by finch-tensor/Finch.jl#562.

One thing I'm concerned about is the reduced performance of code that uses PlusOneVector (the zero-copy SciPy constructors), which is much slower than the copying counterparts.

For example, in SDDMM, changing from a copied to a no-copy CSR matrix introduces a 10x slowdown.
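
For context, here is a minimal sketch of what a PlusOneVector-style wrapper does; the `PlusOneLikeVector` name and code below are my own illustration, not the actual Finch.jl implementation. The idea is to expose a 0-based SciPy index buffer as 1-based by adjusting each element on access, instead of copying the buffer up front:

```julia
# Illustrative sketch only; the real PlusOneVector lives in Finch.jl and may differ.
struct PlusOneLikeVector{T<:Integer, V<:AbstractVector{T}} <: AbstractVector{T}
    data::V  # 0-based buffer shared with Python, never copied
end
PlusOneLikeVector(data::AbstractVector{T}) where {T<:Integer} =
    PlusOneLikeVector{T, typeof(data)}(data)

Base.size(v::PlusOneLikeVector) = size(v.data)
Base.IndexStyle(::Type{<:PlusOneLikeVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(v::PlusOneLikeVector, i::Int) =
    v.data[i] + one(eltype(v.data))
Base.@propagate_inbounds Base.setindex!(v::PlusOneLikeVector, x, i::Int) =
    (v.data[i] = x - one(eltype(v.data)); v)

# Usage sketch: wrap 0-based indptr/indices (as SciPy stores them) without copying,
# so Finch sees 1-based positions while Python keeps owning the memory.
indptr0 = Int32[0, 2, 3]              # stand-in for a SciPy-owned buffer
indptr1 = PlusOneLikeVector(indptr0)  # reads back as [1, 3, 4]
```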

@mtsokol self-assigned this May 20, 2024
@mtsokol (Collaborator, Author) commented May 20, 2024

Overall I'm happy with the change; I would just like to investigate PlusOneVector a bit further.

To repeat one of my comments: I ran SDDMM with the csr sparse format. With an explicit copy, execution takes 5s:

Executing:
:(function var"##compute#316"(prgm)
      begin
          V = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
          V_2 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          V_3 = (((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A0 = V::Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
          A2 = V_2::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A5 = V_3::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A8 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
          @finch mode = :fast begin
                  A8 .= 0.0
                  for i49 = _
                      for i47 = _
                          for i48 = _
                              A8[i48, i47] << + >>= (*)((*)(A0[i48, i47], A2[i47, i49]), A5[i48, i49])
                          end
                      end
                  end
                  return A8
              end
          return (swizzle(A8, 2, 1),)
      end
  end)

And with the no-copy constructor (the generated code differs only in the PlusOneVector index vectors; I compared the two with diffchecker.com), it takes 100s:

Executing:
:(function var"##compute#308"(prgm)
      begin
          V = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int32}, PlusOneVector{Int32}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          V_2 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          V_3 = (((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A0 = V::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int32}, PlusOneVector{Int32}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A2 = V_2::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A5 = V_3::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A8 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
          @finch mode = :fast begin
                  A8 .= 0.0
                  for i49 = _
                      for i47 = _
                          for i48 = _
                              A8[i48, i47] << + >>= (*)((*)(A0[i48, i47], A2[i47, i49]), A5[i48, i49])
                          end
                      end
                  end
                  return A8
              end
          return (swizzle(A8, 2, 1),)
      end
  end)
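
To isolate whether the wrapper itself adds per-access overhead, independently of the full SDDMM kernel, one could microbenchmark raw versus wrapped index access along the following lines. This reuses the hypothetical `PlusOneLikeVector` sketch from above (not Finch's actual `PlusOneVector`) and assumes BenchmarkTools is installed:

```julia
using BenchmarkTools

raw     = collect(Int32, 1:1_000_000)
wrapped = PlusOneLikeVector(raw)   # the illustrative wrapper defined earlier

# Sum every index through getindex, the access pattern a kernel would use.
function sum_indices(v)
    s = 0
    @inbounds for i in eachindex(v)
        s += Int(v[i])
    end
    return s
end

@btime sum_indices($raw)      # baseline: plain Vector{Int32}
@btime sum_indices($wrapped)  # zero-copy wrapper with the +1 adjustment
```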

@hameerabbasi (Collaborator) commented:

Ouch -- that's a big perf hit. I would wait to merge this as PlusOneVector is used a lot.

@mtsokol (Collaborator, Author) commented May 21, 2024

> Ouch -- that's a big perf hit. I would wait to merge this as PlusOneVector is used a lot.

@hameerabbasi It turns out that PlusOneVector was already slow; it just wasn't really used internally before: finch-tensor/Finch.jl#562 (comment)

@mtsokol (Collaborator, Author) commented May 21, 2024

I created an issue for it: finch-tensor/Finch.jl#565

@mtsokol (Collaborator, Author) commented May 21, 2024

@hameerabbasi Would you have a spare moment to review it?

@hameerabbasi (Collaborator) commented:

It's fairly minimal code -- there isn't anything to review besides a dependency version bump. But if that dependency bump introduces a major regression, then we should wait for the regression to be fixed before bumping.

@mtsokol (Collaborator, Author) commented May 21, 2024

AFAIU it's not necessarily a regression, as PlusOneVectors used to be dropped internally. The latest release just exposed the slow performance of PlusOneVector, and therefore of the zero-copy SciPy constructors in general.

My current opinion is that it's better to merge this and ship zero-copy/PlusOneVector with low performance (and keep working on improving it, or drop the zero-copy SciPy constructors if that isn't feasible) than to stay blocked on the current version, which drops PlusOneVector anyway.

@hameerabbasi (Collaborator) commented:

A regression doesn't need to be a bug; it can be anything that practically renders the new version inferior to the old one. If such a situation arises, I'd re-evaluate after each release whether the newest version is better for my use case than whatever version I'm using now, and then upgrade.

@mtsokol (Collaborator, Author) commented May 21, 2024

Then I think I can open a PR that drops PlusOneVector from finch-tensor, acknowledging that it isn't used internally. Then this PR won't be a regression.

@willow-ahrens (Collaborator) commented:

I think PlusOneVector introduces a type instability, which we can fix pretty easily.
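
To unpack what such a type instability can look like, here is a hypothetical sketch (my own illustration, not Finch.jl's PlusOneVector) contrasting a wrapper whose buffer field is abstractly typed with one whose buffer type is a type parameter:

```julia
# Unstable variant: the field has an abstract type, so the compiler cannot
# infer the element access inside hot loops and falls back to dynamic dispatch.
struct UnstableIdxVector <: AbstractVector{Int}
    data::AbstractVector{Int32}
end
Base.size(w::UnstableIdxVector) = size(w.data)
Base.getindex(w::UnstableIdxVector, i::Int) = Int(w.data[i]) + 1

# Stable variant: the concrete buffer type is a type parameter, so the same
# access infers to Int32 and the `+ 1` compiles to a plain integer add.
struct StableIdxVector{V<:AbstractVector{Int32}} <: AbstractVector{Int}
    data::V
end
Base.size(w::StableIdxVector) = size(w.data)
Base.getindex(w::StableIdxVector, i::Int) = Int(w.data[i]) + 1

# In the REPL, @code_warntype makes the difference visible:
#   @code_warntype getindex(UnstableIdxVector(Int32[1, 2, 3]), 1)  # abstract return type
#   @code_warntype getindex(StableIdxVector(Int32[1, 2, 3]), 1)    # concrete Int
```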

@willow-ahrens (Collaborator) commented:

But I think we should merge this change because it strictly makes finch-tensor more robust. This is a bug fix, and we need to fix PlusOneVector in any case.

@willow-ahrens (Collaborator) commented:

Also, changes to the heuristic scheduler should be evaluated on the mean improvement across all benchmarks. There will certainly be performance regressions on individual benchmarks as we develop it, but we need to make these changes to eventually reach better performance across the variety of kernels.
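
As a concrete illustration of that evaluation strategy, here is a small sketch; the benchmark names and speedup numbers are invented, and the geometric mean is just one reasonable choice of aggregate for ratios:

```julia
using Statistics

# speedup = old_time / new_time per benchmark; > 1 means the new scheduler is faster.
speedups = Dict(
    "sddmm_csr"  => 0.9,   # an individual regression
    "spmv_csc"   => 1.4,
    "mttkrp_coo" => 1.2,
)

# The geometric mean treats a 2x speedup and a 2x slowdown as cancelling out.
geomean(xs) = exp(mean(log.(xs)))

println("geometric-mean speedup: ", round(geomean(collect(values(speedups))); digits = 3))
```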

@willow-ahrens (Collaborator) commented:

There's a one-line fix for the performance bug in finch-tensor/Finch.jl#567, with a 100x perf improvement.

@willow-ahrens (Collaborator) commented:

I bumped the version in there to 0.6.30.

@mtsokol merged commit a87481a into main on May 21, 2024 (5 checks passed).
@mtsokol deleted the wma-heuristic-fix branch on May 21, 2024 at 22:10.