Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factorization clarification question #75

Closed
marc-vdm opened this issue Jun 20, 2024 · 10 comments
Closed

Factorization clarification question #75

marc-vdm opened this issue Jun 20, 2024 · 10 comments

Comments

@marc-vdm
Copy link

marc-vdm commented Jun 20, 2024

I'm writing some custom BW code (can share over email, but is still to remain private for now) where I observe a ~60% speed increase (2600 200 solve/sec vs 1600 123 solve/sec) when I factorize beforehand.

Are you certain the below is correct?

!!! Use spsolve directly whenever possible !!! Contrary to the scipy implementation there is no performance
gain in PyPardiso by using factorized instead of spsolve.

You may be correct that the computational/memory cost is not worth it if you only perform a few -whatever number few is- Ax=b solves, on the same A, but when I do many solves (>20k), this does seem to matter quite a lot.


edit: Also note that def factorize() itself has conflicting information with the above.

def factorize(self, A):
"""
Factorize the matrix A, the factorization will automatically be used if the same matrix A is passed to the
solve method. This will drastically increase the speed of solve, if solve is called more than once for the
same matrix A
--- Parameters ---
A: sparse square CSR matrix (scipy.sparse.csr.csr_matrix), CSC matrix also possible

@cmutel
Copy link

cmutel commented Jun 20, 2024

I am 99% sure that the difference your are seeing is due to

solver._check_A(A)
. If the matrix wasn't factorized the difference would be at least an order of magnitude slower. What this line does is make sure the A matrix, which is factorized, is the same now as when it was factorized in the first place. factorize doesn't make this check and is therefore somewhat faster.

@cmutel
Copy link

cmutel commented Jun 20, 2024

The ideal option, in my opinion, would be to be able to call spsolve without the A matrix check, probably through an additional input argument. This would still preserve compatibility with scipy input arguments, and avoid paying the memory cost of keeping another copy of the A matrix around:

Don't use the factorized method for very large matrices, because it needs to keep an additional copy of
A in memory.
!!! Use spsolve directly whenever possible !!! Contrary to the scipy implementation there is no performance
gain in PyPardiso by using factorized instead of spsolve.
"""
solve_b = functools.partial(spsolve, A.tocsr().copy(), squeeze=False, solver=solver)

@haasad
Copy link
Owner

haasad commented Jun 20, 2024

You should be able to do the following for maximum speed:

import pypardiso

solver = pypardiso.ps
solver.factorize(A)
solver.set_phase(33)
x = solver._call_pardiso(A, b)

This skips all the checks etc.

@marc-vdm
Copy link
Author

Thanks for your replies both!
I'll experiment with Adrians solution by rewriting def solve_linear_system() and def decompose_technosphere() and see how that (doesn't?) change results

I'll report back with some findings

@haasad
Copy link
Owner

haasad commented Jun 20, 2024

Please be aware that factorized is in a way just a convenience function, so pypardiso can be used as drop-in replacement for scipy.sparse.linalg.factorized. It still uses pypardiso.spsolve under the hood and pypardiso.spsolve always does a factorization for re-use by subsequent calls to spsolve.

@haasad
Copy link
Owner

haasad commented Jun 20, 2024

I think I found the reason for the the performance difference that you see. I assume the brightway technosphere matrix is in csc format?

factorized converts A to csr format and keeps a copy of it:

solve_b = functools.partial(spsolve, A.tocsr().copy(), squeeze=False, solver=solver)

spsolve converts to csr on every call:

if sp.issparse(A) and A.format == "csc":
A = A.tocsr() # fixes issue with brightway2 technosphere matrix

Unfortunately my commit message from 2016 is pretty useless: 7a60c84

pardiso can deal with both csc and csr formats:

if sp.issparse(A) and A.format == "csr":
self._solve_transposed = False
self.set_iparm(12, 0)
elif sp.issparse(A) and A.format == "csc":
self._solve_transposed = True
self.set_iparm(12, 1)
else:
msg = 'PyPardiso requires matrix A to be in CSR or CSC format, but matrix A is: {}'.format(type(A))
raise TypeError(msg)

@haasad
Copy link
Owner

haasad commented Jun 20, 2024

Looks like I actually wrote down the reasoning for this in #7, even with a jupyter notebook and everything 😊

@marc-vdm
Copy link
Author

Alright, here's some findings:

@cmutel
I am 99% sure that the difference your are seeing is due to

solver._check_A(A)

I'm not entirely sure this is correct. I think it may be the conversion to csr that is costing so much time.

if sp.issparse(A) and A.format == "csc":
A = A.tocsr() # fixes issue with brightway2 technosphere matrix

When I convert my BW matrices to csr beforehand, I get the same speed as calling lca.decompose_technosphere() (which does this for us). solver._check_A does, however, re-do the checks for csc/csr, which doesn't seem needed when calling spsolve as we already convert to csr anyway.

I think if Brightway would set it's default format to csr instead of csc, we can speed up these calculations by default. The current BW25 implementation (which blocks the factorization in lca.decompose_technosphere()) seems to be incurring a speed penalty for no reason unless user thinks to convert lca.technosphere_matrix to csr themselves. This confirms what Adrian is saying in this comment. I understand the original csc format for UMFPACK compatibility, but most users are on x86 and thus use pypardiso, so perhaps this default is not the best.

@haasad
You should be able to do the following for maximum speed:

import pypardiso

solver = pypardiso.ps
solver.factorize(A)
solver.set_phase(33)
x = solver._call_pardiso(A, b)

This adds about ~10% speed for me, while nice if you know what you're doing, perhaps it's safe that this is indeed not done for BW.

Now to look ahead:
What makes sense for BW?
I would like to see lca.decompose_technosphere() not be blocked when using pypardiso -especially with a wrong warning type- but perhaps I'm misunderstanding something here?
Furthermore, does it make sense to set the sparse format based on the solver we're checking the solver anyway, so setting the 'correct' sparse format could help speed things up.

Let me know if you'd like to see a PR for BW25 Chris, I'll try and so it soon-ish then.

@haasad
Copy link
Owner

haasad commented Jun 24, 2024

I would like to see lca.decompose_technosphere() not be blocked when using pypardiso -especially with a wrong warning type- but perhaps I'm misunderstanding something here?

  • pypardiso always does a factorization + solve, the warning in bw2calc is correct
  • the benefits that you see don't come from factorization, but are a side-effect of not having to do the repeated sparse format conversion, i.e. A = A.tocsr()
  • pypardiso needs the brightway technosphere matrix to be in csr, that's why the conversion to csr is enforced in pypardiso.spsolve

Furthermore, does it make sense to set the sparse format based on the solver we're checking the solver anyway, so setting the 'correct' sparse format could help speed things up.

I also think that the best way would be if brightway uses csr format as default in combination with pypardiso.

This adds about ~10% speed for me, while nice if you know what you're doing, perhaps it's safe that this is indeed not done for BW.

I don't think the trade-off of 10% speed versus skipping all safety checks is worth it. If you do an LCA calculation, modify some values in the technosphere matrix and then do another LCA calculation, you might get completely wrong results without the checks in pypardiso.

@marc-vdm
Copy link
Author

pypardiso always does a factorization + solve, the warning in bw2calc is correct

Well, partially IMO. By giving this warning and putting the factorization in an else:, BW users with pypardiso (which is still most of them) will keep using a csc format technosphere, as previously lca.decompose_technosphere() would convert this to csr. In BW25, a user would now need to manually convert their technosphere, or face a ~60% speed loss as the technosphere needs to be converted to csr for every new FU.

We can either automatically keep technosphere (and for consistency also biosphere?) in the correct format for the solver (csc for scipy, csr for pypardiso) -which is the best solution IMO-, or, we allow lca.decompose_technosphere() to call factorized(), which would convert the technosphere to the correct csr format.

I'll close this discussion now, I think we can continue in brightway-lca/brightway2-calc#98 as this is a BW issue, not a pypardiso issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants