[Question]: parallelism within refurb? #279
Thank you for bringing this up! Refurb is built on top of Mypy, and as such, all caching and processing is done by Mypy. For reasons I have yet to find out, Refurb (and thus Mypy) is not reusing the cache on subsequent builds. Your question is about speeding up the initial/subsequent builds using multiprocessing, which is another speedup that will need looking into. This is a known issue and is long overdue for a fix, so I'll go ahead and open an issue on Mypy today to get a conversation started and go from there! Out of curiosity, how long does it take to run?
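One rough way to see whether the incremental cache is actually being reused is to compare a cold run against an immediate warm rerun of the same command. This is only a sketch; the `refurb src/` invocation in the comments is a made-up placeholder, not a command from this thread:

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical usage: if Mypy's incremental cache were being reused,
    # the warm run should be substantially faster than the cold one.
    # cold = timed_run(["refurb", "src/"])
    # warm = timed_run(["refurb", "src/"])
    # print(f"cold={cold:.1f}s warm={warm:.1f}s")
    pass
```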
Thanks for responding, and yeah, it would be good to support parallelism. Thanks for being willing to implement this. For timings, here is what running it looks like on my end:
Hi @jamesbraza, I just released a new version of Refurb to address some of the speed issues. How long does Refurb take with the new version (v1.20.0)? If Refurb is still running slow, take a look at the timing stats using the new flag. Thanks!
Thanks for doing a performance improvement! I was a little bored of writing a massive unit test, so I went ahead and just did this right now. Running it produced the following timings:
Thank you for this data! Here are some stats from what I can see:
Most of the runtime is spent in Mypy: loading, parsing, and type checking all the files and dependencies. There are a few ways we can mitigate this:
Number 1 is better overall, but might be harder. Number 2 might be easier to do, but could lead to important type info being lost, reducing Refurb's ability to check certain types. I'll keep looking into this. Also, does it still take 10+ minutes to run Refurb on the whole repository? I don't need the timing stats for it; I'm just curious how much of an impact my speedup change made.
Thank you for the breakdown! That helps me understand things. I am slowly figuring this out on my end too, now running:
I came across https://mypyc.readthedocs.io/en/latest/performance_tips_and_tricks.html, and blocklisting modules/packages from analysis seems like it could help here. Does Refurb support excluding modules like that?
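For reference, Mypy itself exposes a per-module mechanism along these lines: setting `follow_imports = skip` in its config makes it skip analyzing the matching modules, at the cost of losing their type info (the trade-off described in the reply below). A minimal sketch, where the package name `bigdep` is a made-up example:

```ini
# mypy.ini -- skip analyzing a heavy third-party package entirely.
# Checks that need type info from bigdep will lose precision.
[mypy]

[mypy-bigdep.*]
follow_imports = skip
ignore_missing_imports = True
```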
Thank you for the response! It's unfortunate that Docker runs so slowly on macOS, but it's good to know that it's Docker, not Refurb, that's taking 10 minutes. Note to self: I should probably ask what environment people are running Refurb in when it comes to speed/performance issues.

Like I said before, blocklisting certain modules means that you won't be able to get type info from them. For certain checks this isn't an issue, but for checks that require type info, choosing which modules to include/exclude might be hard. This is something that I should look into nonetheless.

And from what I can tell, Mypy does not support parallelism. I've taken a look at the module loading/parsing code, but there's a lot to take in, so I don't know how hard it would be to parallelize this process. The issue you mentioned is almost 8 years old, which probably means that this is either really hard/time-consuming, or has been on the back burner for a while (or a mix of both). I've been using Mypy for years and it's always been a bit slow, so it would be super cool (and super fast) if it were to support parallelism!
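Until upstream parallelism exists, one possible workaround is to shard the file list and run an independent `refurb` process per shard. This is only a sketch under stated assumptions, not an endorsed approach: each shard repeats Mypy's load/parse work, and cross-file type info is lost across shard boundaries, so results can differ from a single full run.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def shard(items, n):
    """Split items into up to n round-robin shards, dropping empty ones."""
    shards = [items[i::n] for i in range(n)]
    return [s for s in shards if s]


def run_sharded(files, workers=4, tool=("refurb",)):
    """Run `tool` once per shard of files, in parallel subprocesses.

    Threads suffice here: each task merely waits on a child process,
    so the actual parallelism comes from the subprocesses themselves.
    """
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [
            ex.submit(subprocess.run, [*tool, *s], capture_output=True, text=True)
            for s in shard(files, workers)
        ]
        return [f.result() for f in futures]
```

A caller would then treat a nonzero exit code from any shard as a failure for the whole run.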
Does `refurb` have any support for parallelism (e.g. multiprocessing)? I am getting this tool adopted across the company I work at, and for larger repos it can take 10+ minutes.