I've been on a Python performance optimization kick recently (see pylint-dev/astroid#497), and I'm a `pycodestyle` user, so I figured I would give it a look and see if its performance could be improved at all.

Of course, `pycodestyle` is already pretty fast, so to give it some stress I'm testing it out by pointing it right at a large repo, namely Zulip (https://github.com/zulip/zulip). In particular, my test command is `time ~/pycodestyle/pycodestyle.py -qq ~/zulip`. Here are the times from three runs of master:
I used the `yappi` profiling library to see if there were any hotspots in the code. There were. Take a look at the graph below. The brighter the box, the hotter the spot. In more detail, each box represents a function and has three numbers: 1) the percentage of total CPU time spent in that function, 2) the percentage of total CPU time spent in that function but not its subcalls, and 3) the number of times the function was called.
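(`yappi` is a third-party profiler, so as a sketch of the same idea, here is how the standard library's `cProfile` can produce comparable per-function statistics — call counts plus total and own time; the toy `busy` function is just an illustration, not from pycodestyle:)

```python
import cProfile
import io
import pstats

def busy():
    # Toy workload, purely for demonstration.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Sort by cumulative time -- analogous to "percentage of total CPU time
# spent in that function" -- and show the top entries.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```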
The red box that sticks out is `Checker.run_check`. It is called well over two million times, and 27.7% of total CPU time is spent there, almost all of which is in the function itself. This seems like an awful lot considering how short the function is:
So why does it suck up so much time?

I think I've worked out how it goes. When a check is registered (with `register_check`), its arguments are extracted with the `inspect` library and stored as a list of strings. When a check is run, `run_check` iterates over its associated list of arguments, dynamically accesses those attributes of the `Checker`, and then passes those values to the check to actually run.
The problem here is that dynamic attribute access is slow, and doing
it in tight loops is really slow (see
pylint-dev/astroid#497 for a harrowing cautionary
tale on this subject). My idea was to see if there was a way to do
away with the dynamic attribute access, basically by "compiling" the
attribute access into the code.
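A quick `timeit` comparison (a rough sketch, not a rigorous benchmark) illustrates the gap between the two access patterns:

```python
import timeit

class C:
    def __init__(self):
        self.attr = 42

c = C()
names = ["attr"]

# Dynamic attribute access in a loop, as run_check does it.
dynamic = timeit.timeit(lambda: [getattr(c, n) for n in names],
                        number=100_000)

# Direct attribute access, with the name "compiled" into the code.
direct = timeit.timeit(lambda: [c.attr], number=100_000)

print(f"dynamic: {dynamic:.4f}s  direct: {direct:.4f}s")
```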
It turns out that this can be accomplished by passing the checker instance into the check as an argument, and then accessing the attributes directly on the checker. Implementing this change involves a large-scale shuffling of arguments and strings, but other than that not much changes. `register_check` has to take the check's argument names as arguments now, since they are no longer the actual arguments. `run_check` itself can also be done away with, since all it would have to do would be to call the check with the checker as an argument, and that can be done inline.
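Continuing the simplified sketch from above (again with a made-up toy check, not the actual pycodestyle code), the refactored shape looks like this:

```python
class Checker:
    """Toy stand-in for pycodestyle's Checker, holding per-line state."""
    def __init__(self):
        self.logical_line = "x=1"
        self.indent_level = 0

def my_check(self):
    # After the change, the check receives the checker itself and reads
    # the attributes it needs directly -- no getattr loop, no list of
    # argument-name strings.
    if "=" in self.logical_line and " = " not in self.logical_line:
        return 0, "E225 missing whitespace around operator"

# run_check is gone: the call site just invokes the check inline,
# passing the checker as the sole argument.
result = my_check(Checker())
```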
This change resulted in a substantial speedup:
Here is the resulting `yappi` graph:

This graph is a lot more colorful than the last one. This means that the work is spread out more evenly among the various functions and there isn't one overwhelmingly critical hotspot.
One function that stuck out to me was `Checker.init_checker_state`. After some experimentation, it appeared that despite taking up almost 6% of total CPU time, the function didn't do much. Cutting it provided a non-negligible speed improvement:
A little further poking around revealed that `run_check` and `init_checker_state` were the only consumers of the "argument names", so I cut those out too. This led to some nice code simplification and an ever-so-slight speedup:
Here is the `yappi` graph after these changes:

The major hotspot is now `tokenize.tokenize`, which is part of the standard library. This is good, as it suggests that `pycodestyle` is nearing the point of being as fast as it can be. After that, the next most expensive functions are `check_logical`, `generate_tokens`, `build_tokens_line`, `check_all`, `maybe_check_physical`, and `_is_eol_token`. These functions all feel to me like they are doing something inefficiently, but I don't understand them well enough to say what.
These measurements were all taken running master with