Potential speedups for HTML::Entities::encode_entities #30
Comments
How do you envision the cache? An in-memory LRU, or a hash that can grow quite large? Have you tried seeing what kind of speed gains can be had via …
Just a hash that could, theoretically but not likely, grow quite large, as I showed in the code above. I didn't bother with …
I think we could get some speedups in `encode_entities` by caching some common operations. The examples below would most benefit users encoding lots of data that is heavy on numeric entities, i.e. characters that have no named entity. We may see similar benefits elsewhere.

The first is to cache the results of the `sprintf` in `num_entity`. This could be done with no effect on behavior, in exchange for some hash entries. I benchmarked a hash lookup against the `sprintf`.
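A minimal sketch of this cache, assuming `num_entity` is a plain `sprintf '&#x%X;', ord $char` (worth checking against the installed HTML::Entities). `num_entity_cached` and `%num_entity_cache` are hypothetical names. The sample data and the `cmpthese` harness from the core Benchmark module are illustrative choices, not the ones behind the figures in this issue.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed current behavior: one sprintf per call.
sub num_entity {
    return sprintf '&#x%X;', ord $_[0];
}

# Hypothetical cache keyed by the character itself. It gains one entry
# per distinct character ever encoded and is never evicted.
my %num_entity_cache;

# Cached variant: sprintf runs once per distinct character; later calls
# are a single hash lookup. The //= operator needs Perl 5.10 or later.
sub num_entity_cached {
    my ($char) = @_;
    return $num_entity_cache{$char} //= sprintf '&#x%X;', ord $char;
}

# Sample input (an assumption): every Latin-1 high-bit character.
my @chars = map { chr } 0x80 .. 0xFF;

# Sanity check: both variants must produce identical output.
for my $c (@chars) {
    die 'mismatch for U+', sprintf('%04X', ord $c), "\n"
        unless num_entity($c) eq num_entity_cached($c);
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    sprintf => sub { num_entity($_)        for @chars },
    cached  => sub { num_entity_cached($_) for @chars },
});
```

Because the cache is an unbounded hash, its memory grows with the number of distinct non-named characters seen. For most text that set is small, but nothing caps it.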
Bottom line: hash lookup is faster than the `sprintf`, so let's cache it.

The other tweak would be to cache the call to `num_entity` inside the main regex in `encode_entities` by storing each computed entity in `%char2entity` instead of only returning it.
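Here is a sketch of that swap. It assumes the main substitution has the shape `$char2entity{$1} || num_entity($1)` and uses a five-entry stand-in for `%char2entity`. The real table covers every named entity, so real output would use `&eacute;` where this sketch emits `&#xE9;`. `encode_uncached`, `encode_shared_cache` and `encode_private_cache` are hypothetical names. The last one illustrates the private-copy option.

```perl
use strict;
use warnings;

# Stand-in for HTML::Entities' %char2entity (an assumption): the real
# table maps every named entity, not just these five characters.
our %char2entity = (
    '&' => '&amp;',  '<' => '&lt;',  '>' => '&gt;',
    '"' => '&quot;', "'" => '&#39;',
);

# Numeric fallback, assumed to be a plain sprintf.
sub num_entity { return sprintf '&#x%X;', ord $_[0] }

# Assumed shape of the current main regex: control chars, high-bit chars
# and < & > ' " are looked up, falling back to num_entity on every miss.
sub encode_uncached {
    my ($s) = @_;
    $s =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$char2entity{$1} || num_entity($1)/ge;
    return $s;
}

# Proposed swap: store the fallback in the shared %char2entity, so each
# distinct character pays for num_entity only once.
sub encode_shared_cache {
    my ($s) = @_;
    $s =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$char2entity{$1} ||= num_entity($1)/ge;
    return $s;
}

# Private-copy variant: a snapshot taken once, written only by this
# function. Later changes to %char2entity are not seen here.
my %private_char2entity = %char2entity;
sub encode_private_cache {
    my ($s) = @_;
    $s =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$private_char2entity{$1} ||= num_entity($1)/ge;
    return $s;
}
```

`||=` is safe here because every entity string starts with `&` and so is never false.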
This would have the side effect of modifying the `%char2entity` hash, which is visible to the outside world. If that isn't OK, we could keep a private copy of the hash specifically so it would be modifiable. The potential downside (or upside?) is that if someone outside the module modified `%char2entity`, it would have no effect on `encode_entities`.

For benchmarking `encode_entities`, I measured calls per second for each variant. Results:
- 42,281/s for the original, unmodified `encode_entities`.
- 52,746/s if `encode_entities` used the caching `num_entity` first mentioned, but the main regex is unchanged.
- 64,769/s if the main conversion regex caches the results of calls to `num_entity` in `%char2entity`. Changing this to call the caching `num_entity` gave no noticeable improvement.

I hope these give some ideas.
`encode_entities` is an absolute workhorse at my job (we generate everything with Template Toolkit), and I'm sure for many, many others. Any speedup would have wide-ranging benefits.