Use named list for regex substitution of namespace prefixes
I profiled se lint because it seemed to be surprisingly slow for a
fairly small repository. This revealed that half of the execution time
was spent in the _replace_shorthand_namespaces method.

For the same repository, changing that method to use a named list (the
regex module's \L<name> syntax), and therefore only a single call to
regex.sub, reduced the time spent in that method from ~15 seconds to
~5 seconds. This optimization preserves behavior, and is arguably
easier to understand: the pattern is anchored at the start of the value
and a single attribute can only have one prefix, so at most one
substitution could ever apply.
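
To illustrate why one anchored substitution is equivalent to the loop, here is a minimal sketch using only the standard library. It rests on some assumptions: the stdlib `re` module has no named lists, so the prefixes are joined into an escaped alternation as a stand-in for `\L<ns>`; the `NAMESPACES` mapping and both function names are hypothetical and are not taken from se/easy_xml.py.

```python
import re

# Hypothetical namespace map (assumption); the real one is self.namespaces.
NAMESPACES = {
    "epub": "http://www.idpf.org/2007/ops",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def replace_with_loop(value: str) -> str:
    """One regex substitution per known prefix, as in the old loop."""
    output = value
    for name, identifier in NAMESPACES.items():
        # Bind identifier as a default argument so each lambda sees its own value.
        output = re.sub(rf"^{re.escape(name)}:", lambda m, i=identifier: f"{{{i}}}", output)
    return output

# Stand-in for the named list: a single alternation of escaped prefixes.
# This assumes NAMESPACES is non-empty, mirroring the `if self.namespaces:` guard.
PREFIX_PATTERN = re.compile(r"^(" + "|".join(re.escape(n) for n in NAMESPACES) + r"):")

def replace_with_single_pass(value: str) -> str:
    """One substitution; the matched prefix selects the identifier."""
    return PREFIX_PATTERN.sub(lambda m: f"{{{NAMESPACES[m[1]]}}}", value)

for sample in ("epub:type", "dc:title", "href"):
    assert replace_with_loop(sample) == replace_with_single_pass(sample)
```

Because the pattern is anchored with `^`, the loop can match at most once per value anyway; the single pass just avoids the extra calls that can never match.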

Using a named list also avoids the need to manually escape the prefixes
before including them in the regex pattern, a step the old code
incorrectly skipped.
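
The escaping problem can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: the prefix `my.ns`, the identifier URI, and the `expand` helper are all hypothetical, and `re.escape` stands in for the literal matching that the commit message attributes to named lists.

```python
import re

# Hypothetical prefix containing a regex metacharacter ("." matches any character).
prefix = "my.ns"
identifier = "http://example.com/ns"  # placeholder URI (assumption)

# Interpolating the raw prefix, as the old loop did, lets "." act as a wildcard.
unescaped = re.compile(rf"^{prefix}:")
# Escaping the prefix makes it match only itself.
escaped = re.compile(rf"^{re.escape(prefix)}:")

def expand(pattern: re.Pattern, value: str) -> str:
    """Replace a leading 'prefix:' with '{identifier}' if the pattern matches."""
    return pattern.sub(lambda m: f"{{{identifier}}}", value)

# "myXns" is a different prefix, yet the unescaped pattern rewrites it.
wrong = expand(unescaped, "myXns:attr")
right = expand(escaped, "myXns:attr")
```

Here `wrong` has been expanded even though `myXns` is not the registered prefix, while `right` is left untouched.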

Finally, this change corrects a mistake in that method's documentation,
as the : suffix is not retained in the output.
apasel422 authored and acabal committed Jun 17, 2024
1 parent ed5ea7b commit 6ee7e3a
Showing 1 changed file with 3 additions and 6 deletions.
9 changes: 3 additions & 6 deletions se/easy_xml.py
@@ -231,16 +231,13 @@ def _replace_shorthand_namespaces(self, value:str) -> str:
 		shorthand namespaces.
 		Example:
-		epub:type -> {http://www.idpf.org/2007/ops}:type
+		epub:type -> {http://www.idpf.org/2007/ops}type
 		"""

-		output = value
-
 		if self.namespaces:
-			for name, identifier in self.namespaces.items():
-				output = regex.sub(fr"^{name}:", f"{{{identifier}}}", output)
+			value = regex.sub(r"^(\L<ns>):", lambda m: f"{{{self.namespaces[m[1]]}}}", value, ns=self.namespaces.keys())

-		return output
+		return value

 	def to_tag_string(self) -> str:
 		"""