Commit

Merge pull request #2204 from sopel-irc/modernize-web.decode
web: use html.unescape() when available (Python 3.4+)
dgw committed Nov 4, 2021
2 parents c0c6f86 + a321946 commit b61f630
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions sopel/tools/web.py
@@ -34,6 +34,11 @@
    unichr = chr
    unicode = str

try:
    import html as HTMLLib
except ImportError:
    HTMLLib = None

__all__ = [
    'USER_AGENT',
    'DEFAULT_HEADERS',
@@ -121,6 +126,16 @@ def decode(html):
    :param str html: the HTML page or snippet to process
    :return str: ``html`` with all entity references replaced
    """
    if HTMLLib is not None:
        # Python's stdlib has our back in 3.4+
        # TODO: This should be the only implementation in Sopel 8
        try:
            return HTMLLib.unescape(html)
        except AttributeError:
            # Must be py3.3; fall through to our half-assed version.
            pass

    # Not on py3.4+ yet? Then you get to deal with the jankiness.
    return r_entity.sub(entity, html)
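
For readers who want to try the pattern outside of Sopel, the following is a minimal standalone sketch of "prefer `html.unescape()`, fall back to a regex" as used in this commit. It is not Sopel's code: `html_lib`, `_ENTITY_RE`, `_replace_entity`, and `fallback_decode` are hypothetical names standing in for `HTMLLib`, `r_entity`, and `entity`, whose actual definitions are not shown in this diff, and the sketch assumes Python 3 for `chr` and `html.entities`.

```python
import re
from html.entities import name2codepoint

try:
    import html as html_lib
except ImportError:
    # Mirrors the commit: no top-level ``html`` module available.
    html_lib = None

# Hypothetical stand-in for Sopel's ``r_entity`` pattern: matches numeric
# (decimal or hex) and named entity references such as &#65; &#x41; &amp;
_ENTITY_RE = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')


def _replace_entity(match):
    """Hypothetical stand-in for Sopel's ``entity`` callback."""
    value = match.group(1)
    try:
        if value[:2] in ('#x', '#X'):
            return chr(int(value[2:], 16))
        if value.startswith('#'):
            return chr(int(value[1:]))
    except (ValueError, OverflowError):
        # Code point out of range: leave the reference untouched.
        return match.group(0)
    if value in name2codepoint:
        return chr(name2codepoint[value])
    # Unknown named entity: leave it as-is.
    return match.group(0)


def fallback_decode(text):
    """Regex-based decoding, used only when the stdlib cannot help."""
    return _ENTITY_RE.sub(_replace_entity, text)


def decode(text):
    """Prefer ``html.unescape()``; otherwise use the regex fallback."""
    unescape = getattr(html_lib, 'unescape', None)
    if unescape is not None:
        return unescape(text)
    return fallback_decode(text)


print(decode('&lt;p&gt;caf&eacute;&lt;/p&gt;'))  # <p>café</p>
```

The sketch looks up `unescape` with `getattr` instead of catching `AttributeError` around the call as the commit does. Both handle a missing `unescape`, but the `getattr` form cannot accidentally swallow an `AttributeError` raised for an unrelated reason, such as passing a non-string argument.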

