feat(mediatypes): reimplement (and unvendor) mimeparse (#2348)
* feat: WiP reimplement mimeparse

* feat(mediatypes): add some skeletons for mediatype parsing

* chore: fix up after master merge

* feat(mimeparse): wip doodles

* feat(mediatypes): implement computation of best quality

* feat(mediatypes): remove vendored mimeparse

* docs: add a newsfragment for one of the issues

* refactor: remove debug `print()`s

* feat(mediatypes): add specialized mediatype/range errors, coverage

* docs(newsfragments): add a newsfragment for #1367

* test(mediatypes): add more tests

* feat(mediatypes): improve docstring, simplify behaviour

* refactor(mediatypes): use a stricter type annotation

* chore: remove an unused import

* chore: fix docstring style violation D205

* chore(docs): apply review suggestion to `docs/ext/rfc.py`

Co-authored-by: Federico Caselli <cfederico87@gmail.com>

* docs(newsfragments): apply review suggestion for `docs/_newsfragments/864.breakingchange.rst`

Co-authored-by: Federico Caselli <cfederico87@gmail.com>

* refactor(mediatypes): address some review comments

* perf(mediatypes): short-circuit if q is absent as per review comment

* docs: explain how to mitigate a potentially breaking change

* docs: add a note that we continue to maintain python-mimeparse

* refactor(mediatypes): convert _MediaType and _MediaRange to dataclasses

* fix(mediatypes): only use dataclass(slots=True) where supported (>=py310)

* refactor(mediatypes): a yet another attempt to make dataclasses work with __slots__

---------

Co-authored-by: Federico Caselli <cfederico87@gmail.com>
vytas7 and CaselIT authored Oct 5, 2024
1 parent d45c06c commit 91e90b5
Showing 23 changed files with 560 additions and 264 deletions.
3 changes: 3 additions & 0 deletions docs/_newsfragments/1367.newandimproved.rst
@@ -0,0 +1,3 @@
The new implementation of :ref:`media type utilities <mediatype_util>`
(Falcon was using the ``python-mimeparse`` library before) now always favors
the exact media type match, if one is available.
36 changes: 36 additions & 0 deletions docs/_newsfragments/864.breakingchange.rst
@@ -0,0 +1,36 @@
Falcon is no longer vendoring the
`python-mimeparse <https://github.com/falconry/python-mimeparse>`__ library;
the relevant functionality has instead been reimplemented in the framework
itself, fixing a handful of long-standing bugs in the new implementation.

If you use standalone
`python-mimeparse <https://github.com/falconry/python-mimeparse>`__ in your
project, do not worry! We will continue to maintain it as a separate package
under the Falconry umbrella (we took over about 3 years ago).

The following new behaviors are considered breaking changes:

* Previously, the iterable passed to
:meth:`req.client_prefers <falcon.Request.client_prefers>` had to be sorted in
the order of increasing desirability.
:func:`~falcon.mediatypes.best_match`, and by proxy
:meth:`~falcon.Request.client_prefers`, now consider the provided media types
to be sorted in the (more intuitive, we hope) order of decreasing
desirability.

* Unlike ``python-mimeparse``, the new
:ref:`media type utilities <mediatype_util>` consider media types with
different values for the same parameters as non-matching.

One theoretically possible scenario where this change can affect you is only
installing a :ref:`media <media>` handler for a content type with parameters;
it then may not match media types with conflicting values (that used to match
before Falcon 4.0).
If this turns out to be the case, also
:ref:`install the same handler <custom_media_handlers>` for the generic
``type/subtype`` without parameters.

The new functions,
:func:`falcon.mediatypes.quality` and :func:`falcon.mediatypes.best_match`,
otherwise have the same signature as the corresponding methods from
``python-mimeparse``.
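
To make the two breaking changes above concrete, here is a minimal, standard-library-only sketch. It is not Falcon's implementation: `toy_quality` and `toy_best_match` are hypothetical names, parsing is deliberately naive, and the tie-breaking rule (first candidate wins) is an assumption based on the "decreasing desirability" wording in the newsfragment.

```python
def _parse(value):
    """Split 'type/subtype; k=v' into (type, subtype, params). Naive on purpose."""
    main, *rest = [part.strip() for part in value.split(';')]
    main_type, _, subtype = main.partition('/')
    params = {}
    for item in rest:
        key, _, val = item.partition('=')
        params[key.strip().lower()] = val.strip()
    return main_type.lower(), subtype.lower(), params


def toy_quality(media_type, header):
    """Return the q value of the most specific matching range, or 0.0."""
    mtype, msub, mparams = _parse(media_type)
    best_rank, best_q = None, 0.0
    for media_range in header.split(','):
        if not media_range.strip():
            continue
        rtype, rsub, rparams = _parse(media_range)
        q = float(rparams.pop('q', 1.0))
        if rtype not in ('*', mtype) or rsub not in ('*', msub):
            continue
        # Different values for the same parameter are treated as non-matching.
        shared = [k for k in rparams if k in mparams]
        if any(rparams[k] != mparams[k] for k in shared):
            continue
        # Prefer exact type/subtype matches over wildcards.
        rank = (rtype != '*', rsub != '*', len(shared))
        if best_rank is None or rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


def toy_best_match(media_types, header):
    """Pick the best candidate; candidates are in decreasing desirability."""
    best_type, best_q = '', 0.0
    for media_type in media_types:
        q = toy_quality(media_type, header)
        if q > best_q:  # strict comparison: on a tie, the earlier candidate wins
            best_type, best_q = media_type, q
    return best_type
```

With this sketch, `toy_best_match(['application/json', 'application/xml'], '*/*')` returns `'application/json'` because the first candidate wins a tie, and `toy_quality('text/plain; charset=utf-8', 'text/plain; charset=latin-1')` returns `0.0` because the `charset` values conflict.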
4 changes: 4 additions & 0 deletions docs/api/util.rst
@@ -33,10 +33,14 @@ HTTP Status
.. autofunction:: falcon.http_status_to_code
.. autofunction:: falcon.code_to_http_status

.. _mediatype_util:

Media types
-----------

.. autofunction:: falcon.parse_header
.. autofunction:: falcon.mediatypes.quality
.. autofunction:: falcon.mediatypes.best_match

Async
-----
6 changes: 3 additions & 3 deletions docs/ext/rfc.py
@@ -22,6 +22,7 @@

import re

IETF_DOCS = 'https://datatracker.ietf.org/doc/html'
RFC_PATTERN = re.compile(r'RFC (\d{4}), Section ([\d\.]+)')


@@ -39,11 +40,10 @@ def _process_line(line):
section = m.group(2)

template = (
'`RFC {rfc}, Section {section} '
'<https://tools.ietf.org/html/rfc{rfc}#section-{section}>`_'
'`RFC {rfc}, Section {section} <{ietf_docs}/rfc{rfc}#section-{section}>`__'
)

rendered_text = template.format(rfc=rfc, section=section)
rendered_text = template.format(rfc=rfc, section=section, ietf_docs=IETF_DOCS)

return line[: m.start()] + rendered_text + line[m.end() :]
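
The hunk above only shows part of `_process_line`. A self-contained sketch of the new link rendering follows, reusing the `IETF_DOCS` and `RFC_PATTERN` values from the diff. How the match object `m` is obtained is not visible in the hunk, so the `RFC_PATTERN.search()` call and the early return are assumptions, and `process_line` is a stand-in name.

```python
import re

IETF_DOCS = 'https://datatracker.ietf.org/doc/html'
RFC_PATTERN = re.compile(r'RFC (\d{4}), Section ([\d\.]+)')


def process_line(line):
    # Assumption: the first RFC reference on the line is located with search().
    m = RFC_PATTERN.search(line)
    if m is None:
        return line

    rfc = m.group(1)
    section = m.group(2)

    # Anonymous reST hyperlink (double underscore), as in the updated template.
    template = (
        '`RFC {rfc}, Section {section} <{ietf_docs}/rfc{rfc}#section-{section}>`__'
    )
    rendered_text = template.format(rfc=rfc, section=section, ietf_docs=IETF_DOCS)

    return line[: m.start()] + rendered_text + line[m.end() :]
```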

6 changes: 6 additions & 0 deletions falcon/__init__.py
@@ -84,6 +84,7 @@
'http_status_to_code',
'IS_64_BITS',
'is_python_func',
'mediatypes',
'misc',
'parse_header',
'reader',
@@ -138,6 +139,8 @@
'HTTPUnsupportedMediaType',
'HTTPUriTooLong',
'HTTPVersionNotSupported',
'InvalidMediaRange',
'InvalidMediaType',
'MediaMalformedError',
'MediaNotFoundError',
'MediaValidationError',
@@ -388,6 +391,8 @@
from falcon.errors import HTTPUnsupportedMediaType
from falcon.errors import HTTPUriTooLong
from falcon.errors import HTTPVersionNotSupported
from falcon.errors import InvalidMediaRange
from falcon.errors import InvalidMediaType
from falcon.errors import MediaMalformedError
from falcon.errors import MediaNotFoundError
from falcon.errors import MediaValidationError
@@ -617,6 +622,7 @@
from falcon.util import http_status_to_code
from falcon.util import IS_64_BITS
from falcon.util import is_python_func
from falcon.util import mediatypes
from falcon.util import misc
from falcon.util import parse_header
from falcon.util import reader
12 changes: 7 additions & 5 deletions falcon/app_helpers.py
@@ -291,12 +291,14 @@ def default_serialize_error(req: Request, resp: Response, exception: HTTPError)
resp: Instance of ``falcon.Response``
exception: Instance of ``falcon.HTTPError``
"""
predefined = [MEDIA_XML, 'text/xml', MEDIA_JSON]

predefined = [MEDIA_JSON, 'text/xml', MEDIA_XML]
media_handlers = [mt for mt in resp.options.media_handlers if mt not in predefined]
# NOTE(caselit) add all the registered before the predefined ones. This ensures that
# in case of equal match the last one (json) is selected and that the q= is taken
# into consideration when selecting the media
preferred = req.client_prefers(media_handlers + predefined)
# NOTE(caselit,vytas): Add the registered handlers after the predefined
# ones. This ensures that in the case of an equal match, the first one
# (JSON) is selected and that the q parameter is taken into consideration
# when selecting the media handler.
preferred = req.client_prefers(predefined + media_handlers)

if preferred is None:
# NOTE(kgriffs): See if the client expects a custom media
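
The reordering in this hunk can be sketched in isolation. The helper name `candidate_media_types` is hypothetical, and the two constants are local stand-ins for `falcon.MEDIA_JSON` and `falcon.MEDIA_XML`. Only the list construction is shown, not the call to `req.client_prefers()`.

```python
# Local stand-ins for the Falcon constants used in the hunk above.
MEDIA_JSON = 'application/json'
MEDIA_XML = 'application/xml'


def candidate_media_types(registered_handlers):
    """Build the candidate list in decreasing order of desirability."""
    # Predefined types come first, so JSON wins when matches are equal.
    predefined = [MEDIA_JSON, 'text/xml', MEDIA_XML]
    # Registered handlers follow, skipping anything already predefined.
    extra = [mt for mt in registered_handlers if mt not in predefined]
    return predefined + extra
```

Because the new matcher reads the list in decreasing desirability, JSON now sits at the head of the list rather than at the tail.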
2 changes: 1 addition & 1 deletion falcon/asgi/app.py
@@ -554,7 +554,7 @@ async def __call__( # type: ignore[override] # noqa: C901
data = resp._data

if data is None and resp._media is not None:
# NOTE(kgriffs): We use a special MISSING singleton since
# NOTE(kgriffs): We use a special _UNSET singleton since
# None is ambiguous (the media handler might return None).
if resp._media_rendered is _UNSET:
opt = resp.options
2 changes: 1 addition & 1 deletion falcon/asgi/request.py
@@ -169,7 +169,7 @@ def __init__(

self.uri_template = None
# PERF(vytas): Fall back to class variable(s) when unset.
# self._media = MISSING
# self._media = _UNSET
# self._media_error = None

# TODO(kgriffs): ASGI does not specify whether 'path' may be empty,
2 changes: 1 addition & 1 deletion falcon/asgi/response.py
@@ -207,7 +207,7 @@ async def render_body(self) -> Optional[bytes]: # type: ignore[override]
data = self._data

if data is None and self._media is not None:
# NOTE(kgriffs): We use a special MISSING singleton since
# NOTE(kgriffs): We use a special _UNSET singleton since
# None is ambiguous (the media handler might return None).
if self._media_rendered is _UNSET:
if not self.content_type:
10 changes: 10 additions & 0 deletions falcon/errors.py
@@ -88,6 +88,8 @@ def on_get(self, req, resp):
'HTTPUnsupportedMediaType',
'HTTPUriTooLong',
'HTTPVersionNotSupported',
'InvalidMediaRange',
'InvalidMediaType',
'MediaMalformedError',
'MediaNotFoundError',
'MediaValidationError',
@@ -111,6 +113,14 @@ class CompatibilityError(ValueError):
"""The given method, value, or type is not compatible."""


class InvalidMediaType(ValueError):
"""The provided media type cannot be parsed into type/subtype."""


class InvalidMediaRange(InvalidMediaType):
"""The media range contains an invalid media type and/or the q value."""


class UnsupportedScopeError(RuntimeError):
"""The ASGI scope type is not supported by Falcon."""
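
Both new exceptions derive from `ValueError`, which is presumably why the existing `except ValueError` handlers elsewhere in this commit can stay unchanged. A minimal sketch, with the two classes redefined locally so the snippet runs without Falcon; `caught_as_value_error` is a hypothetical helper:

```python
class InvalidMediaType(ValueError):
    """The provided media type cannot be parsed into type/subtype."""


class InvalidMediaRange(InvalidMediaType):
    """The media range contains an invalid media type and/or the q value."""


def caught_as_value_error(exc_type):
    """Raise exc_type and report the class name seen by a ValueError handler."""
    try:
        raise exc_type('example')
    except ValueError as ex:
        # Callers that only know about ValueError still catch the new errors.
        return type(ex).__name__
```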

7 changes: 3 additions & 4 deletions falcon/http_error.py
@@ -19,8 +19,7 @@
import xml.etree.ElementTree as et

from falcon.constants import MEDIA_JSON
from falcon.util import code_to_http_status
from falcon.util import http_status_to_code
from falcon.util import misc
from falcon.util import uri

if TYPE_CHECKING:
@@ -136,7 +135,7 @@ def __init__(
# we'll probably switch over to making everything code-based to more
# easily support HTTP/2. When that happens, should we continue to
# include the reason phrase in the title?
self.title = title or code_to_http_status(status)
self.title = title or misc.code_to_http_status(status)

self.description = description
self.headers = headers
@@ -161,7 +160,7 @@ def status_code(self) -> int:
"""HTTP status code normalized from the ``status`` argument passed
to the initializer.
""" # noqa: D205
return http_status_to_code(self.status)
return misc.http_status_to_code(self.status)

def to_dict(
self, obj_type: Type[MutableMapping[str, Union[str, int, None, Link]]] = dict
4 changes: 2 additions & 2 deletions falcon/media/handlers.py
@@ -30,8 +30,8 @@
from falcon.media.multipart import MultipartFormHandler
from falcon.media.multipart import MultipartParseOptions
from falcon.media.urlencoded import URLEncodedFormHandler
from falcon.util import mediatypes
from falcon.util import misc
from falcon.vendor import mimeparse


class MissingDependencyHandler(BinaryBaseHandlerWS):
@@ -186,7 +186,7 @@ def _best_match(media_type: str, all_media_types: Sequence[str]) -> Optional[str
try:
# NOTE(jmvrbanac): Mimeparse will return an empty string if it can
# parse the media type, but cannot find a suitable type.
result = mimeparse.best_match(all_media_types, media_type)
result = mediatypes.best_match(all_media_types, media_type)
except ValueError:
pass

6 changes: 3 additions & 3 deletions falcon/request.py
@@ -53,10 +53,10 @@
from falcon.typing import ReadableIO
from falcon.util import deprecation
from falcon.util import ETag
from falcon.util import mediatypes
from falcon.util import structures
from falcon.util.uri import parse_host
from falcon.util.uri import parse_query_string
from falcon.vendor import mimeparse

DEFAULT_ERROR_LOG_FORMAT = '{0:%Y-%m-%d %H:%M:%S} [FALCON] [ERROR] {1} {2}{3} => '

@@ -1167,7 +1167,7 @@ def client_accepts(self, media_type: str) -> bool:

# Fall back to full-blown parsing
try:
return mimeparse.quality(media_type, accept) != 0.0
return mediatypes.quality(media_type, accept) != 0.0
except ValueError:
return False

@@ -1187,7 +1187,7 @@ def client_prefers(self, media_types: Iterable[str]) -> Optional[str]:

try:
# NOTE(kgriffs): best_match will return '' if no match is found
preferred_type = mimeparse.best_match(media_types, self.accept)
preferred_type = mediatypes.best_match(media_types, self.accept)
except ValueError:
# Value for the accept header was not formatted correctly
preferred_type = ''
2 changes: 1 addition & 1 deletion falcon/response.py
@@ -276,7 +276,7 @@ def render_body(self) -> Optional[bytes]:
data = self._data

if data is None and self._media is not None:
# NOTE(kgriffs): We use a special MISSING singleton since
# NOTE(kgriffs): We use a special _UNSET singleton since
# None is ambiguous (the media handler might return None).
if self._media_rendered is _UNSET:
if not self.content_type: