From f1e715eb7bf703cc820ff4db58dc0837cce635a5 Mon Sep 17 00:00:00 2001 From: dieter Date: Wed, 12 Jun 2019 10:40:57 +0200 Subject: [PATCH 1/8] Modernize request parameter handling #641 Tests broken due to backward incompatible changes not yet fixed --- CHANGES.rst | 68 ++ docs/zdgbook/ObjectPublishing.rst | 756 +++++++++++++------ src/ZPublisher/BaseRequest.py | 14 +- src/ZPublisher/Converters.py | 338 +++++---- src/ZPublisher/HTTPRequest.py | 774 +++----------------- src/ZPublisher/interfaces.py | 10 + src/ZPublisher/request_params.py | 724 ++++++++++++++++++ src/ZPublisher/tests/testHTTPRequest.py | 88 ++- src/ZPublisher/tests/test_Converters.py | 40 +- src/ZPublisher/tests/test_request_params.py | 582 +++++++++++++++ 10 files changed, 2285 insertions(+), 1109 deletions(-) create mode 100644 src/ZPublisher/request_params.py create mode 100644 src/ZPublisher/tests/test_request_params.py diff --git a/CHANGES.rst b/CHANGES.rst index 51703a4d21..d35a64aae8 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -8,6 +8,74 @@ https://zope.readthedocs.io/en/2.13/CHANGES.html For the change log of the alpha versions see https://github.com/zopefoundation/Zope/blob/4.0a6/CHANGES.rst +4.1 (unreleased) +---------------- + +Features +++++++++ + +- Modernized request parameter handling + (`#641 `_): + + - fully recursive aggregation which can handle structures of arbitrary depth + + - simplified processing model + + - support for special HTML5 features: ``_charset_`` informs + about the form encoding used; character references used + to work around encoding limitations + + - treats parameter values internally as text + (this means unicode for Python 2). + For Python 2, the conversion to unicode is skipped if it + results in a ``UnicodeDecodeError``. The value is then used + as is. + + For Python 2, + the final values for parameters without converter and encoding directive + are encoded with Zope's default encoding; character references + are used for characters which cannot be encoded.
+ + - Errors encountered during request parameter processing are + not reported immediately (at this stage, application-specific + error handling has not yet been set up). Instead, a + ``post_traverse`` is registered which will raise + a ``RequestParameterError`` exception after the traversal + in the proper application context. + The ``RequestParameterError`` describes all errors + encountered during request parameter processing. + + - ``FileUpload`` has new attributes ``type`` (the associated + MIME type or ``None``) and the ``dict`` ``type_options`` + (containing the provided MIME type parameters). + + Backward incompatibilities: + + - a parameter must now follow a corresponding default parameter + to override the default parameter; formerly the relative order + of parameter and default parameter was of no importance. + + There is a new directive "conditional" which can also be used + to define a default value. A conditional parameter is ignored + if there is already a corresponding parameter and + otherwise acts like a default parameter. Thus, its behaviour + is comparable to the former behaviour of "default". + + - aggregators are now applied from left to right; in particular, + their relative order is important. + Formerly, aggregators were applied in a fixed order -- + independent of the order in which they were specified. + + - the converter *functions* (in ``ZPublisher.Converters``) + no longer support the conversion of files (because they + do not know the encoding applicable for the file). + The converter *directives*, however, can still be applied + to (uploaded) files. They use the encoding explicitly + specified via an encoding directive or fall back to + Zope's default encoding.
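The new ordering rules for default and conditional parameters can be illustrated with a small model of a single scalar form variable. This is a hypothetical sketch, not ZPublisher's implementation. The names `update`, `resolve` and `UpdateFailed` are invented for illustration. Values are `(mark, value)` pairs, where a mark of `None` means a "normal" value. Cases that the documentation describes as failing are simply signalled with an exception. What Zope actually does in such cases is not modelled (for example, repeated top-level parameters are collected into an implicit list).

```python
# Hypothetical model of the "default"/"conditional" ordering rules for one
# scalar form variable; not ZPublisher code.


class UpdateFailed(Exception):
    """Stands in for the cases the documentation describes as 'fails'."""


def update(target, source):
    """Update target (None or a (mark, value) pair) with source."""
    s_mark, _ = source
    if target is None:
        return source  # first value seen: keep it, including its mark
    if s_mark == "conditional":
        return target  # a conditional value is ignored if a value exists
    t_mark, _ = target
    if t_mark in ("default", "conditional") and s_mark is None:
        return source  # a normal value replaces a *preceding* default
    # e.g. default after normal, or default after default
    raise UpdateFailed(f"cannot update {target!r} with {source!r}")


def resolve(params):
    """Process the (mark, value) pairs from left to right."""
    target = None
    for source in params:
        target = update(target, source)
    return None if target is None else target[1]
```

In this model, `resolve([("default", "1"), (None, "2")])` yields `"2"`, while the reversed order raises `UpdateFailed`. A conditional value yields `"2"` whether it comes before or after the normal value.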
+ + + 4.0.1 (unreleased) ------------------ diff --git a/docs/zdgbook/ObjectPublishing.rst b/docs/zdgbook/ObjectPublishing.rst index 6d141ea4bd..acbb0b6eb6 100644 --- a/docs/zdgbook/ObjectPublishing.rst +++ b/docs/zdgbook/ObjectPublishing.rst @@ -631,8 +631,7 @@ and an optional request body. A URL consists of various parts, among them a *path* and a *query*, see `RFC 2396 `_ for details. -Zope uses the *path* to locate an object, method or view for -producing the response (this process is called *traversal*) +Zope uses the *path* to locate the published object and *query* - if present - as a specification for request parameters. Additionally, request parameters can come from the optional request body. @@ -641,13 +640,79 @@ Zope preprocesses the incoming request information and makes the result available in the so called *request* object. This way, the response generation code can access all relevant request information in an easy and natural (pythonic) way. -Preprocessing transforms the request *parameters* into request (or form) -*variables*. +Preprocessing transforms the *request parameters* into +*form variables*, a special kind of *request variables*. They are made available via the request object's ``form`` attribute -(a ``dict``) or directly via the request object itself, as long as they are -not hidden by other request information. +(a ``dict``). + +*Request variables* can come from various sources. If a request +variable is looked up, those sources are asked in turn +whether they know the variable; the lookup stops with the first +success. This way, a variable defined by a source +is hidden by a variable of the same name defined by a source +asked earlier. +Sources are asked in the following order: + +* the lookup for ``REQUEST`` gives the request object + +* the request attribute ``other`` -- it contains + request variables explicitly set with the method + ``request.set``. In addition, it is used as + a cache for special and lazy variables.
Finally, the request + preprocessing puts some additional variables there, + e.g. ``PUBLISHED`` (the published object), + ``AUTHENTICATED_USER`` (the user object, if authentication was + successful), ``SERVER_URL`` (the initial URL part, identifying + the server). + +* special variables + + - the URL variables whose names are defined by the + regular expressions ``URL(PATH)?([0-9]*)`` + and ``BASE(PATH)?([0-9]*)``, e.g. ``URL``, ``URL1``, ``URLPATH``, + ``BASE``, ``BASEPATH1``. Their value is a prefix of the + current URL or the URL path (if the name contains ``PATH``), + respectively. ``URL`` and ``URL0`` give the full URL + and each successive *i* in ``URL``\ *i* removes a further + path segment from the end. ``BASE`` and ``BASE0`` start + with the empty path; ``BASE1`` adds the so-called + "script name" (if any) and each successive *i* in ``BASE``\ *i* + adds a further path segment from the original URL. + + - ``BODY`` and ``BODYFILE`` (for requests with a body). + Their value is the request body, either as a + (binary) string or as a file, respectively. + +* the request attribute ``environ`` -- it contains the + CGI environment variables and other information + from the request headers. + +* the request attribute ``common`` -- it contains variables + defined by the request class (not the individual request). + +* so-called *lazy variables* -- these are "expensive" + variables created only on first access and then + put into ``other``. An example is ``SESSION``, representing + Zope's session object. + +* the request attribute ``form`` -- it contains the form + variables, i.e. the result of the request parameter processing. + +* the request attribute ``cookies`` -- it contains the cookies + provided with the request. + +The object publisher can use all (visible) request variables +as arguments for the published object. + +..
 note:: + ``str(request)`` returns a description of the request object as HTML text. + You can use this to "view" the result of request preprocessing, e.g. + by defining + a ``DTML Method`` with body ``<dtml-var REQUEST>`` + (or a ``Script (Python)`` with body ``return str(container.REQUEST)``) + and calling it via the Web or using it as a form action. + +The request parameters from the *query* have the form *name*\ ``=``\ *value* and are separated by ``&``; request parameters from a request body can have different forms and can be separated in different ways dependent on the @@ -664,46 +729,32 @@ are aggregated into a single object. Zope supports both cases but it needs directives to guide the process. It uses *name* suffixes of the form ``:``\ *directive* to specify such directives. For example, the parameter ``i:int=1`` tells Zope to convert the value ``'1'`` to an -integer and use it as value for request variable ``i``; the parameter sequence +integer and use it as value for form variable ``i``; the parameter sequence ``x.name:record=Peter&x.age:int:record=10`` tells Zope to construct a record ``x`` with attributes ``name`` and ``age`` and respective values -``'Peter'`` and ``10``. - -The publisher also marshals arguments from CGI environment variables -and cookies. When locating arguments, the publisher first looks in -other (i.e. explicitly set or special) request variables, -then CGI environment variables, then form -variables, and finally cookies. Once a variable is found, no further -searching is done. So for example, if your published object expects -to be called with a form variable named ``SERVER_URL``, it will fail, -since this argument will be marshalled from the CGI environment first, -before the form data. +``'Peter'`` and ``10``. There are different kinds of directives: +converter, aggregator and encoding directives. -The publisher provides a number of additional special variables such -as ``URL``, ``URLn``, ``BASEn`` and others, which are derived from the -request.
-Unfortunately, there is no current documentation for those variables. +Converters +~~~~~~~~~~ -Argument Conversion -~~~~~~~~~~~~~~~~~~~ - -The publisher supports argument conversion. For example consider this +The publisher supports argument conversion via +converter directives. For example consider this function:: def one_third(number): """returns the number divided by three""" return number / 3.0 -This function cannot be called from the web because by default the -publisher marshals arguments into strings, not numbers. This is why +Calling this function only succeeds if *number* is a number; it +fails for a string. This is why the publisher provides a number of converters. To signal an argument -conversion you name your form variables with a colon followed by a -type conversion code. +conversion you use a converter directive. For example, to call the above function with 66 as the argument you -can use this URL ``one_third?number:int=66``. +can use the URL ``one_third?number:int=66``. Some converters employ special logic for the conversion. For example, both ``tokens`` as well as ``lines`` convert to @@ -732,9 +783,10 @@ The publisher supports many converters: - **ustring** -- Converts a variable to a Python unicode string. - **bytes** -- Converts a variable to a Python bytes object/string. + Currently, there is no way to specify the output encoding; "latin1" + is used. -- **required** -- Raises an exception if the variable is not present or - is an empty string. +- **required** -- Raises an exception if the variable is an empty string. - **date** -- Converts a string to a **DateTime** object. The formats accepted are fairly flexible, for example ``10/16/2000``, ``12:01:13 @@ -742,7 +794,7 @@ The publisher supports many converters: - **date_international** -- Converts a string to a **DateTime** object, but especially treats ambiguous dates as "days before month before - year". This useful if you need to parse non-US dates. + year".
This is useful if you need to parse non-US dates. - **lines** -- Converts a variable to a Python list of native strings by splitting the string on line breaks. Also converts list/tuple of @@ -762,75 +814,220 @@ The publisher supports many converters: The full list of supported converters can be found in ``ZPublisher.Converters.type_converters``. -If the publisher cannot coerce a request parameter into the type -required by the type converter it will raise an error. This is useful -for simple applications, but restricts your ability to tailor error -messages. If you wish to provide your own error messages, you should -convert arguments manually in your published objects rather than -relying on the publisher for coercion. +If the publisher cannot convert a request parameter into the type +required by the type converter, it raises an exception. .. note:: Client-side validation with HTML 5 and/or JavaScript may improve the usability of the application, but it is never a replacement for server side validation. -You can combine type converters to a limited extent. For example you +You can combine a type converter with other directives. For example you could create a list of integers like so:: - - - + + + Aggregators ~~~~~~~~~~~ -An aggregator directive tells Zope how to process parameters with the same or -a similar name. +Aggregator directives tell Zope how to process parameters with the same or +similar names. There are aggregators which tell Zope to +aggregate parameter values into a sequence or a record +and aggregators which mark the value for a particular use, e.g. +"to be used as a default value". + +A request parameter can have several aggregator directives. +They are applied in turn from left to right.
For example, +``x.a:int:list:record=1&x.a:int:list:record=2`` creates the +form variable ``x`` of type ``record`` with attribute ``a`` with +the list ``[1, 2]`` as value; +``x.a:int:record:list=1&x.a:int:record:list=2`` creates the form variable +``x``; its value is the list of two ``record``\ s of which the +``a`` attributes have the value ``1`` and ``2``, respectively. +As another example, +``x:default:list=1&x:default:list=2&x:list=3`` creates the form +variable ``x`` with value ``['1', '3']`` -- the ``x:default:list=2`` +was replaced by ``x:list=3``. On the other hand, +``x:list:default=1&x:list:default=2&x:list=3`` creates +``x`` with value ``['3']`` -- processing the first two parameters +has created the default value ``['1', '2']`` which was replaced +by the non-default ``['3']`` from the processing of the third +parameter. + +.. note:: + + Technically, an aggregator transforms a triple *name*, *value* + and *aggs* into another such triple (or ``None``). + *name* is the parameter name, *value* the parameter value + and *aggs* the sequence of aggregators still to apply. + Thus, an aggregator can change the parameter name, its value + and what aggregators should still be applied. + The aggregators are applied successively until *aggs* becomes + empty. The final *name* and *value* are used to "update" + the form variable collection. This "update" can be + complex, is often recursive and is affected by the types and marks + of the encountered values. + + +.. note:: + While aggregators are meant to aggregate + (in the sense of "coordinate") several parameters with + similar names into a single form variable, they do not + perform this aggregation themselves. Instead, they + produce a wrapped value representing an isolated + parameter. The wrapped value uses appropriate types and marks + to achieve the desired aggregation when the final *name*, *value* + (after the application of all aggregators) updates the form + variable collection.
+ During this update, the form variable collection + representing the result of the aggregation of the previously + processed parameters is recursively updated with the information + for the current parameter. In this process, *target* subvalues from + the collection are *updated* with corresponding + *source* subvalues from the current parameter. + When we use the terms "updating", "target" and "source" below, + we reference this recursive subvalue update. + +Sequence aggregators +++++++++++++++++++++ + +All sequence aggregators produce a sequence value -- typically +with a single element (the exception is **empty**, whose result +has no elements). For some sequence aggregators, the input +value must already have been a sequence. + +- **list** -- Wrap *value* into a ``list`` sequence. This is typically used to + collect all parameters with the same name into a list. + + "Updating" a target sequence with a source sequence requires that + the source sequence has a single element, *source_value*. + If the source sequence is marked as "to be used in **append mode**", + then *source_value* is appended to the target sequence. + Otherwise (the default), Zope tries to "update or replace" + the last component of the target sequence (if any) with *source_value*; + should this fail, *source_value* is appended to the target list. + + .. note:: + + If there are two or more simple (i.e. top-level and not structured) + parameters with + the same name they are by default collected into an + implicitly constructed list. + For simple parameters, the **list** aggregator is mainly used to ensure + that the parameter leads to a list value even in the case that + there is only one of them. + +- **tuple** -- Wrap *value* into a tuple. Otherwise, it works like **list**. + +- **empty** -- Transform *value* (a sequence) into an empty sequence. + + An empty sequence can be useful e.g. as a default value for + a multi-select control.
+ +- **append** -- Mark *value* (a sequence) as "to be used in **append mode**". + + **append** is used to force the source value to be appended + to the target sequence. Without + the **append**, the value might instead be used to "update or replace" + the last element of the target sequence. + +Record aggregator ++++++++++++++++++ + +The record aggregator **record** requires that *name* contains ``.`` +and splits it at the last ``.`` into *var*\ ``.``\ *attr*. +It returns *var* as the name and, as the value, a record whose attribute *attr* has +the value *value*. + +**record** is typically used to aggregate the parameters whose +name starts with *var.* into a single record variable *var*. + +"Updating" a target record with a source record requires that +the source record has a single attribute *attr*; denote its value by +*attr_value*. If the target record still lacks the attribute *attr*, +add it with value *attr_value*; otherwise, try to "update or replace" +its value with *attr_value*; if this fails, the updating fails. + +A related aggregator is **records**. **records** is actually +a synonym for the aggregator sequence **record** **list**. + + +Value marking aggregators ++++++++++++++++++++++++++ + +These aggregators mark their value as +to be used in a special way. A value can have at most one +mark. A value without a mark is called a "normal" value. + +Zope supports the following marking aggregators: + +- **default** -- mark as a default value. + + "Updating" a target default value with a "normal" (source) value + replaces the target value. "Updating" with another default value + fails. + + This means: + a default value can be replaced by a following "normal" + value but not by another default value. + +- **conditional** -- mark as a conditional value.
-Zope supports the following aggregators: + "Updating" with a conditional source value has no effect + if there is already a target value; "updating" + a conditional target value behaves identically to + the update of a default value. -- **list** -- collect all values with this name into a list. - If there are two or more parameters with the same name - they are collected into a list by default. - The ``list`` aggregator is mainly used to ensure that - the parameter leads to a list value even in the case that - there is only one of them. + This means: + A conditional value is ignored if a value already exists; + otherwise, it behaves like a default + value. -- **tuple** -- collect all values with this name into a tuple. + .. note:: -- **default** -- use the value of this parameter as a default value; it - can be overridden by a parameter of the same name without - the ``default`` directive. + **conditional**, like **default**, indicates some kind of + default value. With **conditional**, the default can come + before or after the "normal" value; with **default** it + must come before the "normal" value (if any). -- **record** -- this directive assumes that the parameter name starts - with *var*\ ``.``\ *attr*. - It tells Zope to create a request variable *var* of type record - (more precisely, a ``ZPublisher.HTTPRequest.record`` instance) and - set its attribute *attr* to the parameter value. - If such a request variable already exists, - then only its attribute *attr* is updated. + **conditional** is advisable e.g. for + default values from button controls: layout constraints + can prevent you from putting a button before another control.
- Zope starts a new record (and appends it to the list) - when the current request parameter would override an attribute - in the last record of the list constructed so far (or this list - is empty). +- **replace** -- mark as a replacement value. + + "Updating" with a source replacement value always replaces + the target value. "Updating" a target replacement value + behaves like updating a normal value. + + This means: + a replacement value unconditionally replaces an existing value. + After the replacement, it behaves like a normal value. + + A replacement value replaces an implicitly constructed sequence + as a whole. + +Miscellaneous aggregators ++++++++++++++++++++++++++ - **ignore_empty** -- this directive causes Zope to ignore the parameter if its value is empty. -An aggregator in detail: the `record` argument -++++++++++++++++++++++++++++++++++++++++++++++ + +Detailed examples ++++++++++++++++++ Sometimes you may wish to consolidate form data into a structure rather than pass arguments individually. **Record arguments** allow you to do this. -The ``record`` type converter allows you to combine multiple form -variables into a single input variable. For example:: +The **record** directive allows you to combine the values +of multiple form controls +into a single form variable. For example:: @@ -839,7 +1036,7 @@ variables into a single input variable. For example:: This form will result in a single variable, ``date``, with the attributes ``year``, ``month``, and ``day``. -You can skip empty record elements with the ``ignore_empty`` converter. +You can skip empty record elements with the **ignore_empty** directive. For example:: @@ -850,12 +1047,12 @@ record ``person`` is returned it will not have an ``email`` attribute if the user did not enter one. You can also provide default values for record elements with the -``default`` converter. For example:: +**default** directive. 
For example:: - @@ -863,7 +1060,7 @@ You can also provide default values for record elements with the
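The triple model described in the aggregator notes above can be sketched in a few lines of Python. In that model, each aggregator maps *name*, *value* and the remaining *aggs* to a new triple, and the final *name* and *value* then update the form variable collection. This is a hypothetical, simplified model, not the code in ``ZPublisher.request_params``. It supports only the ``int`` converter and the **list**, **record** and **records** aggregators, and ``types.SimpleNamespace`` stands in for Zope's record class. The helper names are invented. As an assumption, converters are applied to the raw value before any aggregator. Marks, encodings and the implicit list built for repeated simple parameters are ignored.

```python
from types import SimpleNamespace
from urllib.parse import parse_qsl

# Hypothetical model of aggregator processing; not ZPublisher's implementation.
CONVERTERS = {"int": int}


class UpdateFailed(Exception):
    """The documented 'updating fails' case."""


def agg_list(name, value, aggs):
    return name, [value], aggs


def agg_record(name, value, aggs):
    var, dot, attr = name.rpartition(".")
    if not dot:
        raise ValueError(f"record needs a dotted name, got {name!r}")
    return var, SimpleNamespace(**{attr: value}), aggs


def agg_records(name, value, aggs):
    # "records" is documented as a synonym for "record" followed by "list"
    return name, value, ["record", "list"] + aggs


AGGREGATORS = {"list": agg_list, "record": agg_record, "records": agg_records}


def merge(target, source):
    """Return target updated with source (as a new object) or raise UpdateFailed."""
    if isinstance(target, list) and isinstance(source, list):
        (item,) = source  # a source sequence carries exactly one element
        result = list(target)
        if result:
            try:  # first try to "update or replace" the last component
                result[-1] = merge(result[-1], item)
                return result
            except UpdateFailed:
                pass
        result.append(item)  # otherwise append the element
        return result
    if isinstance(target, SimpleNamespace) and isinstance(source, SimpleNamespace):
        ((attr, value),) = vars(source).items()  # exactly one attribute
        attrs = dict(vars(target))
        # a missing attribute is added; an existing one must be merged
        attrs[attr] = merge(attrs[attr], value) if attr in attrs else value
        return SimpleNamespace(**attrs)
    raise UpdateFailed(f"cannot update {target!r} with {source!r}")


def process(query):
    """Turn a query string into a dict of form variables."""
    form = {}
    for key, value in parse_qsl(query):
        name, *directives = key.split(":")
        for d in directives:
            if d in CONVERTERS:
                value = CONVERTERS[d](value)
            elif d not in AGGREGATORS:
                raise ValueError(f"unsupported directive {d!r}")
        aggs = [d for d in directives if d in AGGREGATORS]
        while aggs:  # aggregators are applied from left to right
            name, value, aggs = AGGREGATORS[aggs[0]](name, value, aggs[1:])
        form[name] = merge(form[name], value) if name in form else value
    return form
```

With these definitions, the two ``x.a`` examples from the aggregator section produce different results. They yield ``{"x": SimpleNamespace(a=[1, 2])}`` and ``{"x": [SimpleNamespace(a=1), SimpleNamespace(a=2)]}``, respectively. Three **records** parameters ``p.name=Ann``, ``p.age:int=30`` and ``p.name=Bob`` give two records, because the third parameter would override ``name`` in the first record.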