Merge pull request #839 from neo4j-contrib/rc/5.4.0

Rc/5.4.0
neo4j-contrib · Nov 7, 2024 · d9eaf90
2 parents 38ba23d + d7747e7
commit d9eaf90
Showing 35 changed files with 3,027 additions and 1,016 deletions.
diff --git a/.github/workflows/integration-tests.yml b/.github/workflows/integration-tests.yml
@@ -15,7 +15,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.12", "3.11", "3.10", "3.9", "3.8", "3.7"]
+        python-version: ["3.13", "3.12", "3.11", "3.10", "3.9"]
         neo4j-version: ["community", "enterprise", "5.5-enterprise", "4.4-enterprise", "4.4-community"]
 
     steps:

diff --git a/Changelog b/Changelog
@@ -1,3 +1,15 @@
+Version 5.4.0 2024-11
+* Traversal option for filtering and ordering
+* Insert raw Cypher for ordering
+* Possibility to traverse relations, only returning the last element of the path
+* Resolve the results of complex queries as a nested subgraph
+* Possibility to transform variables, with aggregation methods: Collect() and Last()
+* Intermediate transform, for example to order variables before collecting
+* Subqueries (Cypher CALL{} clause)
+* Allow JSONProperty to actually use non-ascii elements. Thanks to @danikirish
+* Bumped neo4j (driver) to 5.26.0
+* Special huge thanks to @tonioo for this release
+
 Version 5.3.3 2024-09
 * Fixes vector index doc and test
 

diff --git a/README.md b/README.md
@@ -24,7 +24,7 @@ GitHub repo found at <https://github.com/neo4j-contrib/neomodel/>.
 
 **For neomodel releases 5.x :**
 
--   Python 3.7+
+-   Python 3.8+
 -   Neo4j 5.x, 4.4 (LTS)
 
 **For neomodel releases 4.x :**
@@ -37,6 +37,14 @@ GitHub repo found at <https://github.com/neo4j-contrib/neomodel/>.
 Available on
 [readthedocs](http://neomodel.readthedocs.org).
 
+# New in 5.4.0
+
+This version adds many new features, expanding neomodel's querying capabilities. Those features were kindly contributed back by the [OpenStudyBuilder team](https://openstudybuilder.com/). A VERY special thanks to @tonioo for the integration work.
+
+There are too many new capabilities to list here, so I advise you to start by looking at the full summary example in the [Getting Started guide](https://neomodel.readthedocs.io/en/latest/getting_started.html#full-example). It will then point you to the relevant sections.
+
+We also validated support for [Python 3.13](https://docs.python.org/3/whatsnew/3.13.html).
+
 # New in 5.3.0
 
 neomodel now supports asynchronous programming, thanks to the [Neo4j driver async API](https://neo4j.com/docs/api/python-driver/current/async_api.html). The [documentation](http://neomodel.readthedocs.org) has been updated accordingly, with an updated getting started section, and some specific documentation for the async API.
@@ -96,7 +104,7 @@ Ensure `dbms.security.auth_enabled=true` in your database configuration
 file. Setup a virtual environment, install neomodel for development and
 run the test suite: :
 
-    $ pip install -e '.[dev,pandas,numpy]'
+    $ pip install -r requirements-dev.txt
     $ pytest
 
 The tests in \"test_connection.py\" will fail locally if you don\'t

diff --git a/doc/source/advanced_query_operations.rst b/doc/source/advanced_query_operations.rst
@@ -0,0 +1,111 @@
+.. _Advanced query operations:
+
+=========================
+Advanced query operations
+=========================
+
+neomodel provides ways to enhance your queries beyond filtering and traversals.
+
+Annotate - Aliasing
+-------------------
+
+The `annotate` method allows you to add transformations to your elements. To learn more about the available transformations, keep reading this section.
+
+Aggregations
+------------
+
+neomodel implements some of the aggregation methods available in Cypher:
+
+- Collect (with distinct option)
+- Last
+
+These are usable in this way::
+
+    from neomodel.sync_.match import Collect, Last
+
+    # distinct is optional and defaults to False. When True, objects are deduplicated
+    Supplier.nodes.traverse_relations(available_species="coffees__species").annotate(
+        Collect("available_species", distinct=True)
+    ).all()
+
+    # Last is used to get the last element of a list
+    Supplier.nodes.traverse_relations(available_species="coffees__species").annotate(
+        Last(Collect("available_species"))
+    ).all()
+
+Note how `annotate` is used to add the aggregation method to the query.
+
+.. note::
+    Using the Last() method right after a Collect() without having set an ordering will return the last element in the list as it was returned by the database.
+
+    This is probably not what you want, which means you must provide an explicit ordering. To do so, you cannot use neomodel's `order_by` method, but need an intermediate transformation step (see below).
+
+    This is because the `order_by` method adds ordering as the very last step of the Cypher query, whereas in the present example, you want to first order Species, then get the last one, and finally return your results. In other words, you need an intermediate WITH Cypher clause.
+
+Intermediate transformations
+----------------------------
+
+The `intermediate_transform` method allows you to add a WITH clause to your query. This is useful when you need to perform some operations on your results before returning them.
+
+As discussed in the note above, this is for example useful when you need to order your results before applying an aggregation method, like so::
+
+    from neomodel.sync_.match import Collect, Last
+
+    # This will return all Coffee nodes, with their most expensive supplier
+    Coffee.nodes.traverse_relations(
+        suppliers="suppliers"
+    ).intermediate_transform(
+        {"suppliers": "suppliers"}, ordering=["suppliers.delivery_cost"]
+    ).annotate(supps=Last(Collect("suppliers")))
+
+Subqueries
+----------
+
+The `subquery` method allows you to perform a `Cypher subquery <https://neo4j.com/docs/cypher-manual/current/subqueries/call-subquery/>`_ inside your query. This allows you to perform operations in isolation from the rest of your query::
+
+    from neomodel.sync_.match import Collect, Last
+
+    # This will create a CALL{} subquery
+    # and return a variable named supps, usable in the rest of your query.
+    # The second argument lists the variables returned by the subquery.
+    Coffee.nodes.filter(name="Espresso").subquery(
+        Coffee.nodes.traverse_relations(suppliers="suppliers")
+        .intermediate_transform(
+            {"suppliers": "suppliers"}, ordering=["suppliers.delivery_cost"]
+        )
+        .annotate(supps=Last(Collect("suppliers"))),
+        ["supps"],
+    )
+
+.. note::
+    Notice that the subquery starts with Coffee.nodes. neomodel uses this to know that it needs to inject the source "coffee" variable generated by the outer query into the subquery. This means only Espresso coffee nodes will be considered in the subquery.
+
+    We know this is confusing to read, but have not found a better way to do this yet. If you have any suggestions, please let us know.
+
+Helpers
+-------
+
+Reading the sections above, you may have noticed that we used explicit aliasing in the examples, as in::
+
+    traverse_relations(suppliers="suppliers")
+
+This allows you to reference the generated Cypher variables in your transformation steps, for example::
+
+    traverse_relations(suppliers="suppliers").annotate(Collect("suppliers"))
+
+In some cases though, it is not possible to set explicit aliases, for example when using `fetch_relations`. In these cases, neomodel provides `resolver` methods, so you do not have to guess the name of the variable in the generated Cypher. Those are `NodeNameResolver` and `RelationshipNameResolver`. For example::
+
+    from neomodel.sync_.match import Collect, NodeNameResolver, RelationshipNameResolver
+
+    # fetch_relations generates its own variable names, so the resolvers
+    # look them up from the traversal path instead of hard-coding them
+    Supplier.nodes.fetch_relations("coffees__species").annotate(
+        all_species=Collect(NodeNameResolver("coffees__species"), distinct=True),
+        all_species_rels=Collect(
+            RelationshipNameResolver("coffees__species"), distinct=True
+        ),
+    ).all()
+
+.. note:: 
+
+    When using the resolvers in combination with a traversal as in the example above, it will resolve the variable name of the last element in the traversal - the Species node for NodeNameResolver, and Coffee--Species relationship for RelationshipNameResolver.
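
Note on `advanced_query_operations.rst`: the reason `Last(Collect(...))` needs an intermediate ordering step can be illustrated without a database. The sketch below is plain Python, not neomodel code. The rows and the `collect_last` helper are hypothetical stand-ins for what a Cypher `WITH ... ORDER BY` followed by `collect()` and `last()` would do.

```python
from operator import itemgetter

# Hypothetical rows, as a Coffee -> Supplier traversal might return them,
# in whatever order the database happens to produce.
rows = [
    {"coffee": "Espresso", "supplier": "Acme", "delivery_cost": 7},
    {"coffee": "Espresso", "supplier": "Beans Inc", "delivery_cost": 12},
    {"coffee": "Espresso", "supplier": "Cheapo", "delivery_cost": 3},
    {"coffee": "Java", "supplier": "Acme", "delivery_cost": 7},
]


def collect_last(rows, order_key=None):
    """Collect suppliers per coffee and keep the last one of each list.

    With order_key=None, the result depends on the incoming row order,
    like Last(Collect(...)) with no ordering. With an order_key, rows are
    sorted first, like an intermediate WITH ... ORDER BY step.
    """
    if order_key is not None:
        rows = sorted(rows, key=itemgetter(order_key))
    collected = {}
    for row in rows:
        collected.setdefault(row["coffee"], []).append(row["supplier"])
    return {coffee: suppliers[-1] for coffee, suppliers in collected.items()}


unordered = collect_last(rows)
ordered = collect_last(rows, order_key="delivery_cost")

print(unordered)  # {'Espresso': 'Cheapo', 'Java': 'Acme'}
print(ordered)    # {'Espresso': 'Beans Inc', 'Java': 'Acme'}
```

Only the ordered variant reliably yields the supplier with the highest delivery cost. The unordered one returns whichever supplier happened to come last.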
diff --git a/doc/source/configuration.rst b/doc/source/configuration.rst
@@ -32,7 +32,7 @@ Adjust driver configuration - these options are only available for this connecti
     config.MAX_TRANSACTION_RETRY_TIME = 30.0  # default
     config.RESOLVER = None  # default
     config.TRUST = neo4j.TRUST_SYSTEM_CA_SIGNED_CERTIFICATES  # default
-    config.USER_AGENT = neomodel/v5.3.3  # default
+    config.USER_AGENT = neomodel/v5.4.0  # default
 
 Setting the database name, if different from the default one::
 

diff --git a/doc/source/cypher.rst b/doc/source/cypher.rst
@@ -24,6 +24,18 @@ Outside of a `StructuredNode`::
 
 The ``resolve_objects`` parameter automatically inflates the returned nodes to their defined classes (this is turned **off** by default). See :ref:`automatic_class_resolution` for details and possible pitfalls.
 
+You can also retrieve a whole path of already instantiated objects corresponding to 
+the nodes and relationship classes with a single query::
+
+    q = db.cypher_query("MATCH p=(:CityOfResidence)<-[:LIVES_IN]-(:PersonOfInterest)-[:IS_FROM]->(:CountryOfOrigin) RETURN p LIMIT 1", 
+                        resolve_objects = True)
+
+Notice here that ``resolve_objects`` is set to ``True``. This results in ``q`` being a 
+list of ``result, result_name`` and ``q[0][0][0]`` being a ``NeomodelPath`` object.
+
+The ``nodes`` and ``relationships`` attributes of ``NeomodelPath`` contain already instantiated objects of the 
+nodes and relationships in the query, *in order of appearance*.
+
 Integrations
 ============
 

diff --git a/doc/source/filtering_ordering.rst b/doc/source/filtering_ordering.rst
@@ -0,0 +1,199 @@
+.. _Filtering and ordering:
+
+======================
+Filtering and ordering
+======================
+
+For the examples in this section, we will be using the following model::
+
+    class SupplierRel(StructuredRel):
+        since = DateTimeProperty(default=datetime.now)
+
+
+    class Supplier(StructuredNode):
+        name = StringProperty()
+        delivery_cost = IntegerProperty()
+
+
+    class Coffee(StructuredNode):
+        name = StringProperty(unique_index=True)
+        price = IntegerProperty()
+        suppliers = RelationshipFrom(Supplier, 'SUPPLIES', model=SupplierRel)
+
+Filtering
+=========
+
+neomodel allows filtering on nodes' and relationships' properties. Filters can be combined using Django's Q syntax. It also allows multi-hop relationship traversals to filter on "remote" elements.
+
+Filter methods
+--------------
+
+The ``.nodes`` property of a class returns all nodes of that type from the database.
+
+This set (called `NodeSet`) can be iterated over and filtered on, using the `.filter` method::
+
+    # nodes with label Coffee whose price is greater than 2
+    high_end_coffees = Coffee.nodes.filter(price__gt=2)
+
+    try:
+        java = Coffee.nodes.get(name='Java')
+    except DoesNotExist:
+        # .filter will not throw an exception if no results are found
+        # but .get will
+        print("Couldn't find coffee 'Java'")
+
+The filter method borrows the same Django filter format with double underscore prefixed operators:
+
+- lt - less than
+- gt - greater than
+- lte - less than or equal to
+- gte - greater than or equal to
+- ne - not equal
+- in - item in list
+- isnull - `True` IS NULL, `False` IS NOT NULL
+- exact - string equals
+- iexact - string equals, case insensitive
+- contains - contains string value
+- icontains - contains string value, case insensitive
+- startswith - starts with string value
+- istartswith - starts with string value, case insensitive
+- endswith - ends with string value
+- iendswith - ends with string value, case insensitive
+- regex - matches a regex expression
+- iregex - matches a regex expression, case insensitive
+
+These operators work with both `.get` and `.filter` methods.
+
+Combining filters
+-----------------
+
+The filter method allows you to combine multiple filters::
+
+    cheap_arabicas = Coffee.nodes.filter(price__lt=5, name__icontains='arabica')
+
+These filters are combined using the logical AND operator. To execute more complex logic (for example, queries with OR statements), `Q objects <neomodel.Q>` can be used. This is borrowed from Django.
+
+``Q`` objects can be combined using the ``&`` and ``|`` operators. Statements of arbitrary complexity can be composed by combining ``Q`` objects
+with the ``&`` and ``|`` operators and use parenthetical grouping. Also, ``Q``
+objects can be negated using the ``~`` operator, allowing for combined lookups
+that combine both a normal query and a negated (``NOT``) query::
+
+    Q(name__icontains='arabica') | ~Q(name__endswith='blend')
+
+Chaining ``Q`` objects will join them as an AND clause::
+
+    not_middle_priced_arabicas = Coffee.nodes.filter(
+        Q(name__icontains='arabica'),
+        Q(price__lt=5) | Q(price__gt=10)
+    )
+
+Traversals and filtering
+------------------------
+
+Sometimes you need to filter nodes based on other nodes they are connected to. This can be done by including a traversal in the `filter` method. ::
+
+    # Find 'Java' coffees whose suppliers have been supplying since before 2007
+    # and have a delivery cost greater than 5
+    since_date = datetime(2007, 1, 1)
+    java_old_timers = Coffee.nodes.filter(
+            name='Java',
+            suppliers__delivery_cost__gt=5,
+            **{"suppliers|since__lt": since_date}
+        )
+
+In the example above, note the following syntax elements:
+
+- The name of relationships as defined in the `StructuredNode` class is used to traverse relationships. `suppliers` in this example.
+- Double underscore `__` is used to target a property of a node. `delivery_cost` in this example.
+- A pipe `|` is used to separate the relationship traversal from the property filter. This is a special syntax to indicate that the filter is on the relationship itself, not on the node at the end of the relationship. The filter also has to be included in a `**kwargs` dictionary, because the pipe character is not valid in a Python keyword argument name.
+- The filter operators like lt, gt, etc. can be used on the filtered property.
+
+Traversals can be of any length, with each relationship separated by a double underscore `__`, for example::
+
+    # country is here a relationship between Supplier and Country
+    Coffee.nodes.filter(suppliers__country__name='Brazil')
+
+Enforcing relationship/path existence
+-------------------------------------
+
+The `has` method checks for the existence of (one or more) relationships. In this case, it returns a set of `Coffee` nodes which have a supplier::
+
+    Coffee.nodes.has(suppliers=True)
+
+This can be negated by setting `suppliers=False`, to find `Coffee` nodes without `suppliers`.
+
+You can also filter on the existence of more complex traversals by using the `traverse_relations` method. See :ref:`Path traversal`.
+
+Ordering
+========
+
+neomodel allows ordering by nodes' and relationships' properties. Order can be ascending or descending. It also allows multi-hop relationship traversals to order on "remote" elements. Finally, you can inject raw Cypher clauses to have full control over ordering when necessary.
+
+order_by
+--------
+
+Ordering results by a particular property is done via the `order_by` method::
+
+    # Ascending sort
+    for coffee in Coffee.nodes.order_by('price'):
+        print(coffee, coffee.price)
+
+    # Descending sort
+    for supplier in Supplier.nodes.order_by('-delivery_cost'):
+        print(supplier, supplier.delivery_cost)
+
+
+Removing the ordering from a previously defined query is done by passing `None` to `order_by`::
+
+    # Sort in descending order
+    suppliers = Supplier.nodes.order_by('-delivery_cost')
+
+    # Don't order; yield nodes in the order neo4j returns them
+    suppliers = suppliers.order_by(None)
+
+For random ordering simply pass '?' to the order_by method::
+
+    Coffee.nodes.order_by('?')
+
+Traversals and ordering
+-----------------------
+
+Sometimes you need to order results based on properties situated on different nodes or relationships. This can be done by including a traversal in the `order_by` method. ::
+
+    # Find the most expensive coffee to deliver
+    # Then order by the date the supplier started supplying
+    Coffee.nodes.order_by(
+        '-suppliers__delivery_cost',
+        'suppliers|since',
+    )
+
+In the example above, note the following syntax elements:
+
+- The name of relationships as defined in the `StructuredNode` class is used to traverse relationships. `suppliers` in this example.
+- Double underscore `__` is used to target a property of a node. `delivery_cost` in this example.
+- A pipe `|` is used to separate the relationship traversal from the property to order by. This is a special syntax to indicate that the ordering is on the relationship itself, not on the node at the end of the relationship.
+
+Traversals can be of any length, with each relationship separated by a double underscore `__`, for example::
+
+    # country is here a relationship between Supplier and Country
+    Coffee.nodes.order_by('suppliers__country__latitude')
+
+RawCypher
+---------
+
+When you need more advanced ordering capabilities, for example to apply order to a transformed property, you can use the `RawCypher` class, like so::
+
+    from neomodel.sync_.match import RawCypher
+
+    class SoftwareDependency(StructuredNode):
+        name = StringProperty()
+        version = StringProperty()
+
+    SoftwareDependency(name="Package2", version="1.4.0").save()
+    SoftwareDependency(name="Package3", version="2.5.5").save()
+
+    latest_dep = SoftwareDependency.nodes.order_by(
+        RawCypher("toInteger(split($n.version, '.')[0]) DESC"),
+    )
+
+In the example above, note the `$n` placeholder in the `RawCypher` clause. This is a placeholder for the node being ordered (`SoftwareDependency` in this case).
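
Note on the `RawCypher` example: the expression `toInteger(split($n.version, '.')[0])` orders by the major version as an integer rather than by the raw string. The following plain-Python sketch (no neomodel or database involved) shows why that matters. The `major` helper and the extra `"10.0.1"` version are hypothetical, added only to make the difference visible.

```python
# Hypothetical version strings; "10.0.1" is added to show where string
# ordering and numeric ordering disagree.
versions = ["1.4.0", "2.5.5", "10.0.1"]


def major(version: str) -> int:
    """Python counterpart of toInteger(split(version, '.')[0])."""
    return int(version.split(".")[0])


# Descending order on the raw string compares character by character
by_string = sorted(versions, reverse=True)

# Descending order on the integer major version, as in the RawCypher clause
by_major = sorted(versions, key=major, reverse=True)

print(by_string)  # ['2.5.5', '10.0.1', '1.4.0']
print(by_major)   # ['10.0.1', '2.5.5', '1.4.0']
```

Note that the expression only looks at the major version. Ties between versions sharing the same major number would need additional ordering terms.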