Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flatgeobuf + JSON + streaming renderers #89

Closed
wants to merge 27 commits into from

Conversation

why-not-try-calmer
Copy link
Contributor

@why-not-try-calmer why-not-try-calmer commented Aug 10, 2023

  • Flatgeobuf streaming renderer
  • JSON streaming renderer
  • unit tests

I didn't use GDAL directly. I found the complexity of the installation procedure, the lack of documentation and the use of string literals in several places daunting. I didn't know fiona existed and I think it's very nice!

Please note that I have not tested this in QGIS.

@why-not-try-calmer why-not-try-calmer changed the title removed unrelated docstring Flatgeobuf renderer Aug 10, 2023
@why-not-try-calmer why-not-try-calmer force-pushed the flatgeobuf-renderer branch 2 times, most recently from 66e7a6d to 15d0b1b Compare August 12, 2023 11:15
@why-not-try-calmer why-not-try-calmer force-pushed the flatgeobuf-renderer branch 2 times, most recently from 488f219 to f9e5fd8 Compare September 27, 2023 22:39
@why-not-try-calmer why-not-try-calmer changed the title Flatgeobuf renderer Flatgeobuf + JSON streaming renderers Sep 29, 2023
@why-not-try-calmer why-not-try-calmer force-pushed the flatgeobuf-renderer branch 2 times, most recently from b4752fb to a8a145e Compare September 29, 2023 08:56
@github-actions
Copy link

Oh no! Conformance tests got worse!

@why-not-try-calmer
Copy link
Contributor Author

why-not-try-calmer commented Sep 29, 2023

@3nids Hello! How do you recommend we run unit tests in core and other django apps under the recently revamped module structure?

@3nids
Copy link
Member

3nids commented Sep 29, 2023

signalo source has been removed/renamed to tests, better put everything there.

@why-not-try-calmer why-not-try-calmer changed the base branch from main to restoring-unit-tests September 29, 2023 13:13
@why-not-try-calmer
Copy link
Contributor Author

why-not-try-calmer commented Oct 6, 2023

@3nids:

test_roads_json: 29.27
test_roads_json_streaming: 30.05
test_roads_orjson: 28.52
test_roads_ujson: 28.52
test_roads_fgb_streaming: 30.62

after

docker compose exec django python manage.py test tests.test_core.TestJSON
docker compose exec django python manage.py test tests.test_core.TestFGB

It's fine for me if we close this PR since:

  • rebasing is complicated
  • so is merging
  • results are not impressive

Otherwise I've pushed https://github.com/opengisch/django-ogcapif/tree/custom-renderers if you want to add these renderers.

Good thing is that we so much static data (= written much less often than read) , there's probably room for improvements with caching, especially if we warm the cache when starting the app.

@why-not-try-calmer why-not-try-calmer added the wontfix This will not be worked on label Oct 6, 2023
@why-not-try-calmer why-not-try-calmer changed the base branch from restoring-unit-tests to main October 6, 2023 12:15
@why-not-try-calmer why-not-try-calmer changed the title Flatgeobuf + JSON streaming renderers Flatgeobuf + JSON + streaming renderers Oct 7, 2023
why-not-try-calmer added a commit that referenced this pull request Oct 7, 2023
buffer_wrapper,
mode="w",
driver="FlatGeobuf",
schema=renderer_context["schema"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the model, as requested

def render(self, data: OrderedDict, accepted_media_type=None, renderer_context=None) -> Generator[Any, None, None]:
"""Renders JSON encoded stream."""
features_data = data["features"] if "features" in data else data["results"]["features"]
return json_generator(feature for feature in features_data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterator object so that the response's rendered content can be streamed from the renderer. Using the default JSON renderer and / or the naked StreamingHttps response (naked as in: "without passing this constructor an iterator object") doesn't provide actual streaming.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case this PR remains open (or the other branch).

What is the type of data["results"]["features"]? If it is a list, wouldn't it be great to try to preserve it as a db cursor until this stage so we really experience the benefit of the streaming.

Otherwise if it is a list, we iterate over multiple times.

Also, I am afraid the datastructure returned by this response is just an array of features, while the default json renderer returns an object (as it is in the specs?).

Copy link
Contributor Author

@why-not-try-calmer why-not-try-calmer Oct 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data argument of this function is the serialized data. So whatever the type of data["results"]["features"] is, it is a serialized type, constructed after closing/consuming the cursor. I think there's an advantage in this; if you let the user consuming the response get a hold of the database cursor, you are making the cursor a hostage of the connection. This requires some extra complexity (i.e. managing different sorts of exceptions differently depending on their nature) which I didn't think was worth it for the sake of this prototype.

What is "this response"? json_generator? It can be seen here https://github.com/wlatanowicz/json-stream-generator/blob/0a3aa95acca188f2587da3b6f5c4915d49b0ee39/json_stream_generator/__init__.py#L19. If you mean the Django / DRF response objects, I've just added a bit that makes the construction of these objects more explicit (and I hope more correct with respect to the goals).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data argument of this function is the serialized data. So whatever the type of data["results"]["features"] is, it is a serialized type, constructed after closing/consuming the cursor. I think there's an advantage in this; if you let the user consuming the response get a hold of the database cursor, you are making the cursor a hostage of the connection. This requires some extra complexity (i.e. managing different sorts of exceptions differently depending on their nature) which I didn't think was worth it for the sake of this prototype.

Agree this brings additional complexity we might not need for now. Just worth mentioning (and documenting?) one of the main benefits of extremely big responses is not to fit everything in the memory.

What is "this response"? json_generator? It can be seen here wlatanowicz/json-stream-generator@0a3aa95/json_stream_generator/init.py#L19. If you mean the Django / DRF response objects, I've just added a bit that makes the construction of these objects more explicit (and I hope more correct with respect to the goals).

Nope, I meant with any serializer in serializers.py it seems they are returning only a list of features. Isn't it expected to return a dict having one of the values a list of features?

Copy link
Contributor Author

@why-not-try-calmer why-not-try-calmer Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the unit tests succeed, since our custom renderers require a dict as first arg, and since this arg = the value of serializer.data, where serializer is any instance of a Serializer, I believe it is a dict (probably an OrderedDict, seeing from rest_framework's code).

Apart from the issue of allocating the entire collection in memory, is there something that you worry about? Above you mentioned that the collection would be traversed more than once, but I am not sure. I have not audited the call tree, but I think there's nothing out of the ordinary in the way this is done here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# IMO good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items | jq | head 
{
  "links": [
    {
      "type": "application/geo+json",
      "rel": "next",
      "title": "items (next)",
      "href": "http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items?limit=1000&offset=1000"
    },
    {
      "type": "application/geo+json",

# IMO good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items\?format\=json\&encoder\=ujson | jq | head
{
  "links": [
    {
      "type": "application/geo+json",
      "rel": "next",
      "title": "items (next)",
      "href": "http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items?encoder=ujson&format=json&limit=1000&offset=1000"
    },
    {
      "type": "application/geo+json",

# IMO not good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items\?format\=json\&streaming\=true | jq | head 
[
  {
    "id": "30ea6bb0-467f-4f26-9f11-c096fbd70db5",
    "type": "Feature",
    "geometry": {
      "bbox": [
        2508500,
        1152000,
        2508500,
        1152000

I believe the third call is a different JSON structure, it is part of the specs, or I am missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point. How would you configure a StreamingHttpResponse in such a way that it produces a dict? Iterators by themselves cannot produce dicts so I wonder.

Copy link
Collaborator

@suricactus suricactus Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is problematic, there are two things to be done about unit tests:

  1. we need to fix the unit tests so they cover this scenario, as returning an array is not part of the OpenAPI specification AFAIK.
  2. we should not entirely rely on unit tests. Checking the produced endpoints output manually after major low-level change is never a bad idea.

It's a good point. How would you configure a StreamingHttpResponse in such a way that it produces a dict?

I don't think the problem is in StreamingHttpResponse, as it just serves bytes, whenever they are parts of an image, csv rows or part of JSON document.

Iterators by themselves cannot produce dicts so I wonder.

Also IMO there is no specific limitation with iterators, it is more that JSONStreamingRenderer might only support lists for encoding, didn't check their code yet. If you need help, we can have a look at the JSONStreamingRenderer library code together.
For example in my (very slow) branch I have the stream to spit key/value pairs. main...streaming

Copy link
Contributor Author

@why-not-try-calmer why-not-try-calmer Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I know see what you meant, my bad, I was looking at the wrong thing. Yes, none of the renderers provided in this PR wraps the results in this "metadata" dict that you have seen returned by the standard renderers. Instead they just return the results as a list. I must admit I didn't pay much attention to conformance because Denis was mostly interested in timing the request-response round-trip. Or so I understood. If there is more for me to do regarding conformance, happy to do so, but I am not sure what the fate of the prototype is (I am not sure if we are still using it as a test bench or if we want to make everything clean and become a reference implementation).

But if we want to make things clean and become a reference implementation, we probably need to talk about conformance, not just about "toy" renderers like these, but about the entire JSON API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants