-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flatgeobuf + JSON + streaming renderers #89
Conversation
64b7fde
to
1545498
Compare
66e7a6d
to
15d0b1b
Compare
488f219
to
f9e5fd8
Compare
b4752fb
to
a8a145e
Compare
a8a145e
to
6740a58
Compare
Oh no! Conformance tests got worse! |
0372d2e
to
b792cfa
Compare
@3nids Hello! How do you recommend we run unit tests in |
signalo source has been removed/renamed to tests, better put everything there. |
after
It's fine for me if we close this PR since:
Otherwise I've pushed https://github.com/opengisch/django-ogcapif/tree/custom-renderers if you want to add these renderers. Good thing is that we so much static data (= written much less often than read) , there's probably room for improvements with caching, especially if we warm the cache when starting the app. |
1f94d4b
to
c608223
Compare
… to be completely unnecessary; numbers don't change much however
9e2d90e
to
632e18e
Compare
buffer_wrapper, | ||
mode="w", | ||
driver="FlatGeobuf", | ||
schema=renderer_context["schema"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the model, as requested
def render(self, data: OrderedDict, accepted_media_type=None, renderer_context=None) -> Generator[Any, None, None]: | ||
"""Renders JSON encoded stream.""" | ||
features_data = data["features"] if "features" in data else data["results"]["features"] | ||
return json_generator(feature for feature in features_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterator object so that the response's rendered content can be streamed from the renderer. Using the default JSON renderer and / or the naked StreamingHttps response (naked as in: "without passing this constructor an iterator object") doesn't provide actual streaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case this PR remains open (or the other branch).
What is the type of data["results"]["features"]
? If it is a list, wouldn't it be great to try to preserve it as a db cursor until this stage so we really experience the benefit of the streaming.
Otherwise if it is a list, we iterate over multiple times.
Also, I am afraid the datastructure returned by this response is just an array of features, while the default json renderer returns an object (as it is in the specs?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data
argument of this function is the serialized data. So whatever the type of data["results"]["features"]
is, it is a serialized type, constructed after closing/consuming the cursor. I think there's an advantage in this; if you let the user consuming the response get a hold of the database cursor, you are making the cursor a hostage of the connection. This requires some extra complexity (i.e. managing different sorts of exceptions differently depending on their nature) which I didn't think was worth it for the sake of this prototype.
What is "this response"? json_generator
? It can be seen here https://github.com/wlatanowicz/json-stream-generator/blob/0a3aa95acca188f2587da3b6f5c4915d49b0ee39/json_stream_generator/__init__.py#L19. If you mean the Django / DRF response objects, I've just added a bit that makes the construction of these objects more explicit (and I hope more correct with respect to the goals).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data argument of this function is the serialized data. So whatever the type of data["results"]["features"] is, it is a serialized type, constructed after closing/consuming the cursor. I think there's an advantage in this; if you let the user consuming the response get a hold of the database cursor, you are making the cursor a hostage of the connection. This requires some extra complexity (i.e. managing different sorts of exceptions differently depending on their nature) which I didn't think was worth it for the sake of this prototype.
Agree this brings additional complexity we might not need for now. Just worth mentioning (and documenting?) one of the main benefits of extremely big responses is not to fit everything in the memory.
What is "this response"? json_generator? It can be seen here wlatanowicz/json-stream-generator@0a3aa95/json_stream_generator/init.py#L19. If you mean the Django / DRF response objects, I've just added a bit that makes the construction of these objects more explicit (and I hope more correct with respect to the goals).
Nope, I meant with any serializer in serializers.py
it seems they are returning only a list
of features. Isn't it expected to return a dict
having one of the values a list of features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the unit tests succeed, since our custom renderers require a dict as first arg, and since this arg = the value of serializer.data
, where serializer
is any instance of a Serializer, I believe it is a dict
(probably an OrderedDict, seeing from rest_framework's code).
Apart from the issue of allocating the entire collection in memory, is there something that you worry about? Above you mentioned that the collection would be traversed more than once, but I am not sure. I have not audited the call tree, but I think there's nothing out of the ordinary in the way this is done here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# IMO good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items | jq | head
{
"links": [
{
"type": "application/geo+json",
"rel": "next",
"title": "items (next)",
"href": "http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items?limit=1000&offset=1000"
},
{
"type": "application/geo+json",
# IMO good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items\?format\=json\&encoder\=ujson | jq | head
{
"links": [
{
"type": "application/geo+json",
"rel": "next",
"title": "items (next)",
"href": "http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items?encoder=ujson&format=json&limit=1000&offset=1000"
},
{
"type": "application/geo+json",
# IMO not good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items\?format\=json\&streaming\=true | jq | head
[
{
"id": "30ea6bb0-467f-4f26-9f11-c096fbd70db5",
"type": "Feature",
"geometry": {
"bbox": [
2508500,
1152000,
2508500,
1152000
I believe the third call is a different JSON structure, it is part of the specs, or I am missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point. How would you configure a StreamingHttpResponse in such a way that it produces a dict? Iterators by themselves cannot produce dicts so I wonder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is problematic, there are two things to be done about unit tests:
- we need to fix the unit tests so they cover this scenario, as returning an array is not part of the OpenAPI specification AFAIK.
- we should not entirely rely on unit tests. Checking the produced endpoints output manually after major low-level change is never a bad idea.
It's a good point. How would you configure a StreamingHttpResponse in such a way that it produces a dict?
I don't think the problem is in StreamingHttpResponse
, as it just serves bytes, whenever they are parts of an image, csv rows or part of JSON document.
Iterators by themselves cannot produce dicts so I wonder.
Also IMO there is no specific limitation with iterators, it is more that JSONStreamingRenderer
might only support lists for encoding, didn't check their code yet. If you need help, we can have a look at the JSONStreamingRenderer
library code together.
For example in my (very slow) branch I have the stream to spit key/value pairs. main...streaming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I know see what you meant, my bad, I was looking at the wrong thing. Yes, none of the renderers provided in this PR wraps the results in this "metadata" dict that you have seen returned by the standard renderers. Instead they just return the results as a list. I must admit I didn't pay much attention to conformance because Denis was mostly interested in timing the request-response round-trip. Or so I understood. If there is more for me to do regarding conformance, happy to do so, but I am not sure what the fate of the prototype is (I am not sure if we are still using it as a test bench or if we want to make everything clean and become a reference implementation).
But if we want to make things clean and become a reference implementation, we probably need to talk about conformance, not just about "toy" renderers like these, but about the entire JSON API.
I didn't use GDAL directly. I found the complexity of the installation procedure, the lack of documentation and the use of string literals in several places daunting. I didn't know
fiona
existed and I think it's very nice!Please note that I have not tested this in QGIS.