Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flatgeobuf + JSON + streaming renderers #89

Closed
wants to merge 27 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
6740a58
fixing incorrect merge choices (2)
why-not-try-calmer Sep 29, 2023
b178171
2 typos
why-not-try-calmer Sep 29, 2023
26c278a
Merge branch 'main' into flatgeobuf-renderer
why-not-try-calmer Sep 29, 2023
7e04d88
format, renamed var
why-not-try-calmer Sep 29, 2023
b792cfa
adjusting decorator
why-not-try-calmer Sep 29, 2023
06d1d67
removed unnecessary
why-not-try-calmer Sep 29, 2023
24fe4ae
requirements
why-not-try-calmer Sep 29, 2023
f227a01
make list() invoke renderers implicitly; format
why-not-try-calmer Sep 29, 2023
55218dc
typo
why-not-try-calmer Sep 29, 2023
b7a9728
rescued tests; reformatted test.yaml
why-not-try-calmer Sep 29, 2023
0d69c98
Merge branch 'main' into restoring-unit-tests
why-not-try-calmer Sep 29, 2023
018aa5e
format
why-not-try-calmer Sep 29, 2023
16eec2a
Merge branch 'restoring-unit-tests' into flatgeobuf-renderer
why-not-try-calmer Sep 29, 2023
12c61c9
format
why-not-try-calmer Sep 29, 2023
7ef1401
renderer_classes
why-not-try-calmer Oct 6, 2023
a79ba8d
requirements
why-not-try-calmer Oct 6, 2023
553f2a1
cleaned up; output
why-not-try-calmer Oct 6, 2023
06078f4
format
why-not-try-calmer Oct 6, 2023
e805bd4
fixed renderers
why-not-try-calmer Oct 6, 2023
b9f9c50
fixed last 2 renderers
why-not-try-calmer Oct 6, 2023
7bc1818
test results
why-not-try-calmer Oct 6, 2023
b484e55
format
why-not-try-calmer Oct 6, 2023
c608223
removed unused; clarified docstring
why-not-try-calmer Oct 6, 2023
eedcb7d
fixed incorrect renderers! my bad, I had added things that turned out…
why-not-try-calmer Oct 6, 2023
632e18e
correct usage of get_context
why-not-try-calmer Oct 7, 2023
2e644f7
explcitihandling of serialization into rendering for renderers à la c…
why-not-try-calmer Oct 7, 2023
875fe8b
filterset
why-not-try-calmer Oct 7, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 38 additions & 39 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,51 +17,50 @@ jobs:
OGCAPIF_HOST: localhost

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v3

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2

- name: Login to DockerHub
if: github.repository == 'opengisch/django-ogcapif'
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Login to DockerHub
if: github.repository == 'opengisch/django-ogcapif'
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Build Django image
uses: docker/build-push-action@v4
with:
context: .
provenance: false
file: docker/django/Dockerfile
push: ${{ github.repository == 'opengisch/django-ogcapif' }}
pull: true
cache-from: type=registry,ref=opengisch/django-oapif:latest
cache-to: type=registry,ref=opengisch/django-oapif:latest,mode=max
tags: opengisch/django-oapif:latest
- name: Build Django image
uses: docker/build-push-action@v4
with:
context: .
provenance: false
file: docker/django/Dockerfile
push: ${{ github.repository == 'opengisch/django-ogcapif' }}
pull: true
cache-from: type=registry,ref=opengisch/django-oapif:latest
cache-to: type=registry,ref=opengisch/django-oapif:latest,mode=max
tags: opengisch/django-oapif:latest

- name: Setup Compose
run: |
# copy default conf
cp .env.example .env
- name: Setup Compose
run: |
# copy default conf
cp .env.example .env

# start the stack
docker compose up --build -d
# start the stack
docker compose up --build -d

# deploy static files and migrate database
docker compose exec django python manage.py collectstatic --no-input
docker compose exec django python manage.py migrate --no-input
docker compose exec django python manage.py populate_users
docker compose exec django python manage.py populate_data
# deploy static files and migrate database
docker compose exec django python manage.py collectstatic --no-input
docker compose exec django python manage.py migrate --no-input
docker compose exec django python manage.py populate_users
docker compose exec django python manage.py populate_data

# - name: Run unit tests
# run: |
# docker compose exec django python manage.py test signalo.value_lists
- name: Run unit tests
run: docker compose exec django python manage.py test tests

- name: Run integration tests
run: docker compose run integration_tests
- name: Run integration tests
run: docker compose exec integration_tests

- name: Failure logs
if: failure()
run: docker-compose logs
- name: Failure logs
if: failure()
run: docker-compose logs
13 changes: 9 additions & 4 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,9 +1,14 @@
django>=4
django-computedfields
psycopg2-binary
transifex-client
djangorestframework-gis
git+https://github.com/3nids/django-rest-framework-gis@93d9718d123c2158f4d1c5f6cf461ba8f6cf89e7
djangorestframework
git+https://github.com/3nids/django-rest-framework-gis@patch-1
fiona
json-stream-generator
psycopg2-binary
pyproj
pyyaml
transifex-client
uritemplate
pyproj
orjson
ujson
59 changes: 57 additions & 2 deletions src/django_oapif/decorators.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,25 @@

from django.db import models
from django.db.models.functions import Cast
from rest_framework import serializers, viewsets
from django.http.response import StreamingHttpResponse
from rest_framework import renderers, response, serializers, viewsets
from rest_framework.serializers import ModelSerializer
from rest_framework_gis.serializers import GeoFeatureModelSerializer

from django_oapif.metadata import OAPIFMetadata
from django_oapif.mixins import OAPIFDescribeModelViewSetMixin
from django_oapif.renderers import (
FGBRenderer,
JSONorjson,
JSONStreamingRenderer,
JSONujson,
)
from django_oapif.urls import oapif_router

from .filters import BboxFilterBackend
from .functions import AsGeoJSON

USE_PG_GEOJSON = False
USE_PG_GEOJSON = True


def register_oapif_viewset(
Expand Down Expand Up @@ -87,6 +93,7 @@ class Meta:
class Viewset(OAPIFDescribeModelViewSetMixin, viewsets.ModelViewSet):
queryset = Model.objects.all()
serializer_class = AutoSerializer
renderer_classes = [renderers.JSONRenderer, FGBRenderer]

# TODO: these should probably be moved to the mixin
oapif_title = Model._meta.verbose_name
Expand All @@ -103,13 +110,61 @@ class Viewset(OAPIFDescribeModelViewSetMixin, viewsets.ModelViewSet):
metadata_class = OAPIFMetadata

def finalize_response(self, request, response, *args, **kwargs):
"""Ensure OPTIONS requests get a correct response."""
response = super().finalize_response(request, response, *args, **kwargs)
if request.method == "OPTIONS":
allowed_actions = self.metadata_class().determine_actions(request, self)
allowed_actions = ", ".join(allowed_actions.keys())
response.headers["Allow"] = allowed_actions
return response

def list(self, request, *args, **kwargs) -> response.Response | StreamingHttpResponse:
"""
Stream collection items as FGB chunks if 'format=fgb' is passed.
Stream them as JSON chunks if 'format=json' and streaming=true' are passed.
Otherwise render them as a single JSON chunk, using a variety of renderers depending on the request.
"""
streaming = request.query_params.get("streaming", "").casefold() == "true"
format = request.query_params.get("format")
encoder = request.query_params.get("json_encoder")

if format == "fgb":
renderer = FGBRenderer()
elif format == "json":
if streaming:
renderer = JSONStreamingRenderer()
elif encoder == "orjson":
renderer = JSONorjson()
elif encoder == "ujson":
renderer = JSONujson()
else:
renderer = renderers.JSONRenderer()
else:
renderer = renderers.JSONRenderer()

if streaming:
renderer_context = self.get_renderer_context()
serializer = self.get_serializer_class()
qs = self.filter_queryset(self.get_queryset())
if paginated_qs := self.paginate_queryset(qs):
serialized = serializer(paginated_qs, many=True).data
else:
serialized = serializer(qs, many=True).data
rendered = renderer.render(serialized, renderer.media_type, renderer_context)
return StreamingHttpResponse(rendered)
else:
request.accepted_renderer = renderer
request.accepted_media_type = renderer.media_type
return super().list(request, *args, **kwargs)

def get_renderer_context(self):
context = super().get_renderer_context()

if hasattr(Model, "get_schema"):
context.update({"schema": Model.get_schema()})

return context

def get_queryset(self):
qs = super().get_queryset()

Expand Down
62 changes: 62 additions & 0 deletions src/django_oapif/renderers.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
import io
from typing import Any, Generator, OrderedDict

import fiona
import orjson
import ujson
from django.conf import settings
from fiona.crs import CRS
from json_stream_generator import json_generator
from rest_framework import renderers


class FGBRenderer(renderers.BaseRenderer):
format = "fgb"
media_type = "application/event-stream"

def render(self, data: OrderedDict, accepted_media_type=None, renderer_context=None) -> io.BytesIO:
"""Renders data as a flatgeobuf binary stream"""
assert renderer_context
features_data = data["features"] if "features" in data else data["results"]["features"]
features = (fiona.Feature.from_dict(obj) for obj in features_data)
buffer_wrapper = io.BytesIO()
with fiona.open(
buffer_wrapper,
mode="w",
driver="FlatGeobuf",
schema=renderer_context["schema"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the model, as requested

crs=CRS.from_epsg(settings.GEOMETRY_SRID),
) as fh:
for feature in features:
fh.write(feature)

buffer_wrapper.seek(0)
return buffer_wrapper


class JSONStreamingRenderer(renderers.BaseRenderer):
format = "json"
media_type = "application/x-ndjson"

def render(self, data: OrderedDict, accepted_media_type=None, renderer_context=None) -> Generator[Any, None, None]:
"""Renders JSON encoded stream."""
features_data = data["features"] if "features" in data else data["results"]["features"]
return json_generator(feature for feature in features_data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterator object so that the response's rendered content can be streamed from the renderer. Using the default JSON renderer and / or the naked StreamingHttps response (naked as in: "without passing this constructor an iterator object") doesn't provide actual streaming.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case this PR remains open (or the other branch).

What is the type of data["results"]["features"]? If it is a list, wouldn't it be great to try to preserve it as a db cursor until this stage so we really experience the benefit of the streaming.

Otherwise if it is a list, we iterate over multiple times.

Also, I am afraid the datastructure returned by this response is just an array of features, while the default json renderer returns an object (as it is in the specs?).

Copy link
Contributor Author

@why-not-try-calmer why-not-try-calmer Oct 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data argument of this function is the serialized data. So whatever the type of data["results"]["features"] is, it is a serialized type, constructed after closing/consuming the cursor. I think there's an advantage in this; if you let the user consuming the response get a hold of the database cursor, you are making the cursor a hostage of the connection. This requires some extra complexity (i.e. managing different sorts of exceptions differently depending on their nature) which I didn't think was worth it for the sake of this prototype.

What is "this response"? json_generator? It can be seen here https://github.com/wlatanowicz/json-stream-generator/blob/0a3aa95acca188f2587da3b6f5c4915d49b0ee39/json_stream_generator/__init__.py#L19. If you mean the Django / DRF response objects, I've just added a bit that makes the construction of these objects more explicit (and I hope more correct with respect to the goals).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data argument of this function is the serialized data. So whatever the type of data["results"]["features"] is, it is a serialized type, constructed after closing/consuming the cursor. I think there's an advantage in this; if you let the user consuming the response get a hold of the database cursor, you are making the cursor a hostage of the connection. This requires some extra complexity (i.e. managing different sorts of exceptions differently depending on their nature) which I didn't think was worth it for the sake of this prototype.

Agree this brings additional complexity we might not need for now. Just worth mentioning (and documenting?) one of the main benefits of extremely big responses is not to fit everything in the memory.

What is "this response"? json_generator? It can be seen here wlatanowicz/json-stream-generator@0a3aa95/json_stream_generator/init.py#L19. If you mean the Django / DRF response objects, I've just added a bit that makes the construction of these objects more explicit (and I hope more correct with respect to the goals).

Nope, I meant with any serializer in serializers.py it seems they are returning only a list of features. Isn't it expected to return a dict having one of the values a list of features?

Copy link
Contributor Author

@why-not-try-calmer why-not-try-calmer Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the unit tests succeed, since our custom renderers require a dict as first arg, and since this arg = the value of serializer.data, where serializer is any instance of a Serializer, I believe it is a dict (probably an OrderedDict, seeing from rest_framework's code).

Apart from the issue of allocating the entire collection in memory, is there something that you worry about? Above you mentioned that the collection would be traversed more than once, but I am not sure. I have not audited the call tree, but I think there's nothing out of the ordinary in the way this is done here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# IMO good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items | jq | head 
{
  "links": [
    {
      "type": "application/geo+json",
      "rel": "next",
      "title": "items (next)",
      "href": "http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items?limit=1000&offset=1000"
    },
    {
      "type": "application/geo+json",

# IMO good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items\?format\=json\&encoder\=ujson | jq | head
{
  "links": [
    {
      "type": "application/geo+json",
      "rel": "next",
      "title": "items (next)",
      "href": "http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items?encoder=ujson&format=json&limit=1000&offset=1000"
    },
    {
      "type": "application/geo+json",

# IMO not good
(django-ogcapif) *[72b7284][~/work/opengisch/django-ogcapif]$ curl http://0.0.0.0:7180/oapif/collections/tests.point_2056_10fields/items\?format\=json\&streaming\=true | jq | head 
[
  {
    "id": "30ea6bb0-467f-4f26-9f11-c096fbd70db5",
    "type": "Feature",
    "geometry": {
      "bbox": [
        2508500,
        1152000,
        2508500,
        1152000

I believe the third call is a different JSON structure, it is part of the specs, or I am missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point. How would you configure a StreamingHttpResponse in such a way that it produces a dict? Iterators by themselves cannot produce dicts so I wonder.

Copy link
Collaborator

@suricactus suricactus Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is problematic, there are two things to be done about unit tests:

  1. we need to fix the unit tests so they cover this scenario, as returning an array is not part of the OpenAPI specification AFAIK.
  2. we should not entirely rely on unit tests. Checking the produced endpoints output manually after major low-level change is never a bad idea.

It's a good point. How would you configure a StreamingHttpResponse in such a way that it produces a dict?

I don't think the problem is in StreamingHttpResponse, as it just serves bytes, whenever they are parts of an image, csv rows or part of JSON document.

Iterators by themselves cannot produce dicts so I wonder.

Also IMO there is no specific limitation with iterators, it is more that JSONStreamingRenderer might only support lists for encoding, didn't check their code yet. If you need help, we can have a look at the JSONStreamingRenderer library code together.
For example in my (very slow) branch I have the stream to spit key/value pairs. main...streaming

Copy link
Contributor Author

@why-not-try-calmer why-not-try-calmer Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I know see what you meant, my bad, I was looking at the wrong thing. Yes, none of the renderers provided in this PR wraps the results in this "metadata" dict that you have seen returned by the standard renderers. Instead they just return the results as a list. I must admit I didn't pay much attention to conformance because Denis was mostly interested in timing the request-response round-trip. Or so I understood. If there is more for me to do regarding conformance, happy to do so, but I am not sure what the fate of the prototype is (I am not sure if we are still using it as a test bench or if we want to make everything clean and become a reference implementation).

But if we want to make things clean and become a reference implementation, we probably need to talk about conformance, not just about "toy" renderers like these, but about the entire JSON API.



class JSONorjson(renderers.BaseRenderer):
format = "json"
media_type = "application/json"

def render(self, data: OrderedDict, accepted_media_type=None, renderer_context=None) -> bytes:
features_data = data["features"] if "features" in data else data["results"]["features"]
return orjson.dumps(features_data)


class JSONujson(renderers.BaseRenderer):
format = "json"
media_type = "application/json"

def render(self, data: OrderedDict, accepted_media_type=None, renderer_context=None) -> str:
features_data = data["features"] if "features" in data else data["results"]["features"]
return ujson.dumps(features_data)
4 changes: 4 additions & 0 deletions src/signalo/core/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@ class Pole(ComputedFieldsModel):
def _serialized(self):
return f'{{"id": "{str(self.id)}", "type": "Feature", "geometry": {self.geom.geojson}, "properties": {{"name": "{self.name}"}}}}'

@staticmethod
def get_schema() -> dict:
return {"geometry": "Point", "properties": {"name": "str", "_serialized": "str"}}


@register_oapif_viewset()
class Azimuth(ComputedFieldsModel):
Expand Down
10 changes: 7 additions & 3 deletions src/signalo/roads/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,11 @@
from signalo.settings import GEOMETRY_SRID


class Limit1000(OapifPagination):
default_limit = 1000
class Limit100000(OapifPagination):
default_limit = 100000


@register_oapif_viewset(custom_viewset_attrs={"pagination_class": Limit1000})
@register_oapif_viewset(custom_viewset_attrs={"pagination_class": Limit100000})
class Road(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
geom = models.MultiLineStringField(srid=GEOMETRY_SRID, verbose_name=_("Geometry"))
Expand All @@ -22,3 +22,7 @@ def get_geom(self, geometry: str = "wkb") -> Union[memoryview, str]:
if geometry == "wkb":
return self.geom.wkb
return self.geom.wkt

@staticmethod
def get_schema() -> dict:
return {"geometry": "MultiLineString", "properties": {}}
7 changes: 7 additions & 0 deletions src/tests/settings.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,10 @@

INSTALLED_APPS = [
"tests",
"signalo.core",
"signalo.value_lists",
"signalo.edge_cases",
"signalo.roads",
"django.contrib.admin",
"django.contrib.auth",
"django.contrib.contenttypes",
Expand Down Expand Up @@ -171,6 +175,9 @@
OAPIF_TITLE = "DJANGO_OAPIF"
OAPIF_DESCRIPTION = "Django OAPIF tests"

# Geometry's SRID. This can only be changed prior to initializing the database.
GEOMETRY_SRID = int(os.environ.get("GEOMETRY_SRID", "2056"))

INTERNAL_IPS = [
"127.0.0.1", # so that Django toolbar is displayed to localhost
]
Loading