[7275] adds cache for api endpoints #5256

Closed · wants to merge 5 commits
11 changes: 10 additions & 1 deletion .github/workflows/django.yml
@@ -21,6 +21,15 @@ jobs:
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
+      redis:
+        image: redis
+        options: >-
+          --health-cmd "redis-cli ping"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+        ports:
+          - 6379:6379
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
@@ -60,7 +69,7 @@ jobs:
            ${{ runner.os }}-build-${{ env.cache-name }}-
            ${{ runner.os }}-build-
            ${{ runner.os }}-
-      - name: check a4 hashes equal
+      - name: Check a4 hashes equal
        run: |
          ./scripts/a4-check.sh
      - name: Install Dependencies
4 changes: 4 additions & 0 deletions changelog/7275_2.md
@@ -0,0 +1,4 @@
### Changed

- enables caching for the API endpoints `api/{plans,extprojects,containers,projects}/`
- caches are expired by signals and by timeouts; for details, see `docs/caching.md`
57 changes: 57 additions & 0 deletions docs/caching.md
@@ -0,0 +1,57 @@


## Background

We have noticed that the page load of `mein.berlin.de/projekte/` is quite slow, at about 6s for 550 projects. Three API calls are particularly slow:

- https://mein.berlin.de/api/projects/?status=pastParticipation 2.811s
- https://mein.berlin.de/api/plans/ 3.613s
- https://mein.berlin.de/api/extprojects/ 5.041s

These URLs correspond to the following API views:

- `projects/api.py::ProjectListViewSet`
- `plans/api.py::PlansListViewSet`
- `extprojects/api.py::ExternalProjectListViewSet`

Since we were not able to improve the serializers for these views (see `docs/performance_of_plans_serializer.md`), we decided to start caching the endpoints with a Redis backend.

## Developer Notes

The cache target is the `list` method of the following views:

- `ExternalProjectViewSet`
- `PlansViewSet`
- `ProjectContainerViewSet`
- `ProjectViewSet`
- `PrivateProjectViewSet`

To avoid repeating code for adding cache keys, we have added a new file `apps/contrib/caching.py` with functions for caching the list method of API views (see `caching.add_or_serialize`) and for caching querysets in general (see `caching.add_or_query`).
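For example, the cached `list` method of a view then reduces to a single call; this mirrors the pattern used in `plans/api.py` below:

```python
from rest_framework import viewsets

from meinberlin.apps.contrib import caching
from meinberlin.apps.plans.models import Plan
from meinberlin.apps.plans.serializers import PlanSerializer


class PlansViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = PlanSerializer

    def get_queryset(self):
        return Plan.objects.filter(is_draft=False).prefetch_related("projects")

    def list(self, request, *args, **kwargs):
        # serve the serialized data from the cache, or serialize and cache it
        return caching.add_or_serialize(
            namespace="plans", view_set=self, context={"path": request.path}
        )
```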

Cache keys expire after a timeout (default value 1h) or when a context-specific signal is received (e.g. cache keys for projects are deleted when the signal for a saved project is detected).
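A typical receiver looks like this; it is essentially what the per-app `signals.py` modules in this PR do:

```python
from django.db.models.signals import post_save
from django.dispatch import receiver

from meinberlin.apps.contrib import caching
from meinberlin.apps.plans.models import Plan


@receiver(post_save, sender=Plan)
def reset_cache(sender, instance, update_fields, **kwargs):
    # a saved plan invalidates every cache key in the "plans" namespace
    context = {"trigger": "signal", "plan_id": instance.id}
    caching.delete(namespace="plans", context=context)
```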

The cache keys for projects are dynamic. That is, the cache value depends on the request query parameter `status` (e.g. `status=activeParticipation`). The status value is used as a suffix in the key generation. Cache keys are namespaced so that 1) new status values can be added without breaking the cache, and 2) we only need to know the namespace to delete all keys related to projects, not each individual key.

The namespace for each API endpoint is hard-coded and prefixed by a global caching prefix (see `contrib/caching.py::CACHE_KEY_PREFIX`). Typical cache keys look like this:
- `api_cache_projects`
- `api_cache_projects_activeParticipation`
- `api_cache_projects_pastParticipation`
- `api_cache_projects_futureParticipation`
- `api_cache_privateprojects`
- `api_cache_externalprojects`
- `api_cache_projectcontainers`
- `api_cache_plans`
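
Key generation with and without a `status` suffix (interactive sketch, assuming the default `api_cache` prefix):

```python
>>> from meinberlin.apps.contrib import caching
>>> caching.create_key(namespace="projects", suffix="activeParticipation")
'api_cache_projects_activeParticipation'
>>> caching.create_key(namespace="plans")
'api_cache_plans'
```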

If you want to delete all keys of a namespace, use `caching.delete(namespace=..)`.

There is one rule for naming namespaces: make sure that no namespace is a prefix of another. So `projects` and `privateprojects` are fine, but `projects` and `projects_private` would be problematic because `caching.delete(namespace="projects")` would wipe both namespaces.
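
The reason is that namespace deletion matches keys with a Redis wildcard pattern (a sketch of the lookup that `caching.delete` performs, assuming the default prefix):

```python
from meinberlin.apps.contrib import caching

# delete(namespace="projects") removes every Redis key matching this pattern;
# keys of a hypothetical "projects_private" namespace would match as well and
# would be wiped along with the "projects" keys.
pattern = f"*{caching.create_key(namespace='projects')}*"  # "*api_cache_projects*"
```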

In production, we use Redis as the cache backend (`django_redis.cache.RedisCache`, see `settings/production.py::CACHES`). For development, the cache backend is disabled (undefined). If you want to enable it locally, copy the production settings to your `settings/local.py`.
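A minimal sketch for `settings/local.py` (the Redis URL is an example; copy the real block from `settings/production.py`):

```python
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        # the location must contain "redis" for caching.REDIS_IS_ENABLED to be True
        "LOCATION": "redis://localhost:6379/0",  # adjust host/port/db to your setup
    }
}
```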

All tests that relate to caching are conditional. If the Redis caching backend is not detected (see `caching.py::REDIS_IS_ENABLED`), the tests are skipped.
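An illustrative sketch of such a conditional test (the marker and test name are examples, not necessarily the exact ones used in the test suite):

```python
import pytest

from meinberlin.apps.contrib import caching


@pytest.mark.skipif(
    not caching.REDIS_IS_ENABLED, reason="Redis cache backend is not configured"
)
def test_plans_endpoint_is_cached():
    ...
```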

Some more notes about the caching module:
- it is intended to be imported as a module instead of importing individual functions (e.g. you call `caching.add_or_query(..)` instead of `add_or_query(..)`)
- `caching.delete(..)` can be given either a namespace or an explicit list of keys to delete, and it returns all keys that were deleted (see the sketch after this list)
- careful: if you call `cache.clear()` (on Django's `cache` object) you will delete all keys currently in Redis, including, for example, any keys related to Celery
- whenever a management command is called (more precisely, whenever the caching module is imported, e.g. also when running pytest) you will now see the cache startup message
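
For instance, deleting by namespace versus by explicit keys (the key names are illustrative):

```python
from meinberlin.apps.contrib import caching

# namespace mode: wildcard-deletes every Redis key in the namespace
# and returns the list of deleted keys
deleted = caching.delete(namespace="plans", context={"trigger": "manual"})

# key mode: deletes exactly the given keys via Django's cache API
deleted = caching.delete(keys=["api_cache_plans", "api_cache_projectcontainers"])
```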
3 changes: 3 additions & 0 deletions meinberlin/apps/__init__.py
@@ -0,0 +1,3 @@
from logging import getLogger

logger = getLogger(__name__)
96 changes: 96 additions & 0 deletions meinberlin/apps/contrib/caching.py
@@ -0,0 +1,96 @@
from typing import Any
from typing import Dict
from typing import List
from typing import Optional

import redis
from django.conf import settings
from django.core.cache import cache
from django.db.models import QuerySet
from rest_framework.response import Response
from rest_framework.viewsets import GenericViewSet

from meinberlin.apps import logger

ONE_HOUR = 3600
CACHE_SETTINGS = settings.CACHES.get("default", {})
DEFAULT_TIMEOUT = CACHE_SETTINGS.get("DEFAULT_TIMEOUT", ONE_HOUR)
CACHE_KEY_PREFIX = CACHE_SETTINGS.get("CACHE_KEY_PREFIX", "api_cache")
CACHE_LOCATION = CACHE_SETTINGS.get("LOCATION", "")
REDIS_IS_ENABLED = "redis" in CACHE_LOCATION
REDIS_CLIENT = redis.from_url(url=CACHE_LOCATION) if REDIS_IS_ENABLED else None

logger.info(
    f"cache startup: {REDIS_IS_ENABLED=}, {CACHE_LOCATION=}, {CACHE_KEY_PREFIX=}, {DEFAULT_TIMEOUT=}"
)


def create_key(namespace: str, suffix: str = "") -> str:
    terms = [x for x in [CACHE_KEY_PREFIX, namespace, suffix] if x]
    return "_".join(terms)


def delete(
    namespace: str = "",
    context: Optional[dict] = None,
    keys: Optional[List[str]] = None,
) -> List[str]:
    keys = keys or []

    if REDIS_IS_ENABLED and namespace:
        pattern = f"*{create_key(namespace=namespace)}*"
        keys = REDIS_CLIENT.keys(pattern=pattern)
        if keys:
            REDIS_CLIENT.delete(*keys)

        logger.info(f"cache delete: {namespace=}, {pattern=}, {len(keys)=}, {context=}")

    elif keys:
        for key in keys:
            cache.delete(key)

        logger.info(f"cache delete: {len(keys)=}, {context=}")

    else:
        logger.warning(f"cache delete failed: {namespace=}, {keys=}, {context=}")

    return keys


def add_or_query(
    namespace: str,
    query_set: QuerySet,
    filter_kwargs: Optional[Dict[str, Any]] = None,
    suffix: str = "",
    timeout: int = DEFAULT_TIMEOUT,
) -> QuerySet:
    filter_kwargs = filter_kwargs or {}
    key = create_key(namespace=namespace, suffix=suffix)
    filtered_query = cache.get(key)

    if filtered_query is None:
        logger.info(f"cache missed: {key=}")
        filtered_query = query_set.filter(**filter_kwargs)
        cache.set(key=key, value=filtered_query, timeout=timeout)

    return filtered_query


def add_or_serialize(
    namespace: str,
    view_set: GenericViewSet,
    context: Optional[dict] = None,
    suffix: Optional[str] = None,
    timeout: int = DEFAULT_TIMEOUT,
) -> Response:
    key = create_key(namespace=namespace, suffix=suffix)
    data = cache.get(key)

    if not data:
        logger.info(f"cache missed: {key=}, {context=}")
        queryset = view_set.filter_queryset(queryset=view_set.get_queryset())
        serializer = view_set.get_serializer(queryset, many=True)
        data = serializer.data
        cache.set(key=key, value=data, timeout=timeout)

    return Response(data)
9 changes: 8 additions & 1 deletion meinberlin/apps/extprojects/api.py
@@ -2,11 +2,12 @@
from rest_framework import viewsets

from adhocracy4.projects.enums import Access
+from meinberlin.apps.contrib import caching
from meinberlin.apps.extprojects.models import ExternalProject
from meinberlin.apps.extprojects.serializers import ExternalProjectSerializer


-class ExternalProjectListViewSet(viewsets.ReadOnlyModelViewSet):
+class ExternalProjectViewSet(viewsets.ReadOnlyModelViewSet):
    def get_queryset(self):
        return ExternalProject.objects.filter(
            project_type="meinberlin_extprojects.ExternalProject",
@@ -18,3 +19,9 @@ def get_queryset(self):
    def get_serializer(self, *args, **kwargs):
        now = timezone.now()
        return ExternalProjectSerializer(now=now, *args, **kwargs)
+
+    def list(self, request, *args, **kwargs):
context = {"path": request.path}
return caching.add_or_serialize(
namespace="externalprojects", view_set=self, context=context
)
3 changes: 3 additions & 0 deletions meinberlin/apps/extprojects/apps.py
@@ -4,3 +4,6 @@
class Config(AppConfig):
    name = "meinberlin.apps.extprojects"
    label = "meinberlin_extprojects"
+
+    def ready(self):
+        import meinberlin.apps.extprojects.signals # noqa:F401
10 changes: 10 additions & 0 deletions meinberlin/apps/extprojects/signals.py
@@ -0,0 +1,10 @@
from django.db.models.signals import post_save
from django.dispatch import receiver

from meinberlin.apps.contrib import caching
from meinberlin.apps.extprojects.models import ExternalProject


@receiver(post_save, sender=ExternalProject)
def reset_cache(sender, instance, update_fields, **kwargs):
    caching.delete(namespace="externalprojects")
9 changes: 8 additions & 1 deletion meinberlin/apps/plans/api.py
@@ -1,11 +1,18 @@
from rest_framework import viewsets

+from meinberlin.apps.contrib import caching
from meinberlin.apps.plans.models import Plan
from meinberlin.apps.plans.serializers import PlanSerializer


-class PlansListViewSet(viewsets.ReadOnlyModelViewSet):
+class PlansViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = PlanSerializer

    def get_queryset(self):
        return Plan.objects.filter(is_draft=False).prefetch_related("projects")
+
+    def list(self, request, *args, **kwargs):
context = {"path": request.path}
return caching.add_or_serialize(
namespace="plans", view_set=self, context=context
)
3 changes: 3 additions & 0 deletions meinberlin/apps/plans/apps.py
@@ -4,3 +4,6 @@
class Config(AppConfig):
    name = "meinberlin.apps.plans"
    label = "meinberlin_plans"
+
+    def ready(self):
+        import meinberlin.apps.plans.signals # noqa:F401
12 changes: 12 additions & 0 deletions meinberlin/apps/plans/signals.py
@@ -0,0 +1,12 @@
from django.db.models.signals import post_save
from django.dispatch import receiver

from meinberlin.apps.contrib import caching

from .models import Plan


@receiver(post_save, sender=Plan)
def reset_cache(sender, instance, update_fields, **kwargs):
    context = {"trigger": "signal", "plan_id": instance.id}
    caching.delete(namespace="plans", context=context)
12 changes: 9 additions & 3 deletions meinberlin/apps/projectcontainers/api.py
@@ -2,17 +2,23 @@
from rest_framework import viewsets

from adhocracy4.projects.enums import Access
+from meinberlin.apps.contrib import caching
from meinberlin.apps.projectcontainers.models import ProjectContainer
from meinberlin.apps.projectcontainers.serializers import ProjectContainerSerializer


-class ProjectContainerListViewSet(viewsets.ReadOnlyModelViewSet):
+class ProjectContainerViewSet(viewsets.ReadOnlyModelViewSet):
    def get_queryset(self):
-        containers = ProjectContainer.objects.filter(
+        return ProjectContainer.objects.filter(
            is_draft=False, access=Access.PUBLIC, is_archived=False
        )
-        return containers

    def get_serializer(self, *args, **kwargs):
        now = timezone.now()
        return ProjectContainerSerializer(now=now, *args, **kwargs)
+
+    def list(self, request, *args, **kwargs):
+        context = {"path": request.path}
+        return caching.add_or_serialize(
+            namespace="projectcontainers", view_set=self, context=context
+        )
3 changes: 3 additions & 0 deletions meinberlin/apps/projectcontainers/apps.py
@@ -4,3 +4,6 @@
class Config(AppConfig):
    name = "meinberlin.apps.projectcontainers"
    label = "meinberlin_projectcontainers"
+
+    def ready(self):
+        import meinberlin.apps.projectcontainers.signals # noqa:F401
12 changes: 12 additions & 0 deletions meinberlin/apps/projectcontainers/signals.py
@@ -0,0 +1,12 @@
from django.db.models.signals import post_save
from django.dispatch import receiver

from meinberlin.apps.contrib import caching

from .models import ProjectContainer


@receiver(post_save, sender=ProjectContainer)
def reset_cache(sender, instance, update_fields, **kwargs):
    context = {"trigger": "signal", "project_container_id": instance.id}
    caching.delete(namespace="projectcontainers", context=context)
39 changes: 26 additions & 13 deletions meinberlin/apps/projects/api.py
@@ -5,11 +5,12 @@

from adhocracy4.projects.enums import Access
from adhocracy4.projects.models import Project
+from meinberlin.apps.contrib import caching
from meinberlin.apps.projects import serializers as project_serializers
from meinberlin.apps.projects.filters import StatusFilter


-class ProjectListViewSet(viewsets.ReadOnlyModelViewSet):
+class ProjectViewSet(viewsets.ReadOnlyModelViewSet):
    filter_backends = (DjangoFilterBackend, StatusFilter)

    def __init__(self, *args, **kwargs):
@@ -39,43 +40,55 @@ def get_queryset(self):
        )
        return projects

+    def list(self, request, *args, **kwargs):
+        return caching.add_or_serialize(
+            namespace="projects",
+            suffix=self.request.GET.get("status"),
+            context={"path": request.path},
+            view_set=self,
+        )
+
    def get_serializer(self, *args, **kwargs):
        if "status" in self.request.GET:
-            statustype = self.request.GET["status"]
-            if statustype == "activeParticipation":
+            status = self.request.GET["status"]
+            if status == "activeParticipation":
                return project_serializers.ActiveProjectSerializer(
                    now=self.now, *args, **kwargs
                )
-            if statustype == "futureParticipation":
+            if status == "futureParticipation":
                return project_serializers.FutureProjectSerializer(
                    now=self.now, *args, **kwargs
                )
-            if statustype == "pastParticipation":
+            if status == "pastParticipation":
                return project_serializers.PastProjectSerializer(
                    now=self.now, *args, **kwargs
                )
        return project_serializers.ProjectSerializer(now=self.now, *args, **kwargs)


-class PrivateProjectListViewSet(viewsets.ReadOnlyModelViewSet):
+class PrivateProjectViewSet(viewsets.ReadOnlyModelViewSet):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        now = timezone.now()
        self.now = now

    def get_queryset(self):
-        private_projects = Project.objects.filter(
-            is_draft=False, is_archived=False, access=Access.PRIVATE
+        filter_kwargs = dict(is_draft=False, is_archived=False, access=Access.PRIVATE)
+        projects = caching.add_or_query(
+            namespace="privateprojects",
+            query_set=Project.objects,
+            filter_kwargs=filter_kwargs,
        )
-        if private_projects:
-            not_allowed_projects = [
+
+        if projects:
+            not_accessible = [
                project.id
-                for project in private_projects
+                for project in projects
                if not self.request.user.has_perm("a4projects.view_project", project)
            ]
-            return private_projects.exclude(id__in=not_allowed_projects)
+            return projects.exclude(id__in=not_accessible)
        else:
-            return private_projects
+            return projects

    def get_serializer(self, *args, **kwargs):
        return project_serializers.ProjectSerializer(now=self.now, *args, **kwargs)