Implement caching system #73

albireox · 2024-12-09T19:41:17Z

This implements caching for Valis queries.

Uses fastapi-cache to optionally cache Valis routes. Two backends are supported, Redis and memcached, which can be toggled using a new setting cache_backend (defaults to redis but memcached does not require ).

I added cache decorators to all the routes suggested in #69, plus some of the /query/list ones, but not the main query. The library only supports caching GET requests. If this is a dealbreaker we can consider something different for those routes.

On Redis, the key with the cached values is of the form fastapi-cache:valis-cache:get:/query/list/cartons:[] which includes the route and the query params used.
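As a rough illustration of how a key of that shape could be assembled, here is a minimal sketch. The function name request_cache_key and the release parameter in the second call are assumptions made for the example, not the code in this PR; only the key layout is taken from the description above.

```python
from typing import Iterable, Tuple


def request_cache_key(
    prefix: str,
    namespace: str,
    method: str,
    path: str,
    query_params: Iterable[Tuple[str, str]],
) -> str:
    """Join prefix, namespace, HTTP method, route path and sorted query params."""
    # Sorting makes the key independent of the order in which params were sent.
    return ":".join([prefix, namespace, method.lower(), path, repr(sorted(query_params))])


# A request without query parameters gives the key quoted above.
key = request_cache_key("fastapi-cache", "valis-cache", "GET", "/query/list/cartons", [])

# Hypothetical request with one query parameter, to show how params end up in the key.
key_with_params = request_cache_key(
    "fastapi-cache", "valis-cache", "GET", "/query/list/cartons", [("release", "DR18")]
)
```

Two requests to the same route with different query parameters therefore get different cache entries.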

Closes #69

albireox · 2024-12-09T20:22:52Z

One thing that doesn't seem to be implemented in fastapi-cache is an option to disable caching temporarily (for example, for easier development). The implementation of a backend in fastapi-cache is relatively simple:

import abc
from typing import Optional, Tuple


class Backend(abc.ABC):
    @abc.abstractmethod
    async def get_with_ttl(self, key: str) -> Tuple[int, Optional[bytes]]:
        raise NotImplementedError

    @abc.abstractmethod
    async def get(self, key: str) -> Optional[bytes]:
        raise NotImplementedError

    @abc.abstractmethod
    async def set(self, key: str, value: bytes, expire: Optional[int] = None) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    async def clear(self, namespace: Optional[str] = None, key: Optional[str] = None) -> int:
        raise NotImplementedError

It's probably not super hard to create a null backend that doesn't do anything and always executes the route.
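A minimal sketch of what such a null backend could look like, assuming the interface quoted above. The Backend class is re-declared locally so the example runs without fastapi-cache installed, and the names NullBackend and _demo are illustrative only.

```python
import abc
import asyncio
from typing import Optional, Tuple


class Backend(abc.ABC):
    """Local copy of the interface quoted above (assumption: same method signatures)."""

    @abc.abstractmethod
    async def get_with_ttl(self, key: str) -> Tuple[int, Optional[bytes]]:
        raise NotImplementedError

    @abc.abstractmethod
    async def get(self, key: str) -> Optional[bytes]:
        raise NotImplementedError

    @abc.abstractmethod
    async def set(self, key: str, value: bytes, expire: Optional[int] = None) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    async def clear(self, namespace: Optional[str] = None, key: Optional[str] = None) -> int:
        raise NotImplementedError


class NullBackend(Backend):
    """Stores nothing, so every lookup is a miss and the route always runs."""

    async def get_with_ttl(self, key: str) -> Tuple[int, Optional[bytes]]:
        return 0, None  # no TTL and no cached value

    async def get(self, key: str) -> Optional[bytes]:
        return None

    async def set(self, key: str, value: bytes, expire: Optional[int] = None) -> None:
        return None  # silently discard the value

    async def clear(self, namespace: Optional[str] = None, key: Optional[str] = None) -> int:
        return 0  # nothing was stored, so nothing is removed


async def _demo() -> tuple:
    backend = NullBackend()
    await backend.set("some-key", b"payload", expire=60)
    return await backend.get("some-key"), await backend.get_with_ttl("some-key"), await backend.clear()


result = asyncio.run(_demo())
```

Because get() never returns a value, a decorator built on this backend would fall through to the route on every request.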

albireox · 2024-12-10T19:03:50Z

I think this is now ready for review. I've made a few changes:

We are now using a slightly customised version of the fastapi-cache cache decorator. This one supports POST routes and hashes the contents of the request body, so we can decorate the main query route. A caveat is that cached routes cannot return an iterator, or they will return an empty list on the first request (when they don't hit the cache). I think this is difficult to avoid given the way the decorator works. But I'm also not sure the generator does much in these cases, since the route needs to return the full data payload in one go anyway (unless something like streaming is used).
I increased the default cache time to 6 months, not sure if we want more or less.
I replaced the memcached backend (I realised this does require installing software) with an in-memory backend that does not require anything external.
I added a null backend that does no caching.
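To illustrate the idea of hashing the request body for POST routes, here is a sketch under stated assumptions. The helper names, the choice of SHA-256, the /query/main path, and the body fields are all hypothetical; the actual decorator in this PR may build its key differently.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def body_digest(body: Any) -> str:
    """Hash a JSON-serialisable body; sort_keys makes the digest order-independent."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def post_cache_key(
    namespace: str,
    method: str,
    path: str,
    query_params: Mapping[str, str],
    body: Optional[Any] = None,
) -> str:
    """Build a cache key that also depends on the request body, when there is one."""
    parts = [namespace, method.lower(), path, repr(sorted(query_params.items()))]
    if body is not None:
        parts.append(body_digest(body))
    return ":".join(parts)


# Hypothetical POST bodies: same content in a different field order, then a different search.
key_a = post_cache_key("valis-cache", "POST", "/query/main", {}, {"ra": 10.0, "dec": 5.0})
key_b = post_cache_key("valis-cache", "POST", "/query/main", {}, {"dec": 5.0, "ra": 10.0})
key_c = post_cache_key("valis-cache", "POST", "/query/main", {}, {"ra": 11.0, "dec": 5.0})
```

Here key_a and key_b are identical while key_c differs, so two POST requests share a cache entry only when their bodies carry the same content.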

havok2063

This looks good. I'll pull and test it out first before approving. Is there a way we can explicitly trigger the cache to clear, and wipe the redis or in-memory cache?

python/valis/cache.py

python/valis/routes/query.py

python/valis/routes/target.py

havok2063 · 2024-12-12T17:26:20Z

python/valis/routes/target.py

    async def get_spectrum(self, sdss_id: Annotated[int, Path(title="The sdss_id of the target to get", example=23326)],
                           product: Annotated[str, Query(description='The file species or data product name', example='specLite')],


The return response in the target/spectrum route calls get_a_spectrum, which was a generator. Can you check whether this route works as intended with the cache, or whether we need to adjust the underlying function?

albireox · 2024-12-13

This seems to work fine for me, but the example request

curl -X 'GET' \
  'http://127.0.0.1:8000/target/spectra/23326?product=specLite&ext=BOSS%2FAPO&release=DR18' \
  -H 'accept: application/json'

returns an empty list and I'm not sure how to find something that will return a full spectrum. Can you check this one?

BTW, this route fails with the following error if you just poetry install valis:

...
  File "/Users/gallegoj/Documents/Code/sdss5/valis/python/valis/utils/versions.py", line 97, in get_tags
    return get_latest_tag_info() if release == 'WORK' else get_tag_info(release)
  File "/Users/gallegoj/Documents/Code/sdss5/valis/python/valis/utils/versions.py", line 33, in get_tag_info
    raise RuntimeError('No tag models found.')
RuntimeError: No tag models found.

It seems one needs to manually clone and pip install the datamodel for this to work.

havok2063 · 2024-12-13

For local use, you need the file locally. Otherwise it returns nothing.

And yeah, some routes need the datamodel product installed locally. I didn't want to make the datamodel package a strict dependency of valis. Most of valis only needs the datamodel for its basic python access, but some of the info routes need the full product list. It may be worth either splitting the datamodel into two products, or producing a lightweight pip installable package. But for now yeah you need to git clone and manually install.

havok2063 · 2024-12-13

For both the spectrum and pipelines routes, I get the following error, with a longer traceback that reaches into the cache encoder. I tried both a null and in-memory backend and get the same error.

spectrum route http://localhost:8000/target/spectra/23326?product=specLite&ext=BOSS%2FAPO&release=IPL3

INFO: 127.0.0.1:65300 - "GET /target/spectra/23326?product=specLite&ext=BOSS%2FAPO&release=IPL3 HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/fastapi/encoders.py", line 322, in jsonable_encoder
    data = dict(obj)
TypeError: cannot convert dictionary update sequence element #0 to a sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/fastapi/encoders.py", line 327, in jsonable_encoder
    data = vars(obj)
TypeError: vars() argument must have __dict__ attribute

The above exception was the direct cause of the following exception:

pipelines route http://localhost:8000/target/pipelines/23326?pipe=all&release=IPL3

INFO: 127.0.0.1:49185 - "GET /target/pipelines/23326?pipe=all&release=IPL3 HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/fastapi/encoders.py", line 322, in jsonable_encoder
    data = dict(obj)
ValueError: dictionary update sequence element #0 has length 1; 2 is required

During handling of the above exception, another exception occurred:

albireox · 2024-12-13

In get_spectrum(), line 191, can you wrap the call to get_a_spectrum() in a list and see if that works?

python/valis/cache.py

albireox · 2024-12-13T02:35:02Z

To allow clearing the cache I've added a function valis.cache.clear_redis_cache(). If called without arguments it will clear all the keys under fastapi-cache:valis-*.
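A sketch of how clearing by key pattern can work, assuming a client that exposes scan_iter(match=...) and delete(*names) as redis-py's Redis class does. FakeRedis is an in-memory stand-in so the example runs without a server, and clear_cache_keys, its arguments, and the valis-target namespace are hypothetical; the real valis.cache.clear_redis_cache() may have a different signature.

```python
import fnmatch
from typing import Iterable, Iterator, Optional


class FakeRedis:
    """In-memory stand-in exposing the two client methods used below."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._data = {key: b"" for key in keys}

    def scan_iter(self, match: str = "*") -> Iterator[str]:
        # Snapshot the matches so keys can be deleted while iterating.
        return iter([k for k in self._data if fnmatch.fnmatchcase(k, match)])

    def delete(self, *names: str) -> int:
        return sum(1 for name in names if self._data.pop(name, None) is not None)

    def remaining(self) -> list:
        return sorted(self._data)


def clear_cache_keys(client, namespace: Optional[str] = None, prefix: str = "fastapi-cache") -> int:
    """Delete one namespace, or every key under '<prefix>:valis-*' when none is given."""
    pattern = f"{prefix}:{namespace}:*" if namespace else f"{prefix}:valis-*"
    deleted = 0
    for key in client.scan_iter(match=pattern):
        deleted += client.delete(key)
    return deleted


client = FakeRedis([
    "fastapi-cache:valis-cache:get:/query/list/cartons:[]",
    "fastapi-cache:valis-target:get:/target/spectra/23326:[]",  # hypothetical namespace
    "other-app:session:1",
])
removed_one = clear_cache_keys(client, namespace="valis-target")
removed_rest = clear_cache_keys(client)
```

Iterating with a scan rather than fetching all keys at once avoids blocking the server on large key spaces, which is the usual reason to prefer scan_iter for this kind of cleanup.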

Implement caching system

b109d94

albireox requested a review from havok2063 as a code owner December 9, 2024 19:41

albireox added 4 commits December 9, 2024 12:14

memcache -> memcached

cc8a703

Add instructions for local development using memcached

f372c69

Merge branch 'main' into albireox-issue-69

5daa219

Add note on Redis requirement for deployment

4cbd361

albireox added 5 commits December 10, 2024 10:38

Use custom version of fastapi-cache decorator to support POST requests

523dfa0

Change CACHE_TTL to 6 months

f2f42f4

Replace memcached with in-memory

855ba7f

Add a null cache backend

86bbb34

Remove memcache extra from fastapi-cache2

3b26943

havok2063 reviewed Dec 12, 2024


albireox added 5 commits December 12, 2024 17:56

Make cache_ttl a setting option

8450f7a

Add comment about in-memory backend

c845173

Improve valis_cache docstring

1d8a6a5

Set custom cache namespaces for target and query routes

09c48b2

Add function to clear namespaces in the Redis cache

1243323

Remove import added by autoimport

3b68b37
