Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement caching system #73

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Implement caching system #73

wants to merge 16 commits into from

Conversation

albireox
Copy link
Member

@albireox albireox commented Dec 9, 2024

This implements caching for Valis queries.

Uses fastapi-cache to optionally cache Valis routes. Two backends are supported, Redis and memcached, which can be toggled using a new setting cache_backend (defaults to redis but memcached does not require ).

I added cache decorators to all the suggested routes in #69 plus some of the /query/list ones except the main query. The library only supports caching GET requests.If this is a dealbreaker we can consider something different for those routes.

On Redis, the key with the cached values is of the form fastapi-cache:valis-cache:get:/query/list/cartons:[] which includes the route and the query params used.

Closes #69

@albireox albireox requested a review from havok2063 as a code owner December 9, 2024 19:41
@albireox
Copy link
Member Author

albireox commented Dec 9, 2024

One thing that it doesn't seem to be implemented in fastapi-cache is the option to temporarily (or for easier development) disable caching. The implementation of a backend in fastapi-cache is relatively simple

class Backend(abc.ABC):
    @abc.abstractmethod
    async def get_with_ttl(self, key: str) -> Tuple[int, Optional[bytes]]:
        raise NotImplementedError

    @abc.abstractmethod
    async def get(self, key: str) -> Optional[bytes]:
        raise NotImplementedError

    @abc.abstractmethod
    async def set(self, key: str, value: bytes, expire: Optional[int] = None) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    async def clear(self, namespace: Optional[str] = None, key: Optional[str] = None) -> int:
        raise NotImplementedError

It's probably not super hard to create a null backend that doesn't do anything and always executes the route.

@albireox
Copy link
Member Author

I think this is now ready for review. I've made a few changes:

  • We are now using an slightly custom version of the fastapi-cache cache decorator. This one supports POST routes and hashes the contents of the request body. With this we can decorate the main query route. A caveat is that cached routes cannot return an iterator or they will return an empty list in the first request (when they don't hit the cache). I think this is difficult to avoid the way that the decorator works. But I'm also unsure that the generator does much in these cases since the route does need to return the full data payload in one go (unless something like streaming is used).
  • I increased the default cache time to 6 months, not sure if we want more or less.
  • I replaced the memcached backend (I realised this does require installing software) with an in-memory backend that does not require anything external.
  • I added a null backend that does no caching.

Copy link
Contributor

@havok2063 havok2063 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I'll pull and test it out first before approving. Is there a way we can explicitly trigger the cache to clear, and wipe the redis or in-memory cache?

python/valis/cache.py Outdated Show resolved Hide resolved
python/valis/cache.py Outdated Show resolved Hide resolved
python/valis/routes/query.py Outdated Show resolved Hide resolved
python/valis/routes/query.py Outdated Show resolved Hide resolved
python/valis/routes/target.py Outdated Show resolved Hide resolved
Comment on lines 187 to 188
async def get_spectrum(self, sdss_id: Annotated[int, Path(title="The sdss_id of the target to get", example=23326)],
product: Annotated[str, Query(description='The file species or data product name', example='specLite')],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return response in target/spectrum route calls get_a_spectrum, which was a generator. Can you check if this route works as intended with the cache? or see if we need to adjust the underlying function?

Copy link
Member Author

@albireox albireox Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work fine for me, but the example request

curl -X 'GET' \
        'http://127.0.0.1:8000/target/spectra/23326?product=specLite&ext=BOSS%2FAPO&release=DR18' \
        -H 'accept: application/json'

returns an empty list and I'm not sure how to find something that will return a full spectrum. Can you check this one?

BTW, this route fails if you just poetry install valis with error

  ...
  File "/Users/gallegoj/Documents/Code/sdss5/valis/python/valis/utils/versions.py", line 97, in get_tags
    return get_latest_tag_info() if release == 'WORK' else get_tag_info(release)
  File "/Users/gallegoj/Documents/Code/sdss5/valis/python/valis/utils/versions.py", line 33, in get_tag_info
    raise RuntimeError('No tag models found.')
RuntimeError: No tag models found.

It seems one needs to manually clone and pip install the datamodel for this to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For local use, you need the file locally. Otherwise it returns nothing.

And yeah, some routes need the datamodel product installed locally. I didn't want to make the datamodel package a strict dependency of valis. Most of valis only needs the datamodel for its basic python access, but some of the info routes need the full product list. It may be worth either splitting the datamodel into two products, or producing a lightweight pip installable package. But for now yeah you need to git clone and manually install.

Copy link
Contributor

@havok2063 havok2063 Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both the spectrum and pipelines routes, I get the following error, with a longer traceback that reaches into the cache encoder. I tried both a null and in-memory backend and get the same error.

spectrum route http://localhost:8000/target/spectra/23326?product=specLite&ext=BOSS%2FAPO&release=IPL3

INFO:     127.0.0.1:65300 - "GET /target/spectra/23326?product=specLite&ext=BOSS%2FAPO&release=IPL3 HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/fastapi/encoders.py", line 322, in jsonable_encoder
    data = dict(obj)
TypeError: cannot convert dictionary update sequence element #0 to a sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/fastapi/encoders.py", line 327, in jsonable_encoder
    data = vars(obj)
TypeError: vars() argument must have __dict__ attribute

The above exception was the direct cause of the following exception:

pipelines route http://localhost:8000/target/pipelines/23326?pipe=all&release=IPL3

INFO:     127.0.0.1:49185 - "GET /target/pipelines/23326?pipe=all&release=IPL3 HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/fastapi/encoders.py", line 322, in jsonable_encoder
    data = dict(obj)
ValueError: dictionary update sequence element #0 has length 1; 2 is required

During handling of the above exception, another exception occurred:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In get_spectrum(), line 191, can you wrap the call to get_a_spectrum() in a list and see if that works?

python/valis/cache.py Show resolved Hide resolved
@albireox
Copy link
Member Author

To allow clearing the cache I've added a function valis.cache.clear_redis_cache(). If called without arguments it will clear all the keys under fastapi-cache:valis-*.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement route caching
2 participants