We can have long-running queries that currently block a user from doing anything else. For example, searches on SDSS programs without any other constraints can take up to ~15 minutes. We should explore implementing a task manager, e.g. https://taskiq-python.github.io/, as a way to offload long-running jobs.
I think we'd want something that manages the background task or query, and does not block Valis or Zora. Ideally we'd have a way to track the status of the job: e.g. running, completed, error, and potentially have a dashboard of jobs. Another consideration is anonymous versus user tasks and if we need one or the other or both.
FastAPI has some support for Background Tasks. I'm not sure if this is enough for our purposes. Other related packages:
I looked a bit into this for lvmapi and ended up settling for TaskIQ, and I've been liking it quite a lot. It's really easy to set up and plays well with FastAPI. It may have fewer bells than Celery but I actually consider that an advantage.
I looked a bit into the other options (except Dramatiq, I think). Background Tasks sounded promising but I don't think there's a way to track task completion (you just launch something in the background and then lose track of it). It doesn't seem different from just having a pool of asyncio tasks, and in the end one would need to build something to track them. I'm also not sure how well it works with the uvicorn/gunicorn worker.
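To make the "build something to track them" point concrete, here is a minimal sketch of what a hand-rolled tracker around a pool of asyncio tasks might look like. `TaskRegistry` and its methods are hypothetical names invented for illustration, not part of FastAPI or any library, and the status only lives in the memory of one process, which is part of why this approach falls short.

```python
import asyncio
import uuid


class TaskRegistry:
    """Tiny in-process registry (hypothetical, for illustration only)."""

    def __init__(self):
        self.status = {}  # task id -> "running" | "completed" | "error"
        self._tasks = {}  # task id -> asyncio.Task, held so it is not garbage collected

    def submit(self, coro):
        # Must be called from inside a running event loop.
        task_id = uuid.uuid4().hex
        self.status[task_id] = "running"
        task = asyncio.create_task(coro)
        self._tasks[task_id] = task
        task.add_done_callback(lambda t: self._finish(task_id, t))
        return task_id

    def _finish(self, task_id, task):
        # Check cancelled() first: exception() raises on a cancelled task.
        if task.cancelled() or task.exception() is not None:
            self.status[task_id] = "error"
        else:
            self.status[task_id] = "completed"
        self._tasks.pop(task_id, None)


async def demo():
    registry = TaskRegistry()

    async def ok():
        await asyncio.sleep(0.01)  # stand-in for a query that succeeds

    async def boom():
        raise ValueError("query failed")  # stand-in for a query that fails

    good, bad = registry.submit(ok()), registry.submit(boom())
    before = registry.status[good]
    await asyncio.sleep(0.05)  # give both tasks time to finish
    return before, registry.status[good], registry.status[bad]
```

Calling `asyncio.run(demo())` reports the first task as running right after submission, then completed, and the second as error. With several server worker processes each one would hold its own registry, so a shared store would still be needed.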
Celery has everything one would need but it simply does not support async and all the hacks I tried failed dramatically. Maybe that's a bit less of a blocker for valis since currently all the DB connections are, in practice, sync. But that's a future limitation and it may not work well with FastAPI.
Huey's implementation of asyncio is a bit hacky and I could not make it work properly.
For what you describe I think TaskIQ would work out of the box with the exception of tracking all running tasks. But I think that's easy to implement with a Middleware that would record when a task starts (for example to a table in sdss5db, or to a Redis DB if we use that as the results backend) and when it finishes. Then it's easy to query that to get a complete picture of what tasks are running and where.
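A rough sketch of the bookkeeping such a middleware could delegate to is below. It assumes the two methods would be called from TaskIQ's middleware hooks that run before and after a task executes (`pre_execute` and `post_execute`; check the TaskIQ docs for the exact signatures). The `TaskLog` class, the `task_log` table, and its columns are all hypothetical, and an in-memory SQLite database stands in for sdss5db or Redis so the example runs on its own.

```python
import sqlite3
import time


class TaskLog:
    """Records task start/finish; table and column names are hypothetical."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS task_log ("
            "task_id TEXT PRIMARY KEY, name TEXT, status TEXT, "
            "started REAL, finished REAL)"
        )

    def record_start(self, task_id, name):
        # Would be called from the hook that runs before the task executes.
        self.conn.execute(
            "INSERT INTO task_log VALUES (?, ?, 'running', ?, NULL)",
            (task_id, name, time.time()),
        )
        self.conn.commit()

    def record_finish(self, task_id, is_error):
        # Would be called from the hook that runs after the task returns or fails.
        status = "error" if is_error else "completed"
        self.conn.execute(
            "UPDATE task_log SET status = ?, finished = ? WHERE task_id = ?",
            (status, time.time(), task_id),
        )
        self.conn.commit()

    def running(self):
        # The "complete picture" query: which tasks have started but not finished.
        rows = self.conn.execute(
            "SELECT task_id FROM task_log WHERE status = 'running' ORDER BY started"
        )
        return [row[0] for row in rows]


log = TaskLog(sqlite3.connect(":memory:"))
log.record_start("a1", "program_search")
log.record_start("b2", "program_search")
log.record_finish("a1", is_error=False)
```

After these calls `log.running()` lists only `b2`. A dashboard or a status endpoint could be built on the same table.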
Another thing that we should consider is how this and asynchronous DB connections are related. If one uses peewee in a worker, and the query takes 15 minutes, that worker won't be able to do anything else for 15 minutes even if all the action at that point is happening in the PostgreSQL server. As long as the workers are light one can imagine spinning up 100 workers (and maybe using the knowledge of how many queries are running to throttle the start of new queries).
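One simple way to express the throttling idea is a counting semaphore that caps how many queries are in flight at once. This is only a sketch of that technique in plain asyncio, not something TaskIQ provides; `run_throttled` and `fake_query` are hypothetical names and the sleep stands in for a real database query.

```python
import asyncio


async def run_throttled(queries, max_running):
    """Run coroutine factories with at most max_running active at once."""
    slots = asyncio.Semaphore(max_running)
    running = 0
    peak = 0

    async def guarded(make_query):
        nonlocal running, peak
        async with slots:  # wait here until a slot frees up
            running += 1
            peak = max(peak, running)
            try:
                return await make_query()
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(q) for q in queries))
    return results, peak


async def fake_query():
    # Stand-in for a long database query.
    await asyncio.sleep(0.01)
    return "done"
```

For example, `asyncio.run(run_throttled([fake_query] * 10, max_running=3))` completes all ten queries while never running more than three at the same time. In a multi-process setup the counter would have to live in a shared store rather than in one event loop.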
But if the database connection is async (with SQLAlchemy 2 or with some hack that would allow peewee to use psycopg 3 async support) then a few workers could handle all the tasks. At that point TaskIQ is not really doing any work other than tracking query completion. I'm actually not sure if TaskIQ allows workers to run concurrent tasks or if a single worker can only run one task at a time.
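The difference between the two cases can be illustrated without a database at all. In this sketch `time.sleep` stands in for a synchronous driver call that holds the worker, and `asyncio.sleep` stands in for an async driver that yields while the server does the work. The function names and delays are made up for the example.

```python
import asyncio
import time


async def blocking_query(delay):
    time.sleep(delay)  # sync driver: holds the event loop the whole time


async def async_query(delay):
    await asyncio.sleep(delay)  # async driver: yields while the server works


async def timed(query, n, delay):
    # Launch n queries together and measure the total wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(query(delay) for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_query, 5, 0.05))
async_elapsed = asyncio.run(timed(async_query, 5, 0.05))
```

The blocking version takes roughly the sum of the delays because the queries run one after another, while the async version takes roughly one delay because a single loop overlaps all five waits.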
I do like Taskiq as a simpler, more modern approach, but we may want to consider alternatives.