LLM lifecycle management #18

Open
gperdrizet opened this issue Mar 13, 2024 · 0 comments

LLM instances need lifecycle management. Two things we are looking for here (see the sketch after this list):

  1. If a user has not interacted with an LLM instance in some amount of time, kill it to reclaim GPU resources.
  2. If a user tries to spin up a new LLM instance, first check whether there is room on the GPU; if not, either fall back to CPU or evict an older model from the GPU.

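A minimal sketch of what such a manager could look like. Everything here is hypothetical: `LLMRegistry`, `IDLE_TIMEOUT_S`, `MAX_GPU_INSTANCES`, and the method names are made up for illustration, not existing project code.

```python
import threading
import time

IDLE_TIMEOUT_S = 15 * 60   # hypothetical cutoff: reap instances idle longer than this
MAX_GPU_INSTANCES = 2      # hypothetical limit on concurrent GPU-resident models


class LLMRegistry:
    """Tracks live model instances and when each was last used."""

    def __init__(self):
        self.lock = threading.Lock()
        # model name -> {"model": <instance>, "last_used": <timestamp>, "device": "cuda"|"cpu"}
        self.instances = {}

    def touch(self, name):
        """Record an interaction so the reaper leaves this instance alone."""
        with self.lock:
            self.instances[name]["last_used"] = time.time()

    def reap_idle(self):
        """Point 1: drop instances nobody has talked to in IDLE_TIMEOUT_S."""
        now = time.time()
        with self.lock:
            for name in list(self.instances):
                if now - self.instances[name]["last_used"] > IDLE_TIMEOUT_S:
                    # Dropping the reference lets garbage collection reclaim GPU memory
                    del self.instances[name]

    def make_room(self):
        """Point 2: if the GPU is full, evict the least recently used model."""
        with self.lock:
            on_gpu = [n for n, v in self.instances.items() if v["device"] == "cuda"]
            if len(on_gpu) >= MAX_GPU_INSTANCES:
                oldest = min(on_gpu, key=lambda n: self.instances[n]["last_used"])
                del self.instances[oldest]
```

A background thread could call `reap_idle()` on a timer, and whatever code loads a new model would call `make_room()` first.
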
Now that I'm writing this, maybe we should demote older LLM instances to CPU before/instead of garbage collecting them. That way, when/if someone starts talking to them again, we don't need to go through a cold start, but we also aren't hogging the GPU.
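
A rough sketch of that demote/promote idea, assuming the instances are PyTorch modules (e.g. Hugging Face `transformers` models, which is an assumption about the stack) and reusing the hypothetical registry entries from the sketch above:

```python
import gc

import torch


def demote_to_cpu(entry):
    """Move an idle model's weights to system RAM instead of destroying it."""
    entry["model"].to("cpu")    # weights leave VRAM but survive in system RAM
    entry["device"] = "cpu"
    gc.collect()
    torch.cuda.empty_cache()    # hand the freed VRAM back to the CUDA allocator


def promote_to_gpu(entry):
    """Warm restart: move the weights back when someone talks to the model again."""
    entry["model"].to("cuda")
    entry["device"] = "cuda"
```

Moving weights back over the bus should be much faster than re-reading them from disk, so this trades system RAM for turning a cold start into a warm one.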

Anyway, this deserves some attention: as it stands now, whenever a user wants to talk to a new type of model, we just keep jamming models onto the GPUs until we inevitably OOM. Not good.

@gperdrizet gperdrizet self-assigned this Mar 13, 2024
@gperdrizet gperdrizet moved this from Todo to In Progress in Backdrop Build Launch Mar 29, 2024