
Allow parallel inference #724

Open
1 task done
petrm opened this issue Dec 9, 2024 · 0 comments
Labels
feature request

Comments


petrm commented Dec 9, 2024

Describe the feature you'd like

I would like the inference queue to allow parallel execution of jobs.
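For illustration, here is a minimal sketch (not Hoarder's actual code) of what a configurable concurrency limit on the queue could look like. The job shape, the in-memory queue, and the `INFERENCE_NUM_WORKERS` variable are assumptions for this example:

```typescript
// Hypothetical sketch: drain an inference queue with N workers instead of
// one. The job shape and in-memory queue stand in for the real queue tables.
interface InferenceJob {
  bookmarkId: string;
  prompt: string;
}

const queue: InferenceJob[] = [
  { bookmarkId: "a", prompt: "Tag this bookmark" },
  { bookmarkId: "b", prompt: "Summarize this page" },
];

// Assumed knob; not an existing Hoarder setting.
const CONCURRENCY = Number(process.env.INFERENCE_NUM_WORKERS ?? "2");

async function runInference(job: InferenceJob): Promise<void> {
  // Placeholder for the real model call.
  console.log(`inferring for bookmark ${job.bookmarkId}`);
}

async function worker(): Promise<void> {
  // Each worker pulls jobs until the queue is empty, so up to
  // CONCURRENCY inference jobs are in flight at any moment.
  for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
    await runInference(job);
  }
}

await Promise.all(Array.from({ length: CONCURRENCY }, () => worker()));
```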

Describe the benefits this would bring to existing Hoarder users

The use case is having multiple load-balanced Ollama backends available, which would speed up processing.
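As a hedged sketch of what dispatch across several Ollama servers could look like: the plural `OLLAMA_BASE_URLS` variable is an assumption for this example (today a single base URL is configured), while `POST /api/generate` with `stream: false` is Ollama's standard non-streaming generate endpoint:

```typescript
// Round-robin across a comma-separated list of Ollama base URLs.
// OLLAMA_BASE_URLS is a hypothetical setting for this sketch.
const backends = (process.env.OLLAMA_BASE_URLS ?? "http://localhost:11434")
  .split(",")
  .map((url) => url.trim());

let next = 0;
function pickBackend(): string {
  // Naive round-robin; a real balancer could also track in-flight
  // requests or skip unhealthy backends.
  const backend = backends[next % backends.length];
  next += 1;
  return backend;
}

async function generate(model: string, prompt: string): Promise<string> {
  const res = await fetch(`${pickBackend()}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed: ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Combined with a worker pool like the one sketched above, each in-flight job would land on a different backend, which is where the speedup comes from.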

Can the goal of this request already be achieved via other means?

Not easily.

  • a faster GPU with more RAM
  • multiple GPUs on the same host
  • something other than Ollama that can run jobs distributed across multiple machines (like https://github.com/exo-explore/exo, maybe?)

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response

@kamtschatka added the feature request label on Dec 13, 2024