Add instructions for running vLLM backend #8
Co-authored-by: Neelay Shah <neelays@nvidia.com>
```
mkdir -p /opt/tritonserver/backends/vllm
wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py
```
Not an action item here, but some food for thought that could be nice for both users and developers. If we standardize on a certain git repository structure for Python-based backends, we could do something like:
git clone https://github.com/triton-inference-server/vllm_backend.git /opt/tritonserver/backends
- Single command
- Developers could iterate on the backend directly in the git repo and just reload triton without copying files/builds around (developer experience)
- More support for multi-file implementations. The `wget` is nice, but won't scale past a single file. Ex: imagine `model.py` implements `TritonPythonModel` but imports `implementation.py`, which has all the gory details for certain features.
Just some random Tuesday ideas in my head. Core would just be updated to also look for `src/model.py` (or whatever standard we set) instead of just `model.py`.
This will not work with `git clone`, since the required `model.py` is in a sub-directory of `vllm_backend`, and `clone` will pull in the tests as well.
We can discuss the best solution at some point.
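One possible way to address the "clone pulls in tests as well" concern is a sparse checkout, sketched below. This is only an illustration, not something proposed in this thread: the function name `clone_backend_src_only` is hypothetical, it assumes a reasonably recent git with `sparse-checkout` support, and it still leaves `model.py` under `src/` rather than at the top level, so the sub-directory problem remains.

```shell
# Hypothetical helper: clone a backend repo but only materialize its src/ directory.
# Assumes git with sparse-checkout support; the full history is still downloaded.
clone_backend_src_only() {
    repo_url="$1"   # e.g. https://github.com/triton-inference-server/vllm_backend.git
    dest_dir="$2"   # where the working tree should live (caller's choice)

    # --sparse starts with a minimal working tree instead of checking out everything.
    git clone --quiet --sparse "$repo_url" "$dest_dir" || return 1

    # Limit the working tree to src/, so test and CI directories are not checked out.
    git -C "$dest_dir" sparse-checkout set src
}
```

After running it, `src/model.py` is present in the destination while directories outside `src/` are left out of the working tree.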
For ease of development, I think your earlier idea of symlinks makes more sense.
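A minimal sketch of what the symlink idea could look like, assuming a local checkout of the repository that contains `src/model.py`. The function name `link_vllm_backend` and the checkout location are hypothetical, and whether Triton loads a symlinked `model.py` the same way as a copied one has not been verified here.

```shell
# Hypothetical helper: expose a checkout's src/model.py to Triton via a symlink,
# so edits in the git repo are picked up without copying files around.
link_vllm_backend() {
    repo_dir="$1"      # local checkout of vllm_backend (location is an assumption)
    backend_dir="$2"   # e.g. /opt/tritonserver/backends/vllm

    # Fail early if the checkout does not have the expected layout.
    if [ ! -f "$repo_dir/src/model.py" ]; then
        echo "error: $repo_dir/src/model.py not found" >&2
        return 1
    fi

    mkdir -p "$backend_dir"
    # -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link.
    ln -sfn "$repo_dir/src/model.py" "$backend_dir/model.py"
}

# Example usage (paths are illustrative):
#   git clone https://github.com/triton-inference-server/vllm_backend.git "$HOME/vllm_backend"
#   link_vllm_backend "$HOME/vllm_backend" /opt/tritonserver/backends/vllm
```

This keeps the backend directory layout that the `wget` instructions produce, while the file itself stays in the git checkout.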
> this will not work with git clone, since required model.py is in sub-directory of vllm_backend, plus clone will clone tests as well.
I know it won't work as-is and would require minor changes. Not necessarily asking for this feature at this time, just food for thought.
We have a separate goal of improving the Python backend developer experience (more for things like debugging, ipdb, etc.) somewhere in the pipeline, so this came to mind as a tangential idea.
I see. By any chance, do you know which ticket tracks this? If you don't remember, no worries.
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
LGTM besides a minor suggestion. Great work @dyastremsky!
Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>
Amazing work on this, @dyastremsky!
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>
Draft documentation to allow users to quickly use the vLLM backend to run their models.