ollama and requests: support proper streaming #14
base: main
Conversation
Supports proper streaming of responses via curl and incremental building of autocomplete suggestions for Ollama. This is necessary on lower-end or non-GPU machines, where waiting for the full completion is too slow. Additionally, kills the previous curl call before starting a new one, so that autocomplete cannot fire multiple curls in a row, which brings down the ollama server.
Update: this also has a back-end issue, since Ollama's HTTP server is stateful and does not handle cancelled curl requests very well. Potentially the ollama backend should be reworked to use a pipe to a long-lived subprocess instead.
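For illustration, here is a minimal sketch of the incremental-streaming approach described above, driving curl through plenary.nvim's Job against Ollama's streaming /api/generate endpoint. The URL, model name, and the `on_suggestion` callback are assumptions for the example, not code from this PR:

```lua
local Job = require('plenary.job')

-- Sketch: stream Ollama's newline-delimited JSON output through curl and
-- grow the suggestion chunk by chunk instead of waiting for completion.
local function stream_generate(prompt, on_suggestion)
  local suggestion = ''
  local job = Job:new({
    command = 'curl',
    args = {
      '--no-buffer', '-s', -- flush chunks to stdout as they arrive
      'http://localhost:11434/api/generate',
      '-d', vim.json.encode({ model = 'codellama', prompt = prompt, stream = true }),
    },
    on_stdout = function(_, line)
      -- each streamed line is one JSON object: { "response": "...", "done": false }
      local ok, chunk = pcall(vim.json.decode, line)
      if ok and chunk.response then
        suggestion = suggestion .. chunk.response
        -- plenary callbacks run off the main loop; schedule any UI work
        vim.schedule(function() on_suggestion(suggestion, chunk.done) end)
      end
    end,
  })
  job:start()
  return job
end
```

Because each line is a standalone JSON object, the suggestion can be surfaced as soon as the first chunks arrive rather than only after the whole completion finishes.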
@maxwell-bland
@tzachar not sure they went through; the merge pull request section (below) should highlight review changes? I need to fix ollama's HTTP server setup to add appropriate request cancellation so it doesn't get flooded, but I do not have an abundance of time just yet. Hopefully this weekend.
@maxwell-bland
@tzachar sorry if it was not clear: your comments did not get posted, as far as I can see. What is your opinion on the change? Were there some comments in the code that I missed? I think it was fine to use curl in the short term, but it would probably be better to switch to a subprocess that works via pipes, since that would reduce latency and the difficulty of managing requests. I will update this commit once I have some more time. I looked at adding timeout/session management to ollama yesterday, but it will be a bit of a slog.
@maxwell-bland you can see my comments above in this thread, with a "pending" marker.
This is a comment
@tzachar apologies, I see no "view reviewed changes" similar to my own comment above; potentially it is just GitHub being unintuitive. Maybe you could leave your changes in a direct comment on the pull request? I checked through the inbox on the site, etc., and cannot find the comments anywhere.
All my comments are included inside GitHub's review process.
@tzachar if the comments show as pending, that means they haven't been sent yet. If you write comments as part of a PR review (and not just as individual comments), you need to give a decision on the PR before they're posted - either Approve/Request Changes/Comment, I believe. There's a guide here about the process: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/reviewing-proposed-changes-in-a-pull-request
```diff
@@ -44,10 +46,30 @@ function Service:Get(url, headers, data, cb)
     args[#args + 1] = h
   end

+  if lastjob ~= nil then
```
Why not use the job:shutdown method here? Using kill means it will only work on Linux/Mac.
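For comparison, a sketch of the cross-platform variant this comment suggests; `lastjob` comes from the diff above, and the exit code and signal values are assumptions:

```lua
-- Sketch: cancel the previous in-flight request with plenary's
-- Job:shutdown(code, signal), which goes through libuv rather than
-- shelling out to `kill`, so it is not limited to Linux/Mac.
-- SIGTERM (15) is an assumed choice of signal here.
if lastjob ~= nil then
  lastjob:shutdown(0, 15)
  lastjob = nil
end
```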
```diff
   options = self.params.options,
 }
+local new_data = {}
```
This seems off. You redefine it inside the callback later (line 35).
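A hedged illustration of the shadowing being pointed out here, with the surrounding code assumed rather than taken from the diff:

```lua
local Job = require('plenary.job')

local new_data = {} -- outer accumulator (the flagged line)

local job = Job:new({
  command = 'curl', -- placeholder command for the illustration
  on_stdout = function(_, line)
    -- `local` here creates a fresh, unrelated table on every callback,
    -- shadowing the outer new_data, so nothing ever accumulates in it:
    local new_data = {}
    new_data[#new_data + 1] = line
    -- dropping `local` (or renaming one of the two variables) would let
    -- the callback append into the outer table instead
  end,
})
```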
Thanks @alexandradeas |
Needs testing on non-ollama services (I have no access).
Thanks!
Maxwell