
Increase idle timeout #159

Open
uneetsingh opened this issue Jun 5, 2023 · 17 comments

uneetsingh commented Jun 5, 2023

Hi! I have been trying to increase the idleTimeout, but it seems to remain at the default 30 seconds.

Even though idleTimeout is set to 120 seconds here, Jetty still enforces its default 30 s timeout. Below are a few log lines that make the case:

[05/Jun/2023:07:42:15 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 30181
[05/Jun/2023:07:42:16 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 30678
[05/Jun/2023:07:42:16 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 31110
[05/Jun/2023:07:42:16 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47604 "-" "python-requests/2.31.0" 29445
[05/Jun/2023:07:42:17 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 29920
[05/Jun/2023:07:42:17 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30052
[05/Jun/2023:07:42:17 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30052
[05/Jun/2023:07:42:17 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30050
[05/Jun/2023:07:42:17 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30051
[05/Jun/2023:07:42:17 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 30405
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 31708
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30051
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30051
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30051
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30052
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30050
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30051
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30051
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30051
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30054
[05/Jun/2023:07:42:18 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 32156
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 32700
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30055
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30054
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30069
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30063
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30054
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30074
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30066
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 30068
[05/Jun/2023:07:42:19 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 33169
[05/Jun/2023:07:42:20 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 33582
[05/Jun/2023:07:42:20 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 34018
[05/Jun/2023:07:42:20 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47604 "-" "python-requests/2.31.0" 30622
[05/Jun/2023:07:42:21 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 31872
[05/Jun/2023:07:42:21 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 32318
[05/Jun/2023:07:42:22 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 33799
[05/Jun/2023:07:42:22 +0000] "POST /service/processQuantityText HTTP/1.1" 200 47605 "-" "python-requests/2.31.0" 34354
lfoppiano changed the title from "Hi! I have been attempting to increase the idleTimeout but seems like it remain the default 30 seconds only." to "Increase idle timeout" on Jun 5, 2023
lfoppiano (Owner) commented

Hi, thanks for your interest in grobid-quantities. I've reformatted the issue to make it more readable.

How many requests are you sending?
From the log, your requests appear to take less than 30 seconds. The idleTimeout only applies to requests that last longer than 30 seconds.

In this case the requests are rejected because you're sending more than the allowed number of parallel requests. There is an option in config.yml: maxParallelRequests: 0

When it is set to 0, the limit defaults to the number of available CPUs. How many CPUs do you have?
You can also try setting it to 2; the service should then return 503 once you exceed that number.
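For context, this limit is enforced through Jetty's QoSFilter. Roughly, it is wired up along the lines of the sketch below inside the Dropwizard application (this is only a sketch, not the exact code in GrobidQuantitiesApplication; the parameter values and URL pattern are illustrative):

// Sketch: registering Jetty's QoSFilter in a Dropwizard Application#run(...) method.
// "environment" is the io.dropwizard.setup.Environment passed to run(); values are illustrative.
org.eclipse.jetty.servlets.QoSFilter qosFilter = new org.eclipse.jetty.servlets.QoSFilter();
javax.servlet.FilterRegistration.Dynamic qos =
        environment.servlets().addFilter("QoSFilter", qosFilter);
qos.setInitParameter("maxRequests", "2");   // presumably what maxParallelRequests from config.yml maps to
qos.setInitParameter("waitMs", "50");       // how long a request waits for a slot before being suspended
// QoSFilter also understands a "suspendMs" init parameter (how long a suspended request may wait).
qos.addMappingForUrlPatterns(
        java.util.EnumSet.of(javax.servlet.DispatcherType.REQUEST), true, "/service/*");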

Let me know if you have more questions

uneetsingh (Author) commented Jun 5, 2023

Thank you so much for the response.

The log above is from a load test, so the request rate could be around 50 requests per second. Hence, I understand that a lot of them would fail.

How many CPUs do you have?

2

My understanding is this:
If I send more requests than maxParallelRequests allows, they are queued, and if they remain idle in the queue for longer than idleTimeout they fail. Please correct me if this understanding is flawed.

The idleTimeout only applies to requests that last longer than 30 seconds.

This is a little confusing; can you elaborate? If idleTimeout is set to 120 s, the extra requests should remain suspended in the queue, and their connections should only be closed if they stay idle for more than 120 s. Correct?

lfoppiano (Owner) commented Jun 5, 2023

OK, I was mistaken; I was thinking of Grobid's PDF document processing. I implemented this a long time ago. Sorry for the trouble.

For this case, there is another parameter in the QoSFilter, called "waitMs", which defaults to 50 ms.

If you can rebuild grobid-quantities, you could try adding this line at line 81 of GrobidQuantitiesApplication:

        qos.setInitParameter("waitMs", String.valueOf("120"));

That should change the waiting time to 120 ms; you can increase it further based on your needs. If this works, we can expose it as a configuration parameter.

Another thing I suggest is to make sure the client handles the 503 response and waits before retrying. The Python client should already do that: https://github.com/lfoppiano/grobid-quantities-python-client/
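For completeness, the retry logic itself is straightforward in any language. Below is a minimal, hypothetical sketch in Java (the URL, payload encoding, attempt count and back-off values are placeholders, and the real service may expect multipart/form-data, as the Python client sends):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RetryingClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical request: a form-encoded "text" field sent to the quantities endpoint.
        String body = "text=" + URLEncoder.encode("10 grams of sugar", StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:8060/service/processQuantityText"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        long backoffMs = 500;  // initial wait after a 503
        for (int attempt = 1; attempt <= 5; attempt++) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 503) {  // success, or an error a retry will not fix
                System.out.println(response.statusCode() + " " + response.body());
                return;
            }
            System.out.println("503 received (attempt " + attempt + "), retrying in " + backoffMs + " ms");
            Thread.sleep(backoffMs);
            backoffMs *= 2;  // exponential back-off
        }
    }
}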

uneetsingh (Author) commented

Thank you, this is very helpful. I will update this thread with feedback once I rebuild and rerun the service.

lfoppiano (Owner) commented

I've made the modification in the feature/add-wait-ms branch; you can now change the timeout from the configuration.

lfoppiano (Owner) commented

@uneetsingh did you manage to make any progress on this?

I'd like to release 0.7.3 before the end of the month, including this PR if the solution works. Could you share your script? I'd like to try to reproduce the behaviour you reported.

uneetsingh (Author) commented Jun 15, 2023

qos.setInitParameter("waitMs", String.valueOf("120"));

I tried this, and the issue still persists, i.e. requests that take more than 30 s fail with a 503 status. Apologies, I couldn't test with the new branch you pushed.

Regarding reproducing the error: I was load testing with Locust. Below is the simple script:

from locust import HttpUser, task, between
import random

text = '''Put some test text here'''

class AppUser(HttpUser):
    wait_time = between(1, 6)

    @task
    def index_page(self):
        response = self.client.post("/service/processQuantityText", files={
            "text": text + str(random.randint(3, 9))
        })
        if response.status_code != 200:
            print(response.status_code, response.text)

I was spawning 100 users with a spawn rate of 10.

[Update]
On the grobid-quantities server side, it was a 2-CPU machine with the default server configuration.

lfoppiano (Owner) commented

Thanks!

I've looked into this a bit more, and I think waitMs is only used to decide whether or not to suspend the request.

I've also revised the Locust test I had written some time ago; you can find the updated test in resources/locust, which is similar to your snippet.

I started the server with maxParallelRequests: 2 and waitMs: 1, ran a test with 100 users for 10 minutes, and did not get any failures. See the report:
Test Report for locustfile.py.pdf

Then I tried with 10000 users and it started failing after 2000 parallel users:
Test Report for locustfile.py.pdf

I suppose it's also related to the number of threads and the queue sizes that are allocated; see these settings from config.yaml:

  maxThreads: 2048
  maxQueuedRequests: 2048
  acceptQueueSize: 2048

Your script also uses processQuantityText. I wonder, are you sure there were requests that took more than 30 s to complete? Could you check that? It's quite strange that this would happen with text requests.

uneetsingh (Author) commented Jun 15, 2023

Thank you!

From the reports you posted, it is also clear that there is some 30 s threshold (see the 10,000-user report): the response time has an upper cap of 30 s, and the response-time curve flattens out at 30 s. That is the threshold I was trying to increase. I also tried changing maxThreads, maxQueuedRequests, and acceptQueueSize, but made no headway there either.

Whether the 503 failures appear at 100 users or at 10,000 users depends on the size of the payload. The payload I was sending was three to four paragraphs long.

Your script also uses processQuantityText

Yes, it does. If you look at the logs I posted in the issue description, it's the same endpoint.

lfoppiano (Owner) commented

@uneetsingh you're right! I had not noticed it... indeed! I think I now understand what you mean.

I tried to change the idleTimeout parameter but, as you have said from the beginning 😅, it's not working.

The only solution might be this one: dropwizard/dropwizard#1674 (comment), but I'm not so sure it is doable...
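One way to force a different idle timeout programmatically would be something along these lines (an untested sketch that would sit in the Dropwizard Application#run(...) method, where "environment" is available; I'm not sure it addresses the real cause):

// Untested sketch: override the idle timeout on every Jetty connector once the server has started.
environment.lifecycle().addServerLifecycleListener(server -> {
    for (org.eclipse.jetty.server.Connector connector : server.getConnectors()) {
        if (connector instanceof org.eclipse.jetty.server.ServerConnector) {
            ((org.eclipse.jetty.server.ServerConnector) connector).setIdleTimeout(120_000L); // 120 s
        }
    }
});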

uneetsingh (Author) commented Jun 15, 2023

@lfoppiano That's okay! 😅 You have been helpful throughout, and I really appreciate it.

The Dropwizard issue you pointed to seems to address the problem directly; I will try it. What is key here is the meaning of idleTimeout, and that Dropwizard issue at least clears that up: idleTimeout is in no way related to the processing time, but rather to the time it takes for the server to take up the request (i.e. lock a thread for that request).

I will update this thread if I find a solution. Thanks a lot again.

lfoppiano (Owner) commented

Thank you!

By the way, I'm planning to update grobid-quantities to Dropwizard 4 at some point (in #136, although it says Dropwizard 2), and this might end up being fixed by a more recent version of Jetty. I'm not sure.

lfoppiano (Owner) commented Jun 16, 2023

Just a quick update; I cannot really let this go so easily 🐶
I've updated the configuration in both master and the PR.
If you set the idleTimeout option at the connector level, it is correctly loaded into the configuration and into the server connector (it's a bit hard to see):

[screenshot: idleTimeout loaded into the server connector]

However, after testing again with Locust, it still caps at 30 s.

If I debug the AbstractConnector class and make a request, the value is correctly set:

[screenshot: idleTimeout value inside AbstractConnector during debugging]

I haven't followed the whole trail, but it might be useful to double-check with Dropwizard 4.0 and a newer version of Jetty.

lfoppiano (Owner) commented

I think the idleTimeout works fine.

E.g. if I set it to 2 seconds, I get:

DEBUG  [11:05:43.872] [dw-26] o.e.j.i.IdleTimeout -  Setting idle timeout 0 -> 2000 on SocketChannelEndPoint@e7d8c9e[{l=/127.0.0.1:8060,r=/127.0.0.1:53272,OPEN,fill=-,flush=-,to=13/2000}{io=0/0,kio=0,kro=0}]->[<null>]

and then it fails when it goes over 2 seconds:

Timeout -  SocketChannelEndPoint@e7d8c9e[{l=/127.0.0.1:8060,r=/127.0.0.1:53272,OPEN,fill=-,flush=-,to=1764/2000}{io=0/0,kio=0,kro=1}]->[HttpConnection@3104d8f1[p=HttpParser{s=END,2659208 of 2659208},g=HttpGenerator@7ec3ba88{s=START}]=>HttpChannelOverHttp@2e096ac{s=HttpChannelState@110cb986{s=HANDLING rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=1},r=1,c=false/false,a=HANDLING,uri=http://localhost:8060/service/annotateQuantityPDF,age=1967}] idle timeout check, elapsed: 1764 ms, remaining: 236 ms 
DEBUG  [11:05:46.103] [Scheduler-713702106-1] o.e.j.i.IdleTimeout -  SocketChannelEndPoint@e7d8c9e[{l=/127.0.0.1:8060,r=/127.0.0.1:53272,OPEN,fill=-,flush=-,to=2006/2000}{io=0/0,kio=0,kro=1}]->[HttpConnection@3104d8f1[p=HttpParser{s=END,2659208 of 2659208},g=HttpGenerator@7ec3ba88{s=START}]=>HttpChannelOverHttp@2e096ac{s=HttpChannelState@110cb986{s=HANDLING rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=1},r=1,c=false/false,a=HANDLING,uri=http://localhost:8060/service/annotateQuantityPDF,age=2208}] idle timeout check, elapsed: 2006 ms, remaining: -6 ms 
DEBUG  [11:05:46.103] [Scheduler-713702106-1] o.e.j.i.IdleTimeout -  SocketChannelEndPoint@e7d8c9e[{l=/127.0.0.1:8060,r=/127.0.0.1:53272,OPEN,fill=-,flush=-,to=2006/2000}{io=0/0,kio=0,kro=1}]->[HttpConnection@3104d8f1[p=HttpParser{s=END,2659208 of 2659208},g=HttpGenerator@7ec3ba88{s=START}]=>HttpChannelOverHttp@2e096ac{s=HttpChannelState@110cb986{s=HANDLING rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=1},r=1,c=false/false,a=HANDLING,uri=http://localhost:8060/service/annotateQuantityPDF,age=2208}] idle timeout expired 
DEBUG  [11:05:46.104] [Scheduler-713702106-1] o.e.j.i.FillInterest -  onFail FillInterest@7cfd432c{null} 
java.util.concurrent.TimeoutException: Idle timeout expired: 2006/2000 ms
	at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
	at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

If I set it to 33 seconds, I get a timeout as before:

java.util.concurrent.TimeoutException: Idle timeout expired: 33001/33000 ms
	at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171)
	at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

Then, if I set it to 60 seconds, I get no such exceptions...
In all cases, however, the maximum request time was still capped at 30 s:

[screenshot: response times capped at around 30 s]

I'm starting to believe the idleTimeout is set correctly and that the problem lies somewhere else.

lfoppiano (Owner) commented

@uneetsingh I'm preparing the new release 0.8.0 which, among other things, updates GROBID and updates Dropwizard to version 4.
Have you made any progress on this issue?

uneetsingh (Author) commented

Hi @lfoppiano, no headway on this issue. I hope that version 4 will solve it.

lfoppiano (Owner) commented Dec 18, 2023

I've merged PR #136 into master. I did some tests; although the logs don't mention any timeout, I still see some 503s once requests reach 30 s.
Could you give it a try? You can use the Docker image lfoppiano/grobid-quantities:latest-develop, which should have all the fixes.

There was a fix in Jetty 10 to make the idle timeout work; beyond that, I'm not sure what else it could be.
