Timeout during upgrade #28

Open
electrum opened this issue Apr 17, 2012 · 1 comment

Comments

@electrum
Contributor

The first upgrade command timed out:

[23:00 ubuntu@i-64580004:~ prod] galaxy upgrade -b discovery-elb discovery-elb:1.2 @discovery-elb:1
uuid  host         machine     status   binary             config                      
0561  10.94.13.48  i-4c451d2c  STOPPED  discovery-elb:1.1  @discovery-elb:general:1.0

Are you sure you would like to UPGRADE these servers? [y/N] y

java.net.SocketTimeoutException: Read timed out

The second upgrade returned a weird error:

[23:00 ubuntu@i-64580004:~ prod] galaxy upgrade -b discovery-elb discovery-elb:1.2 @discovery-elb:1
uuid  host         machine     status   binary             config                      
0561  10.94.13.48  i-4c451d2c  STOPPED  discovery-elb:1.1  @discovery-elb:general:1.0

Are you sure you would like to UPGRADE these servers? [y/N] y

uuid  host         machine     status   binary             config                      
0561  10.94.13.48  i-4c451d2c  UNKNOWN  discovery-elb:1.1  @discovery-elb:general:1.0  UnexpectedResponseException{request=Request{uri=http://10.94.13.48:65000/v1/agent/slot/0561a95c-8c22-417e-963a-981b2ff9b3fb/assignment, method='PUT', headers={x-galaxy-agent-version=[b9bcdfa080fe634c57f41dd88c09542e], x-galaxy-slot-version=[21e7371f3c7e9c64628d44b964c456e2], Content-Type=[application/json]}, bodyGenerator=com.proofpoint.http.client.JsonBodyGenerator@15fd3c35}, statusCode=500, statusMessage='Could not obtain slot lock within 1000.00ms held by null thread is at  com.proofpoint.galaxy.agent.DeploymentSlot.lock(DeploymentSlot.java:346)   at com.proofpoint.galaxy.agent.DeploymentSlot.assign(DeploymentSlot.java:163)   at com.proofpoint.galaxy.agent.AssignmentResource.assign(AssignmentResource.java:70)   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)   at java.lang.reflect.Method.invoke(Method.java:597)   at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)   at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)   at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)   at com.sun.jers', headers={Content-Length=[10834], Content-Type=[text/html;charset=ISO-8859-1], Cache-Control=[must-revalidate,no-cache,no-store]}}

The third succeeded:

[23:00 ubuntu@i-64580004:~ prod] galaxy upgrade -b discovery-elb discovery-elb:1.2 @discovery-elb:1
uuid  host         machine     status   binary             config            
0561  10.94.13.48  i-4c451d2c  STOPPED  discovery-elb:1.2  @discovery-elb:1

Are you sure you would like to UPGRADE these servers? [y/N] y

uuid  host         machine     status   binary             config            
0561  10.94.13.48  i-4c451d2c  STOPPED  discovery-elb:1.2  @discovery-elb:1

The timeout might be caused by the Nexus proxy being slow. This was the first access for that artifact.

@dain
Owner

dain commented Apr 18, 2012

For the first one, the request timed out in the client. For the second one, the agent timed out waiting for the slot lock, because it was still running the first upgrade request. If you look closely at the third request, the server was already at version 1.2 and you simply upgraded it to 1.2 again.
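To make the sequence concrete, here is a minimal sketch of how a slot lock with a bounded wait would produce the second error. `SlotLockSketch` and its methods are hypothetical names for illustration, not Galaxy's actual `DeploymentSlot` code, and the timeout is shortened from the 1000 ms in the error message to 100 ms so the demo runs quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an agent slot; not Galaxy's actual DeploymentSlot.
class SlotLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final long lockTimeoutMillis;

    SlotLockSketch(long lockTimeoutMillis) {
        this.lockTimeoutMillis = lockTimeoutMillis;
    }

    // Runs the work under the slot lock, or reports failure if the lock stays busy.
    String assign(Runnable work) {
        try {
            if (!lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS)) {
                return "Could not obtain slot lock within " + lockTimeoutMillis + "ms";
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Interrupted while waiting for slot lock";
        }
        try {
            work.run();
            return "OK";
        } finally {
            lock.unlock();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Replays the issue: a slow first request, a second that overlaps it, a third after it.
    static String[] replay() {
        SlotLockSketch slot = new SlotLockSketch(100);
        CountDownLatch firstHoldsLock = new CountDownLatch(1);
        CountDownLatch downloadDone = new CountDownLatch(1);

        // First request: keeps the lock while the (simulated) slow download runs.
        Thread first = new Thread(() -> slot.assign(() -> {
            firstHoldsLock.countDown();
            awaitQuietly(downloadDone);
        }));
        first.start();
        awaitQuietly(firstHoldsLock);

        // Second request: arrives while the first still holds the lock.
        String second = slot.assign(() -> { });

        // The download finishes, so the first request releases the lock.
        downloadDone.countDown();
        try {
            first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Third request: the lock is free again.
        String third = slot.assign(() -> { });
        return new String[] {second, third};
    }

    public static void main(String[] args) {
        for (String result : replay()) {
            System.out.println(result);
        }
    }
}
```

Running `main` prints the lock-timeout message for the second request and `OK` for the third, which matches the order of events in the report: the client gave up on the first request, but the agent kept working and kept holding the lock.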

So all of the problems were caused by the first request taking a long time. This was most likely caused by downloading the binary into your Nexus repo. The third command was fast because the binary was already in your Nexus repo.

The real problem here is that we time out too aggressively for long-running commands like install and stop, and we need transient states like "installing", "restarting" and "stopping", so the user knows what is going on.
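One possible shape for the transient states, as a rough sketch only: the slot publishes its state through an atomic reference, so a status query can answer immediately while a slow install is still in progress. `SlotState` and `StatefulSlotSketch` are hypothetical names, `RUNNING` is assumed (only `STOPPED` and `UNKNOWN` appear in the output above), and none of this is existing Galaxy code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical state set: STOPPED and UNKNOWN appear in the CLI output,
// RUNNING is assumed, and the transient states are the ones proposed here.
enum SlotState {
    STOPPED(false), RUNNING(false), UNKNOWN(false),
    INSTALLING(true), RESTARTING(true), STOPPING(true);

    private final boolean transientState;

    SlotState(boolean transientState) {
        this.transientState = transientState;
    }

    boolean isTransient() {
        return transientState;
    }
}

// Hypothetical slot that exposes its current state without blocking on a lock.
class StatefulSlotSketch {
    private final AtomicReference<SlotState> state =
            new AtomicReference<>(SlotState.STOPPED);

    // Never waits, so a status query during a long install still returns promptly.
    SlotState status() {
        return state.get();
    }

    // Returns false right away if the slot is not STOPPED (for example, already installing).
    boolean install(Runnable download) {
        if (!state.compareAndSet(SlotState.STOPPED, SlotState.INSTALLING)) {
            return false;
        }
        try {
            download.run();
        } finally {
            state.set(SlotState.STOPPED);
        }
        return true;
    }
}
```

With something like this, a second upgrade issued during a slow download could show `INSTALLING` in the status column rather than a lock-timeout stack trace. How long the client itself should wait for install and stop is a separate decision that this sketch does not cover.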
