2.64/2.65/2.66 keeps launching duplicate VMs until JNLP is connected #374
Comments
This behavior is present in 2.65 as well. I keep having to revert to 2.63, which is unfortunate because I think this prevents updating the underlying Jenkins core, since it removes prototype.js (I may have that backward).
We're seeing the same issue. We're launching Windows nodes via the OpenStack cloud plugin plus a UserData script that initiates the JNLP connection. When the plugin first brings up a node in response to a pipeline job, Jenkins shows the node as offline, likely because the UserData script has not run yet and the JNLP connection has not been established. The OpenStack plugin continues to spin up new nodes until the template's max instances is reached, or until the JNLP connection is established and an executor has been selected for the pipeline job. In addition, on the node where the JNLP connection succeeds, the plugin fails to remove the node once it is idle. It seems to ignore the retention time configured in the template.
I have confirmed that this behavior is due to 80b6780. I backed out the change to the isWaitingFor function and the plugin no longer launches a ton of extra VMs. The why is a little less clear. It should only return null if terminated is set to true, and yet I'm pretty sure that's what's causing this. If it returns null, then provisionSlave returns the slave rather than waiting for it to actually be ready-ish. The commit message didn't make a lot of sense to me. I don't see any negatives to reverting it, but there must have been a reason. More research and probably a better fix is required. However, if you're willing to compile it yourself, just back out the above change and this behavior goes away. Side note: while looking for alternate causes I found this bit of code in plugin/src/main/java/jenkins/plugins/openstack/compute/JCloudsCloud.java around line 283. I'm likely missing something, but I can't understand what this for loop is actually doing other than potentially adding to the queue, unnecessarily, multiple times. I took out the for loop and it didn't seem to cause any problems. YMMV.
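To illustrate why returning a node before it has connected can snowball, here is a toy model, not plugin code. It assumes a hypothetical provisioner that re-evaluates demand once per "tick" and launches VMs to cover any shortfall. The names `simulate`, `boot_ticks`, `cap` and `count_pending` are invented for this sketch, and the real scheduling in Jenkins and the plugin is not modeled.

```python
def simulate(boot_ticks, demand, cap, count_pending, max_ticks=1000):
    """Toy model: return how many VMs get launched before demand is met.

    boot_ticks    -- ticks a VM needs before its agent connects (assumed)
    demand        -- number of agents the queue is asking for
    cap           -- maximum instances allowed for the template
    count_pending -- True if booting-but-not-connected VMs count as supply
    """
    launched = []  # tick at which each VM was launched
    for tick in range(max_ticks):
        online = sum(1 for t in launched if tick - t >= boot_ticks)
        if online >= demand:
            break
        pending = len(launched) - online
        supply = online + (pending if count_pending else 0)
        shortfall = max(demand - supply, 0)
        # Never exceed the template's instance cap.
        to_launch = min(shortfall, cap - len(launched))
        launched.extend([tick] * to_launch)
    return len(launched)


if __name__ == "__main__":
    # One agent requested, four ticks to boot, cap of ten instances.
    print(simulate(4, 1, 10, count_pending=False))  # pending VMs ignored
    print(simulate(4, 1, 10, count_pending=True))   # pending VMs counted
```

In this model, ignoring pending VMs launches one extra VM per tick until the first agent connects or the cap is hit, which matches the symptom described in this thread. Counting pending VMs launches exactly one.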
Hi @mdonahoe-cisco. Total side topic: any chance you could share your UserData script (or the bare bones of it)? I have never managed to get that working and ended up creating a scheduled task on the VM that launches JNLP on startup. That is, obviously, very hard to maintain, as it requires image updates. I would much rather use UserData if I could get it working! TIA.
@steveames I think there are workarounds for a few different things in here; I can't remember exactly. The strangest part is parsing the node name out of SLAVE_JNLP_URL so that we can pass it to hudson.remoting.jnlp.Main. A bit ugly, but I hope it helps.
rem cmd
@echo on
curl ${SLAVE_JAR_URL} -o C:\Users\cloudbase-init\agent.jar
REM GET THE NODE NAME FROM THE URL
REM e.g. https://example.com/jenkins/myNamespace/computer/windows-reg-test-7276/slave-agent.jnlp
setlocal EnableDelayedExpansion
set url=${SLAVE_JNLP_URL}
REM Replace '/' with ' ' and split into array
set i=0
for %%a in (%url:/= %) do (
set /a i+=1
set "part[!i!]=%%a"
)
REM Get the second-to-last element (the last one is slave-agent.jnlp)
set /a lastIndex=i-1
set "nodename=!part[%lastIndex%]!"
REM Output the results
REM e.g. windows-reg-test-7276
echo nodename: %nodename%
java -cp C:\Users\cloudbase-init\agent.jar hudson.remoting.jnlp.Main -url ${JENKINS_URL} -webSocket -workDir C:\jenkins_workspace -headless ${SLAVE_JNLP_SECRET} %nodename%
endlocal
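For anyone who wants to sanity-check the node-name extraction outside of cmd, the same idea can be sketched in Python. This is only an illustration of the parsing step, assuming the URL has the shape shown in the script's comment (`.../computer/<node>/slave-agent.jnlp`). The function name `node_name_from_jnlp_url` is made up for this example.

```python
from urllib.parse import urlsplit


def node_name_from_jnlp_url(url):
    """Return the second-to-last path segment of a JNLP URL.

    Mirrors the batch script: split the path on '/', drop empty
    pieces, and take the element just before 'slave-agent.jnlp'.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("URL path has no node-name segment: %r" % url)
    return parts[-2]


if __name__ == "__main__":
    example = ("https://example.com/jenkins/myNamespace/computer/"
               "windows-reg-test-7276/slave-agent.jnlp")
    print(node_name_from_jnlp_url(example))  # windows-reg-test-7276
```

Taking the second-to-last segment keeps the logic identical to the script above. It would break if the URL layout ever changed, so treat it as a workaround rather than a stable contract.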
Just verified that the issue still exists with 2.66.
Jenkins and plugins versions report
Environment
What Operating System are you using (both controller, and any agents involved in the problem)?
The Jenkins controller runs on Ubuntu 22.04. I'm launching Windows 10 VMs that connect using JNLP. Number of Executors is 1. Retention time is 0. Connection type is JNLP.
Reproduction steps
Expected Results
A single node/agent/VM gets launched and Jenkins waits for it to connect.
Actual Results
A LOT of VMs get launched. On my system, Windows nodes take around 4 minutes to come up far enough to establish a JNLP connection. During this time multiple VMs get launched. The problem multiplies if several Windows nodes/agents are requested at once.
Anything else?
This is new behavior in 2.64. I expect it is related to commit 80b6780.