2.64/2.65/2.66 keeps launching duplicate VMs until JNLP is connected #374

Open
steveames opened this issue Oct 25, 2023 · 6 comments
@steveames

Jenkins and plugins versions report

Environment
Jenkins: 2.414.2
OS: Linux - 6.2.0-1013-aws
Java: 11.0.20.1 - Ubuntu (OpenJDK 64-Bit Server VM)
---
Parameterized-Remote-Trigger:3.2.0
PrioritySorter:5.0.0
active-directory:2.33
analysis-model-api:11.10.0
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
artifactory:3.18.12
authentication-tokens:1.53.v1c90fd9191a_b_
aws-credentials:218.v1b_e9466ec5da_
aws-java-sdk:1.12.529-406.vdeff15e5817d
aws-java-sdk-cloudformation:1.12.529-406.vdeff15e5817d
aws-java-sdk-codebuild:1.12.529-406.vdeff15e5817d
aws-java-sdk-ec2:1.12.529-406.vdeff15e5817d
aws-java-sdk-ecr:1.12.529-406.vdeff15e5817d
aws-java-sdk-ecs:1.12.529-406.vdeff15e5817d
aws-java-sdk-efs:1.12.529-406.vdeff15e5817d
aws-java-sdk-elasticbeanstalk:1.12.529-406.vdeff15e5817d
aws-java-sdk-iam:1.12.529-406.vdeff15e5817d
aws-java-sdk-kinesis:1.12.529-406.vdeff15e5817d
aws-java-sdk-logs:1.12.529-406.vdeff15e5817d
aws-java-sdk-minimal:1.12.529-406.vdeff15e5817d
aws-java-sdk-secretsmanager:1.12.529-406.vdeff15e5817d
aws-java-sdk-sns:1.12.529-406.vdeff15e5817d
aws-java-sdk-sqs:1.12.529-406.vdeff15e5817d
aws-java-sdk-ssm:1.12.529-406.vdeff15e5817d
basic-branch-build-strategies:81.v05e333931c7d
bitbucket:223.vd12f2bca5430
blueocean:1.27.8
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.27.8
blueocean-commons:1.27.8
blueocean-config:1.27.8
blueocean-core-js:1.27.8
blueocean-dashboard:1.27.8
blueocean-display-url:2.4.2
blueocean-events:1.27.8
blueocean-git-pipeline:1.27.8
blueocean-github-pipeline:1.27.8
blueocean-i18n:1.27.8
blueocean-jira:1.27.8
blueocean-jwt:1.27.8
blueocean-personalization:1.27.8
blueocean-pipeline-api-impl:1.27.8
blueocean-pipeline-editor:1.27.8
blueocean-pipeline-scm-api:1.27.8
blueocean-rest:1.27.8
blueocean-rest-impl:1.27.8
blueocean-web:1.27.8
bootstrap5-api:5.3.2-1
bouncycastle-api:2.29
branch-api:2.1128.v717130d4f816
build-environment:1.7
build-monitor-plugin:1.14-745.ve2023a_305f40
build-timeout:1.31
build-timestamp:1.0.3
build-user-vars-plugin:1.9
build-with-parameters:76.v9382db_f78962
buildtriggerbadge:251.vdf6ef853f3f5
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.0.2
cloud-stats:320.v96b_65297a_4b_b_
cloudbees-bitbucket-branch-source:848.v42c6a_317eda_e
cloudbees-folder:6.848.ve3b_fd7839a_81
cobertura:1.17
code-coverage-api:4.9.0
command-launcher:107.v773860566e2e
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.10.0-78.v3e7b_ea_d5a_fe1
conditional-buildstep:1.4.3
config-file-provider:959.vcff671a_4518b_
console-badge:1.1
credentials:1293.vff276f713473
credentials-binding:636.v55f1275c7b_27
custom-tools-plugin:0.8
data-tables-api:1.13.6-5
display-url-api:2.200.vb_9327d658781
docker-commons:439.va_3cb_0a_6a_fb_29
docker-workflow:572.v950f58993843
durable-task:523.va_a_22cf15d5e0
ec2:1628.v6d7b_fc58b_a_1d
echarts-api:5.4.0-6
email-ext:2.102
embeddable-build-status:412.v09da_db_1dee68
envinject-api:1.199.v3ce31253ed13
extended-choice-parameter:376.v2e02857547b_a_
extended-read-permission:53.v6499940139e5
external-monitor-job:215.v2e88e894db_f8
favorite:2.4.3
file-leak-detector:1.12
font-awesome-api:6.4.2-1
forensics-api:2.3.0
git:5.2.0
git-client:4.5.0
git-server:99.va_0826a_b_cdfa_d
github:1.37.3
github-api:1.316-451.v15738eef3414
github-branch-source:1741.va_3028eb_9fd21
github-checks:554.vb_ee03a_000f65
gradle:2.8.2
groovy:457.v99900cb_85593
handy-uri-templates-2-api:2.1.8-22.v77d5b_75e6953
htmlpublisher:1.32
instance-identity:173.va_37c494ec4e5
ionicons-api:56.v1b_1c8c49374e
ivy:2.5
jackson2-api:2.15.3-366.vfe8d1fa_f8c87
jacoco:3.3.5
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.8-1
jdk-tool:73.vddf737284550
jenkins-design-language:1.27.8
jersey2-api:2.40-1
jira:3.11
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.18-1
job-restrictions:0.8
jquery:1.12.4-1
jquery3-api:3.7.1-1
jsch:0.2.8-65.v052c39de79b_2
junit:1240.vf9529b_881428
junit-realtime-test-reporter:135.vf92a_7fe68b_15
label-linked-jobs:6.0.1
ldap:701.vf8619de9160a_
lockable-resources:1185.v0c528656ce04
mailer:463.vedf8358e006b_
mapdb-api:1.0.9-28.vf251ce40855d
matrix-auth:3.2.1
matrix-project:818.v7eb_e657db_924
maven-plugin:3.23
mercurial:1260.vdfb_723cdcc81
metrics:4.2.18-442.v02e107157925
mina-sshd-api-common:2.10.0-69.v28e3e36d18eb_
mina-sshd-api-core:2.10.0-69.v28e3e36d18eb_
node-iterator-api:49.v58a_8b_35f8363
okhttp-api:4.11.0-157.v6852a_a_fa_ec11
openstack-cloud:2.64
p4:1.14.3
pam-auth:1.10
parallel-test-executor:418.v24f9a_141d726
parameterized-scheduler:255.v73827fcdf618
parameterized-trigger:2.46
pipeline-aws:1.43
pipeline-build-step:505.v5f0844d8d126
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-graph-view:202.v6da_a_9e590325
pipeline-groovy-lib:689.veec561a_dee13
pipeline-input-step:477.v339683a_8d55e
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2144.v077a_d1928a_40
pipeline-model-definition:2.2144.v077a_d1928a_40
pipeline-model-extensions:2.2144.v077a_d1928a_40
pipeline-rest-api:2.33
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2144.v077a_d1928a_40
pipeline-stage-view:2.33
pipeline-utility-steps:2.16.0
plain-credentials:143.v1b_df8b_d3b_e48
plugin-usage-plugin:4.2
plugin-util-api:3.6.0
prism-api:1.29.0-8
pubsub-light:1.17
random-string-parameter:1.0
resource-disposer:0.23
run-condition:1.7
saferestart:0.7
saml:4.429.v9a_781a_61f1da_
scm-api:676.v886669a_199a_a_
scoring-load-balancer:59.vf791549fa_989
script-security:1275.v23895f409fb_d
skip-certificate-check:1.1
slack:684.v833089650554
snakeyaml-api:2.2-111.vc6598e30cc65
sonar:2.15
sse-gateway:1.26
ssh-agent:333.v878b_53c89511
ssh-credentials:308.ve4497b_ccd8f4
ssh-slaves:2.916.vd17b_43357ce4
sshd:3.312.v1c601b_c83b_0e
structs:325.vcb_307d2a_2782
subversion:2.17.3
support-core:1356.vd0f980edfa_46
text-finder:1.26
timestamper:1.26
token-macro:384.vf35b_f26814ec
trilead-api:2.84.v72119de229b_7
variant:60.v7290fc0eb_b_cd
versioncolumn:210.v94a_dca_868138
vsphere-cloud:2.27
warnings-ng:10.5.0
workflow-aggregator:596.v8c21c963d92d
workflow-api:1283.v99c10937efcb_
workflow-basic-steps:1042.ve7b_140c4a_e0c
workflow-cps:3802.vd42b_fcf00b_a_c
workflow-durable-task-step:1289.v4d3e7b_01546b_
workflow-job:1348.v32a_a_f150910e
workflow-multibranch:756.v891d88f2cd46
workflow-scm-step:415.v434365564324
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:865.v43e78cc44e0d
ws-cleanup:0.45

What Operating System are you using (both controller, and any agents involved in the problem)?

The Jenkins controller runs on Ubuntu 22.04. I'm launching Windows 10 VMs that connect via JNLP. Number of executors is 1, retention time is 0, and connection type is JNLP.

Reproduction steps

  1. Run a job that requires a node/agent that uses JNLP
  2. Watch as it launches a new node/agent every minute or so until the builder requirement is met
  3. Since retention time is 0, all launched VMs will now just continue to run until they are used. Setting retention time to something else (like 1) will see the VMs get killed (after 10 minutes or so), but it also introduces the possibility that a node/agent will get reused, which I never want.

Expected Results

A single node/agent/VM gets launched and Jenkins waits for it to connect.

Actual Results

A LOT of VMs get launched. On my system, Windows nodes take around 4 minutes to get online far enough to establish a JNLP connection. During this time multiple VMs get launched. The problem multiplies if several Windows nodes/agents are requested at once.

Anything else?

This is new behavior in 2.64. I expect it is related to commit 80b6780.

@steveames steveames added the bug label Oct 25, 2023
@steveames
Author

This behavior is present in 2.65 as well. I keep having to revert to 2.63, which is unfortunate, as I think this prevents updating the underlying Jenkins core since it removes prototype.js (I may have that backward).

@steveames steveames changed the title 2.64 keeps launching VMs until JNLP is connected 2.64/2.65 keeps launching duplicate VMs until JNLP is connected Dec 28, 2023
@mdonahoe-cisco

We're seeing the same issue. We're launching Windows nodes via the openstack cloud plugin plus a UserData script that initiates the JNLP connection. When the plugin first brings up a node in response to a pipeline job, Jenkins shows that the node is offline, likely because the UserData script has not run yet and the JNLP connection has not yet been established. The openstack plugin will continue to spin up new nodes until max instances for the template is reached, or until the JNLP connection is established and an executor has been selected for the pipeline job.

In addition, on the node where the JNLP connection succeeds, the openstack plugin fails to remove the node once it is idle. It seems to ignore the retention time config in the template.

@steveames
Author

I have confirmed that this behavior is due to 80b6780. I backed out the change to the isWaitingFor function and the plugin no longer launches a ton of extra VMs. The why is a little less clear: the function should only return null if terminated is set to true, and yet I'm pretty sure that's what's causing this. If it returns null, then provisionSlave returns the slave rather than waiting for it to actually be ready-ish.

The commit message didn't make a lot of sense to me. I don't see any negatives to reverting it but there must have been a reason. More research and probably a better fix is required. However if you're willing to compile it yourself just back out the above change and this behavior goes away.
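
To illustrate the suspected mechanism described above (provisioning completing before the agent is actually connected), here is a toy simulation. It is only a sketch under assumptions: the class and method names are hypothetical, the model re-evaluates demand once per tick, and it does not reproduce the real logic of the plugin or of Jenkins core. It compares a provisioner that counts a still-booting VM as supply with one that does not.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not code from the openstack-cloud plugin.
class ProvisioningSimulation {

    /**
     * Simulates one queued build that needs one agent.
     *
     * @param bootTicks            ticks a VM needs before its agent is connected
     * @param countPendingAsSupply true if a booting VM is treated as satisfying demand
     * @return number of VMs launched by the time the first agent is online
     */
    static int simulate(int bootTicks, boolean countPendingAsSupply) {
        List<Integer> launchTicks = new ArrayList<>();
        for (int tick = 0; ; tick++) {
            int online = 0;
            int pending = 0;
            for (int launchedAt : launchTicks) {
                if (tick - launchedAt >= bootTicks) {
                    online++;
                } else {
                    pending++;
                }
            }
            if (online > 0) {
                // The build is picked up by the first connected agent.
                return launchTicks.size();
            }
            int supply = countPendingAsSupply ? pending : 0;
            if (supply < 1) {
                // Demand (one build) looks unmet, so another VM is launched.
                launchTicks.add(tick);
            }
        }
    }

    public static void main(String[] args) {
        // 4 ticks to connect, mirroring the roughly 4 minutes reported in this issue.
        System.out.println("pending counted:     " + simulate(4, true));
        System.out.println("pending not counted: " + simulate(4, false));
    }
}
```

In this model, with 4 ticks to connect and one check per tick, counting the booting VM yields a single launch, while ignoring it yields four, which matches the "new VM every minute or so" pattern reported here.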

Side note: while looking for alternate causes I found this bit of code in plugin/src/main/java/jenkins/plugins/openstack/compute/JCloudsCloud.java around line 283. I'm likely missing something, but I can't see that this for loop is actually doing anything other than potentially adding to the queue, unnecessarily, multiple times. I took out the for loop and it didn't seem to cause any problems. YMMV.

            for (int i = 0; i < templateCapacity; i++) {
                int size = queue.size();
                if (size >= globalCapacity || size >= excessWorkload) return queue;

                queue.add(t);
            }
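
For what it's worth, the loop can be exercised in isolation. The sketch below wraps it in a hypothetical method with assumed types (a list of template names standing in for whatever the plugin actually queues), so it only shows what the loop itself does: it enqueues the same template up to templateCapacity times, stopping early once the queue size reaches globalCapacity or excessWorkload. What the surrounding plugin method does with the queue afterwards is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: the method name and the String element type are assumptions.
class TemplateQueueSketch {

    static List<String> fill(List<String> queue, String t,
                             int templateCapacity, int globalCapacity, int excessWorkload) {
        // Same loop body as the snippet quoted above.
        for (int i = 0; i < templateCapacity; i++) {
            int size = queue.size();
            if (size >= globalCapacity || size >= excessWorkload) return queue;

            queue.add(t);
        }
        return queue;
    }

    public static void main(String[] args) {
        // Excess workload of 1: the template is queued once.
        System.out.println(fill(new ArrayList<>(), "windows", 5, 10, 1).size());
        // Excess workload of 3: the template is queued three times.
        System.out.println(fill(new ArrayList<>(), "windows", 5, 10, 3).size());
    }
}
```

So with an excess workload of 1 the loop adds a single entry; multiple entries only appear when the excess workload is greater than 1 and the capacities allow it.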

@steveames
Author

steveames commented Feb 9, 2024

> We're seeing the same issue. We're launching windows nodes via openstack cloud plugin + UserData script that initiates JNLP connection.

Hi @mdonahoe-cisco. Total side topic: any chance you could share your UserData script (or the bare bones of it)? I have never managed to get that working and ended up creating a scheduled task on the VM that launches JNLP on startup. That is, obviously, very hard to maintain, as it requires image updates. I would much rather use UserData if I could get it working! TIA.

@mdonahoe-cisco

mdonahoe-cisco commented Feb 9, 2024

@steveames I think there are workarounds for a few different things in here. Can't remember exactly. The strangest thing is parsing the node name from the SLAVE_JNLP_URL so that we can pass it to hudson.remoting.jnlp.Main. A bit ugly but hope it helps.

rem cmd
@echo on
curl ${SLAVE_JAR_URL} -o C:\Users\cloudbase-init\agent.jar

REM GET THE NODE NAME FROM THE URL
REM e.g. https://example.com/jenkins/myNamespace/computer/windows-reg-test-7276/slave-agent.jnlp
setlocal EnableDelayedExpansion
set url=${SLAVE_JNLP_URL}

REM Replace '/' with ' ' and split into array
set i=0
for %%a in (%url:/= %) do (
    set /a i+=1
    set "part[!i!]=%%a"
)

REM Get the second-to-last element (the node name precedes slave-agent.jnlp)
set /a lastIndex=i-1

set "nodename=!part[%lastIndex%]!"

REM Output the results
REM e.g. windows-reg-test-7276
echo nodename: %nodename%

java -cp C:\Users\cloudbase-init\agent.jar hudson.remoting.jnlp.Main -url ${JENKINS_URL} -webSocket -workDir C:\jenkins_workspace -headless ${SLAVE_JNLP_SECRET} %nodename%
endlocal
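
The parsing step in the script above (take the second-to-last path segment of the JNLP URL as the node name) can be expressed more compactly. The following is only a sketch with hypothetical class and method names, using the example URL from the script's comment; it is not part of the plugin or of the script.

```java
// Sketch of the same parsing idea as the batch script; names are hypothetical.
class NodeNameFromUrl {

    /** Returns the path segment that precedes the final one, e.g. the node name before slave-agent.jnlp. */
    static String nodeName(String jnlpUrl) {
        String[] parts = jnlpUrl.split("/");
        if (parts.length < 2) {
            throw new IllegalArgumentException("URL has no second-to-last segment: " + jnlpUrl);
        }
        return parts[parts.length - 2];
    }

    public static void main(String[] args) {
        String url = "https://example.com/jenkins/myNamespace/computer/windows-reg-test-7276/slave-agent.jnlp";
        System.out.println("nodename: " + nodeName(url)); // prints "nodename: windows-reg-test-7276"
    }
}
```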

@steveames steveames changed the title 2.64/2.65 keeps launching duplicate VMs until JNLP is connected 2.64/2.65/2.66 keeps launching duplicate VMs until JNLP is connected Dec 23, 2024
@steveames
Author

Just verified that the issue still exists with 2.66
