Adding a new feature to schedule slaves for termination #265
Conversation
@Lorgouil, hello, can you please be more specific about what the goal of the feature would be? You see, the config and retention strategy are complicated enough, so I am making sure this is necessary and that it plays nicely with the rest of the features. The way I currently understand it, it will remove the oldest node in case there are some excess nodes (which is tangential to the existing time-based retention strategy). I find that approach rather strange given what it would do in the case of 1 vs. 20 idle machines (the former eliminates all idle machines, the latter reduces the pool only slightly).
@olivergondza, hello, we added this feature because we’ve had some issues with the retention strategy: when we used Retention Time > 0, sometimes slaves were never destroyed and storage saturated because of them. This feature solves that problem by destroying the oldest VM. Secondly, we use multi-executor slaves; in that case we can’t use Retention Time = 0 because we lose many executors. This feature is different from the current retention strategy because it doesn’t wait for the slave to be unused before scheduling it for termination (Retention Time > 0), and it supports multi-executor slaves (Retention Time = 0).
We have a rather massive customer base relying on destroying VMs through the existing retention strategy, so I would much rather make sure it is working correctly than add a different strategy to compensate for a defect somewhere. BTW, rather recently we were bitten by jenkinsci/resource-disposer-plugin#3: when there were a large number of entries to be disposed (this might be OpenStack servers that require extensive retrying, but in our case it was stalled ws-cleanup directories), disposing of the others was severely prolonged or even blocked. Can you be more specific about when you see VMs not being destroyed?
@olivergondza in our Jenkins instances we have multi-executor VMs and different build jobs. Some build jobs can use a VM for a long time, and during that time nothing is done to schedule the destruction of the VM. The virtual machine keeps being used and reused for other build jobs. That's why we decided to add the feature. It can be enabled with a checkbox in the template parameters; if the option is not checked, the retention strategy is used.
@Lorgouil, so you have multiple executors per node configured, and when one is occupied for a long time, multiple jobs can be executed on the remaining ones. This can prolong the usage of a node, as it does not easily get idle for long enough. You suggest dealing with that situation by scheduling nodes for termination even when they are occupied, to ensure the number of builds dispatched to a node stays manageable. I must say the majority of users are fine with the single-executor approach. I am wondering whether using a single executor and a smaller VM flavor (fewer resources per node) would not help you while keeping the current implementation.
@olivergondza, we've already tried using slaves with a single executor in the past, but we can't work with those because of our compiler; we must use multi-executor VMs. That's why we decided to add this new strategy for scheduling VMs for termination. It doesn't prevent using the current retention strategy, which works perfectly in most cases.
This new feature can be enabled with a checkbox for each template. It schedules the oldest VM of each template for termination whenever possible, i.e. when there are more free executors than the minimum number of instances multiplied by the number of executors defined in the template parameters.
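To make the guard concrete, here is a minimal sketch of that check in plain Java. It is not the plugin's actual code; the class and method names (SlaveVm, pickVmToTerminate) are hypothetical, and it only models the rule described above: terminate the oldest VM of a template when the free executors exceed the reserved capacity (minimum instances × executors per node).

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a provisioned slave; not the plugin's actual class.
class SlaveVm {
    final String name;
    final Instant launchedAt;

    SlaveVm(String name, Instant launchedAt) {
        this.name = name;
        this.launchedAt = launchedAt;
    }
}

public class OldestSlaveScheduler {

    /**
     * Decide whether the oldest VM of a template can be scheduled for termination.
     * Only schedule when free executors exceed (minimum instances x executors per node),
     * so the minimum pool defined in the template parameters is never eaten into.
     */
    static Optional<SlaveVm> pickVmToTerminate(List<SlaveVm> vms,
                                               int freeExecutors,
                                               int minInstances,
                                               int executorsPerNode) {
        int reservedExecutors = minInstances * executorsPerNode;
        if (freeExecutors <= reservedExecutors) {
            return Optional.empty(); // not enough spare capacity; keep all VMs running
        }
        // The "oldest" VM is the one with the earliest launch time.
        return vms.stream().min(Comparator.comparing(vm -> vm.launchedAt));
    }

    public static void main(String[] args) {
        List<SlaveVm> vms = List.of(
                new SlaveVm("slave-1", Instant.parse("2018-01-01T08:00:00Z")),
                new SlaveVm("slave-2", Instant.parse("2018-01-02T08:00:00Z")));

        // Example: 6 free executors, minimum of 1 instance with 4 executors each.
        // 6 > 1 * 4, so the oldest VM (slave-1) is scheduled for termination.
        pickVmToTerminate(vms, 6, 1, 4)
                .ifPresent(vm -> System.out.println("Schedule for termination: " + vm.name));
    }
}
```

Note that scheduling the VM for termination here means marking it, not killing running builds; the actual disposal would still happen once the node drains, in line with how the existing retention strategy hands nodes off for destruction.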