DAOS-13672 control: Bump system_ram_reserved to reduce OOM occurrences #12430
LGTM. No errors found by checkpatch.
Bug-tracker data: |
Test stage Unit Test on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12430/1/testReport/(root)/
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12430/2/execution/node/1095/log
LGTM. No errors found by checkpatch.
Test stage Unit Test on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12430/3/testReport/
Test stage Functional on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12430/4/testReport/
LGTM. No errors found by checkpatch.
LGTM. No errors found by checkpatch.
Merged 6e16c1c into this PR so that the number of targets is taken into account when calculating the engine memory reservation during SCM auto-sizing.
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12430/7/execution/node/353/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12430/7/execution/node/360/log
Current changes look OK to me. I'll hold off on approval until it runs with the correct pragma.
Required-githooks: true
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>

…srsvd-bump
Allow-unstable: true
Test-tag: pr daily_regression
Test-nvme: auto_md_on_ssd
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12430/19/execution/node/1315/log
https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-12430/19/tests/ Shows the following identified failures: And 2 new failures: |
Skip-func-test-vm: true
Test-tag: test_snapshot_aggregation
Test-nvme: auto_md_on_ssd
Required-githooks: true
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12430/20/execution/node/328/log
LGTM. No errors found by checkpatch.
Reducing system_ram_reserved to try to make test_snapshot_aggregation pass.
The snapshot aggregation test is now passing, as per https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-12430/21/testReport/FTEST_container/SnapshotAggregation/ Combining runs 19 & 21, there are only expected failures. Identified failures: @phender, can you take a look to see if we are at the stage where this could be force-landed? Obviously pending review approvals.
The ARM CI testing on master failed for this PR. Please fix: https://github.com/daos-stack/daos/actions/runs/5721566985/job/15503353047
…#12597)

During the evaluation of an optimum RAM-disk size, update the per-engine memory reservation calculation to take into account the number of targets. Reserve 128 MiB of RAM per target.

control: Bump system_ram_reserved to reduce OOM occurrences (#12430)

Attempt to reduce the chance of the OOM killer terminating an engine process when maximum pool space is allocated by slightly increasing the system_ram_reserved default value from 6 to 16 GiB. Some test YAML system_ram_reserved values have been reduced to 6 to prevent the increase in the default from causing an available memory check failure on engine start-up in memory-constrained (VM) environments. Increasing the default value should also resolve DAOS-13918 by providing a larger memory buffer to reduce the chance of intermittent failures related to this check.

test: Reduce system_ram_reserved for GHA ARM build (#12746)

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
…#12597) (#12755)

During the evaluation of an optimum RAM-disk size, update the per-engine memory reservation calculation to take into account the number of targets. Reserve 128 MiB of RAM per target.

control: Bump system_ram_reserved to reduce OOM occurrences (#12430)

Attempt to reduce the chance of the OOM killer terminating an engine process when maximum pool space is allocated by slightly increasing the system_ram_reserved default value from 6 to 16 GiB. Some test YAML system_ram_reserved values have been reduced to 6 to prevent the increase in the default from causing an available memory check failure on engine start-up in memory-constrained (VM) environments. Increasing the default value should also resolve DAOS-13918 by providing a larger memory buffer to reduce the chance of intermittent failures related to this check.

test: Reduce system_ram_reserved for GHA ARM build (#12746)

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
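To make the arithmetic in the commit messages above concrete, here is a rough sketch of how the two reservations might combine when sizing a RAM-disk. Only the 128 MiB per-target figure and the 6 and 16 GiB system_ram_reserved values come from the commit text. The function names, the even split between engines, and the omission of hugepages and any other deductions are assumptions for illustration, and the actual control-plane code may differ.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Figures taken from the commit messages.
PER_TARGET_RESERVATION = 128 * MIB
DEFAULT_SYSTEM_RAM_RESERVED = 16 * GIB  # previously 6 GiB


def engine_reservation(targets: int) -> int:
    """Memory reserved for one engine: 128 MiB per target."""
    if targets < 1:
        raise ValueError("targets must be at least 1")
    return targets * PER_TARGET_RESERVATION


def ramdisk_size(total_ram: int, engines: int, targets_per_engine: int,
                 system_reserved: int = DEFAULT_SYSTEM_RAM_RESERVED) -> int:
    """Hypothetical sizing: subtract the system and per-engine reservations
    from total RAM, then split the remainder evenly between engines.
    Hugepages and other deductions are deliberately ignored here."""
    if engines < 1:
        raise ValueError("engines must be at least 1")
    reserved = system_reserved + engines * engine_reservation(targets_per_engine)
    if reserved >= total_ram:
        raise ValueError("insufficient RAM once reservations are subtracted")
    return (total_ram - reserved) // engines


if __name__ == "__main__":
    # Example host (assumed): 256 GiB RAM, 2 engines, 16 targets each.
    new = ramdisk_size(256 * GIB, engines=2, targets_per_engine=16)
    old = ramdisk_size(256 * GIB, engines=2, targets_per_engine=16,
                       system_reserved=6 * GIB)
    print(f"RAM-disk per engine with 16 GiB reserved: {new // GIB} GiB")
    print(f"RAM-disk per engine with 6 GiB reserved: {old // GIB} GiB")
```

Under these assumptions, raising the reserve from 6 to 16 GiB costs each of the two engines 5 GiB of RAM-disk on the example host. On a small VM the same increase can exhaust the available memory entirely, which is consistent with the test YAML files keeping the value at 6.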
Attempt to reduce the chance of the OOM killer terminating an engine
process when maximum pool space is allocated by slightly increasing
the system_ram_reserved default value from 6 to 16 GiB.
Some test YAML system_ram_reserved values have been reduced to 6 to
prevent the increase in the default from causing an available memory
check failure on engine start-up in memory-constrained (VM)
environments. Increasing the default value should also resolve
DAOS-13918 by providing a larger memory buffer to reduce the chance
of intermittent failures related to this check.
Required-githooks: true
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: