DAOS-16160 control: Update pool create --size % opt for MD-on-SSD p2 #14957
Conversation
Ticket title is 'Correctly implement dmg pool create --size option for MD-on-SSD phase-II'
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/1/execution/node/1551/log
2bc9a2d to 778c787 Compare
8748182 to 33fc7e5 Compare
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/3/execution/node/1508/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/3/execution/node/1492/log
The following is with a single host with dual engines where bdev roles META and DATA are not shared. Two pools are created with VOS index file size equal to half the meta-blob size. Rough calculations:
Note that the "Total memory-file size: 140 GB" reported is incorrect in this output.
Next is with a single host with dual engines where bdev roles WAL, META and DATA are shared. Single pool with VOS index file size equal to the meta-blob size.
Rough calculations: 1.2 TB of usable space is returned from storage scan and, because roles are shared, the required META (70 GB) is reserved, so only 1.1 TB is provided for data. Logging shows:
Now the same as above but with a single pool with VOS index file size equal to a quarter of the meta-blob size.
Rough calculations: 1.2 TB of usable space is returned from storage scan and, because roles are shared, the required META (279 GB) is reserved, so only ~900 GB is provided for data. Logging shows:
Now with 6 ranks and a single pool with VOS index file size equal to half of the meta-blob size.
Rough calculations: 1177 GB of usable space is returned from storage scan and, because roles are shared, the required META (140 GB) is reserved, so only 1037 GB is provided for data (per rank). Logging shows:
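The arithmetic behind the rough calculations in the shared-role scenarios above can be sketched as follows (illustrative values and names only, not the actual control-plane code): the projected meta-blob size is the VOS-file size divided by the mem-ratio fraction, and when bdev roles are shared it is reserved out of the usable capacity before the remainder is offered for data.

    package main

    import "fmt"

    func main() {
    	const (
    		usableBytes  = 1_200_000_000_000 // ~1.2 TB usable per rank from storage scan (example)
    		vosFileBytes = 70_000_000_000    // ~70 GB of VOS index (memory) files per rank (example)
    	)
    	// mem-ratio = VOS-file size / meta-blob size; 0.25 means the memory file
    	// is a quarter of the meta-blob.
    	for _, memRatio := range []float64{1.0, 0.5, 0.25} {
    		metaBytes := uint64(float64(vosFileBytes) / memRatio) // projected meta-blob reservation
    		dataBytes := uint64(usableBytes) - metaBytes          // META shares the SSDs with DATA
    		fmt.Printf("mem-ratio %.2f: reserve %d GB META, ~%d GB left for DATA\n",
    			memRatio, metaBytes/1_000_000_000, dataBytes/1_000_000_000)
    	}
    }

With these example inputs the loop reproduces the figures above: ~70 GB reserved leaving ~1.1 TB at mem-ratio 1.0, and ~280 GB reserved leaving ~900 GB at mem-ratio 0.25.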
33fc7e5 to 94b9cf6 Compare
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/4/execution/node/1416/log
94b9cf6 to 2cd0529 Compare
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/5/execution/node/1511/log
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/5/testReport/
Features: pool control Required-githooks: true Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
2cd0529 to 0b46a05 Compare
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/6/testReport/
LGTM
md_size = mp.GetUsableBytes() / uint64(ei.GetTargetCount())
metaBytes = mp.GetUsableBytes() / uint64(ei.GetTargetCount())
if memRatio > 0 {
	metaBytes = uint64(float64(metaBytes) / float64(memRatio))
Not fully understanding this part. Did you mean to multiply by memRatio rather than divide? Or is the intention to make metaBytes larger?
Yes, the intention is to use the MemRatio fraction to project the effective meta-blob size (per-target) by dividing the VOS-file size by the fraction. In MD-on-SSD phase-1, metaBytes == scmBytes (VOS-file size). I will add a comment in a subsequent PR.
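For illustration (made-up numbers, not the code in this PR): with a mem-ratio of 0.25, a 10 GiB per-target VOS-file size projects to a 40 GiB per-target meta-blob, hence dividing by the fraction rather than multiplying.

    package main

    import "fmt"

    func main() {
    	// mem-ratio is the fraction VOS-file size / meta-blob size, so dividing
    	// the per-target VOS-file size by it recovers the effective meta-blob size.
    	scmBytes := uint64(10 << 30) // 10 GiB per-target VOS (memory) file (example)
    	memRatio := 0.25             // memory file is a quarter of the meta-blob
    	metaBytes := uint64(float64(scmBytes) / memRatio)
    	fmt.Printf("projected per-target meta-blob: %d GiB\n", metaBytes>>30) // prints 40
    }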
Ah, understood. Thanks for the explanation. I think a comment in this area will be helpful.
Comment added.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/8/execution/node/1510/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/8/execution/node/1478/log
No other issues noted on my side.
Required-githooks: true Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Features: control pool Required-githooks: true Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
…abarr/control-size-poolcreate-mdonssd Features: pool control Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Features: control pool Required-githooks: true Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Increased test coverage with MD-on-SSD tests for meta/rdb size adjustment and computation, including the mem-ratio fraction case.
ftest LGTM
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/9/execution/node/1161/log
Features: control pool Required-githooks: true Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/10/execution/node/1185/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/11/execution/node/1514/log
CI failed due to the following issues:
…abarr/control-size-poolcreate-mdonssd Test-tag: control pool pr Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
…abarr/control-size-poolcreate-mdonssd Test-tag: control pool pr Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/12/testReport/
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/13/testReport/
ListVerbose dmg tests that are failing with this change will be addressed in https://daosio.atlassian.net/browse/DAOS-16328. There is a bug in how the list of device target identifiers is returned for SMD devices on storage scan; this results in some under-subscription in pool create auto mode. This inaccuracy will be addressed in https://daosio.atlassian.net/browse/DAOS-16327.
* DAOS-13701: Memory bucket allocator API definition (#13152)
  - New umem macros are exported to do the allocation within a memory bucket. umem internally now calls the modified backend allocator routines with the memory bucket id passed as an argument.
  - umem_get_mb_evictable() and dav_get_zone_evictable() are added to support the allocator returning a preferred zone to be used as the evictable memory bucket for current allocations. Right now these routines always return zero.
  - The dav heap runtime is cleaned up to make provision for the memory bucket implementation.
* DAOS-13703 umem: umem cache APIs for phase II (#13138) Four sets of umem cache APIs will be exported for md-on-ssd phase II:
  1. Cache initialization & finalization: umem_cache_alloc(), umem_cache_free()
  2. Cache map, load and pin: umem_cache_map(), umem_cache_load(), umem_cache_pin(), umem_cache_unpin()
  3. Offset and memory address conversion: umem_cache_off2ptr(), umem_cache_ptr2off()
  4. Misc: umem_cache_commit(), umem_cache_reserve()
* DAOS-14491: Retain support for phase-1 DAV heap (#13158) The phase-2 DAV allocator is placed under the subdirectory src/common/dav_v2. This allocator is built as a standalone shared library and linked to the libdaos_common_pmem library. umem now supports one more mode, DAOS_MD_BMEM_V2; setting this mode in a umem instance will result in using the phase-2 DAV allocator interfaces.
* DAOS-15681 bio: store scm_sz in SMD (#14330) In md-on-ssd phase 2, the scm_sz (VOS file size) could be smaller than the meta_sz (meta blob size), so an extra scm_sz needs to be stored in SMD so that, on engine start, it can be retrieved from SMD for VOS file re-creation. To keep the SMD compatible with pmem & md-on-ssd phase 1, a new table named "meta_pool_ex" is introduced for storing scm_sz.
* DAOS-14422 control: Update pool create UX for MD-on-SSD phase2 (#14740) Show MD-on-SSD specific output on pool create and add new syntax to specify the ratio between SSD capacity reserved for MD in a new DAOS pool and the (static) size of memory reserved for MD in the form of VOS index files (previously held on SCM but now in tmpfs on ramdisk). Memory-file size is now printed when creating a pool in MD-on-SSD mode. The new --{meta,data}-size params can be specified in decimal or binary units, e.g. GB or GiB, and refer to per-rank allocations. These manual size parameters are only for advanced use cases; in most situations the --size (X%|XTB|XTiB) syntax is recommended when creating a pool. The --meta-size param is bytes to use for metadata on SSD and --data-size is for data on SSD (similar to --nvme-size). The new --mem-ratio param is specified as a percentage with up to two decimal places of precision. This defines the proportion of the metadata capacity reserved on SSD (i.e. --meta-size) that will be used when allocating the VOS index (one blob and one memory file per target). Enabling MD-on-SSD phase-2 pool creation requires the envar DAOS_MD_ON_SSD_MODE=3 to be set in the server config file.
* DAOS-14317 vos: initial changes for the phase2 object pre-load (#15001)
  - Introduced a new durable format 'vos_obj_p2_df' for the md-on-ssd phase2 object; at most 4 evict-able bucket IDs can be stored.
  - Changed vos_obj_hold() & vos_obj_release() to pin or unpin the object respectively.
  - Changed the private data of VOS dkey/akey/value trees from 'vos_pool' to 'vos_object'; the private data will be used for allocating/reserving from the evict-able bucket.
  - Move the vos_obj_hold() call from vos_update_end() to vos_update_begin() for the phase2 pool, reserving value from the object evict-able bucket.
* DAOS-14316 vos: object preload for GC (#15059)
  - Use the reserved vos_gc_item.it_args to store 2 bucket IDs for GC_OBJ, GC_DKEY and GC_AKEY, so that GC drain will be able to tell what buckets need to be pinned by looking up bucket numbers stored in vos_obj_df.
  - Once GC drain needs to pin a different bucket, it will have to commit the current tx, unpin the current bucket, pin the required bucket, and start a new tx.
  - Forge a dummy object as the private data for the btree opened by GC, so that the 'ti_destroy' hack could be removed.
  - Store the evict-able bucket ID persistently for newly created objects; this was missed in the prior PR.
* DAOS-14315 vos: Pin objects for DTX commit & CPD RPC (#15118) Introduced two new VOS APIs, vos_pin_objects() & vos_unpin_objects(), to pin or unpin objects. Changed DTX commit/abort & CPD RPC handler code to ensure objects are pinned before starting a local transaction.
  - Bug fix in vos_pmemobj_create(): the actual scm_size should be passed to bio_mc_create().
  - Use vos_obj_acquire() instead of vos_obj_hold() in vos_update_begin() to avoid the complication of object ilog adding in ts_set. We could simplify it in future cleanup PRs.
  - Handle concurrent object bucket allotting & loading.
* DAOS-16160 control: Update pool create --size % opt for MD-on-SSD p2 (#14957) Update calculation of usable pool META and DATA component sizes for MD-on-SSD phase-2 mode, when meta-blob-size > vos-file-size.
  - Use mem-ratio when making NVMe size adjustments to calculate usable pool capacity from raw stats.
  - Use mem-ratio when auto-sizing to determine the META component from a percentage of usable rank RAM-disk capacity.
  - Apportion cluster count reductions to SSDs based on the number of assigned targets to take account of target striping across a tier.
  - Fix pool query ftest.
  - Improve test coverage for meta and rdb size calculations.
* DAOS-16763 common: Tunable to control max NEMB (#15422) A new tunable, DAOS_MD_ON_SSD_NEMB_PCT, is introduced to define the percentage of the memory cache that non-evictable memory buckets can expand to. This tunable is read during pool creation and persisted, ensuring that each time the pool is reopened it retains the value set during its creation.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Signed-off-by: Sherin T George <sherin-t.george@hpe.com>
Co-authored-by: Tom Nabarro <tom.nabarro@intel.com>
Co-authored-by: sherintg <sherin-t.george@hpe.com>
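As a rough sketch of how the --mem-ratio percentage described in the DAOS-14422 entry above maps to the per-target memory-file size (hypothetical helper name and example values, not the dmg implementation):

    package main

    import "fmt"

    // memRatioFraction converts a --mem-ratio percentage (up to two decimal
    // places, e.g. 25.00) into a fraction. Hypothetical helper for illustration.
    func memRatioFraction(percent float64) float64 {
    	return percent / 100.0
    }

    func main() {
    	metaBlobPerTarget := uint64(40 << 30) // example 40 GiB meta-blob per target
    	frac := memRatioFraction(25.00)       // --mem-ratio 25 => memory file is 1/4 of META
    	memFilePerTarget := uint64(float64(metaBlobPerTarget) * frac)
    	fmt.Printf("per-target memory file: %d GiB\n", memFilePerTarget>>30) // prints 10
    }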
Update calculation of usable pool META and DATA component sizes for MD-on-SSD phase-2 mode, when meta-blob-size > vos-file-size.
- Use mem-ratio when making NVMe size adjustments to calculate usable pool capacity from raw stats.
- Use mem-ratio when auto-sizing to determine the META component from a percentage of usable rank RAM-disk capacity.
- Apportion cluster count reductions to SSDs based on the number of assigned targets to take account of target striping across a tier.
Required-githooks: true
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: