DAOS-16591 mgmt, vos, common: Align scm/meta size #15146
Conversation
Ticket title is 'The scm and meta size should be aligned up to the page size of BMEM allocator'
The scm and meta sizes for vos are now aligned-up to 16M for pools using phase 2 allocator. Signed-off-by: Sherin T George <sherin-t.george@hpe.com>
Force-pushed from 488082a to 08e761b
src/vos/vos_pool.c (Outdated)

    backend = umempobj_get_backend_type();
    if ((*scm_sz != *meta_sz) && (backend == DAOS_MD_BMEM))
        backend = DAOS_MD_BMEM_V2;
I think we should return failure in this case?
Why, this seems valid to me. It is the standard case for phase 2 where mem_ratio is not 100% and therefore we need to use V2.
I thought users might configure the backend as DAOS_MD_BMEM via the DAOS_MD_ON_SSD_MODE environment variable. I would think we don't support phase 2 in that case, since phase 1 and phase 2 use totally different layouts.
@wangshilong see vos_pool_create() in the vos_on_blob_p2 branch: DAOS_MD_BMEM is now the default backend, and we use the BMEM_V2 backend only when "meta size > scm size" or when DAOS_MD_ON_SSD_MODE is explicitly set to BMEM_V2 by the user.
@sherintg I think we should turn the above code into a function to avoid duplicating it in vos_pool_create(), so that when we change our backend selection strategy in the future we don't have to update code here and there.
Addressed in the latest commit. Also, added a new tunable DAOS_MD_DISABLE_BMEM_V2 to disable creation of v2 pools.
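The refactor suggested above can be sketched as a small helper. This is an illustrative sketch only: the enum values and the helper name are stand-ins, not the actual DAOS definitions or the code in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the DAOS backend identifiers. */
enum md_backend { DAOS_MD_BMEM = 1, DAOS_MD_BMEM_V2 = 2 };

/*
 * Hypothetical helper centralizing the backend selection policy so that
 * vos_pool_create() and the size-roundup path share one implementation:
 * when the configured backend is phase-1 BMEM but the scm and meta sizes
 * differ (i.e. mem_ratio is not 100%), switch to the phase-2 (V2) backend.
 */
static enum md_backend
select_backend(enum md_backend configured, uint64_t scm_sz, uint64_t meta_sz)
{
	if (scm_sz != meta_sz && configured == DAOS_MD_BMEM)
		return DAOS_MD_BMEM_V2;
	return configured;
}
```

With a single helper, a future change to the selection strategy (such as honoring the DAOS_MD_DISABLE_BMEM_V2 tunable mentioned above) only has to touch one place.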
Anyway, just confirming the question, no objections to the PR.
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15146/2/testReport/
src/vos/vos_pool.c (Outdated)

    size_t alignsz;

    backend = umempobj_get_backend_type();
    if ((*scm_sz != *meta_sz) && (backend == DAOS_MD_BMEM))
scm_sz > meta_sz is invalid; we should assert or return an error. BTW, do we guarantee the sizes passed from the control plane are always non-zero?
Addressed
meta and scm sz should always be non-zero as per src/mgmt/srv_drpc.c L504
src/vos/vos_pool.c (Outdated)

    alignsz = umempobj_pgsz(backend);

    *scm_sz = D_ALIGNUP(*scm_sz, alignsz);
    if (*meta_sz)
Looks like the control plane may pass a zero meta_sz? Then we could mistakenly assume it's the BMEM_V2 backend in the above code?
BTW, I think we should move the "max(tca->tca_scm_size / dss_tgt_nr, 1 << 24)" from tgt_create_preallocate() to here and apply it to both the scm size and the meta size, right?
The meta_sz check was added assuming it may not be set in pmem mode. Verified that both scm_sz and meta_sz are set by the control plane in all modes (pmem, bmem_v1 and bmem_v2). Hence removed the checks and added asserts.
Addressed review comments. Added a tunable to disable creating BMEM_V2 pools. Signed-off-by: Sherin T George <sherin-t.george@hpe.com>
Force-pushed from b1bb4cf to 35ae67a
@sherintg please try to avoid force pushes if you can, because they force reviewers to go through the whole patch again, not just the differential. TIA
    rc = tgt_vos_preallocate_sequential(tca->tca_ptrec->dptr_uuid,
                                        max(tca->tca_scm_size / dss_tgt_nr,
                                            1 << 24), dss_tgt_nr);
    rc = tgt_vos_preallocate_sequential(
Why do we have to allocate sequentially in MD mode rather than parallel like in PMem?
It's the other way round: it's sequential in PMEM and parallel in MD. This change was made in phase 1 to overcome the overhead of fallocate on tmpfs.
@@ -1123,7 +1122,12 @@ ds_mgmt_hdlr_tgt_create(crt_rpc_t *tc_req)

    tgt_scm_sz = tc_in->tc_scm_size / dss_tgt_nr;
    tgt_meta_sz = tc_in->tc_meta_size / dss_tgt_nr;
    vos_pool_roundup_size(&tgt_scm_sz, &tgt_meta_sz);
    rc = vos_pool_roundup_size(&tgt_scm_sz, &tgt_meta_sz);
Is this strange formatting enforced by the clang linter?
yes
@tanabarr Apologies for the force push of the second commit. I made a second commit yesterday addressing the review comments; however, after the md_on_ssd meeting I modified that commit (with a force push) to include a tunable to disable the phase 2 allocator. The second commit should still contain the difference between what you had reviewed and approved and the changes made as part of the rework.
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15146/4/execution/node/1463/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15146/4/execution/node/1447/log
src/vos/vos_pool.c (Outdated)

    size_t alignsz;
    int    rc;

    D_ASSERT((*scm_sz != 0) && (*meta_sz != 0));
This assert is triggered in CI tests; it looks like we pass a zero meta_size sometimes (for pmem or phase-1 mode).
Addressed with the 3rd commit.
From the failed run's logs at https://build.hpdd.intel.com/job/daos-stack/job/daos/view/change-requests/job/PR-15146/4/artifact/Functional%20Hardware%20Medium/daos_test/rebuild.py/daos_logs.wolf-254/01-DAOS_Rebuild_0to10_daos_control.log/ we can see that the assert happens before pool create, just after engine start (before any dmg pool commands are issued):
- Wolf-254 engine 1 starts - rank 2
- Wolf-254 engine 0 starts - rank 5
- SmdQuery issued from dmg to wolf-254 for ranks 2,5
- ranks 2 & 5 fail due to the assert
e.g. 13:06:12 INFO | Sep 20 13:03:37 wolf-254.wolf.hpdd.intel.com daos_server[167095]: ERROR: daos_engine:0 09/20-13:03:37.29 wolf-254 DAOS[168521/0/4058] vos EMRG src/vos/vos_pool.c:787 vos_pool_roundup_size() Assertion '(*scm_sz != 0) && (*meta_sz != 0)' failed
So whilst meta_sz will be set nonzero in control-plane pool create calls, other engine-internal calls need to be handled for meta_sz == 0.
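A minimal sketch of the kind of guard the follow-up commit adds (hypothetical shape, not the actual patch): align scm_sz unconditionally, but only align meta_sz when the caller supplied a nonzero value, so engine-internal paths such as pool extend don't trip the assert. PAGE_SZ and ROUNDUP are illustrative stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ (16ULL << 20)  /* assumed 16M phase-2 allocator page size */
#define ROUNDUP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Hypothetical roundup that tolerates meta_sz == 0 from internal callers. */
static void
roundup_sizes(uint64_t *scm_sz, uint64_t *meta_sz)
{
	assert(*scm_sz != 0);             /* scm size is always provided    */
	*scm_sz = ROUNDUP(*scm_sz, PAGE_SZ);
	if (*meta_sz != 0)                /* e.g. pool extend may pass 0    */
		*meta_sz = ROUNDUP(*meta_sz, PAGE_SZ);
}
```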
Handled meta_sz == 0 during pool extend operation. Signed-off-by: Sherin T George <sherin-t.george@hpe.com>
The scm and meta sizes for vos are now aligned-up to 16M for pools using phase 2 allocator.
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: