Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-16591 mgmt, vos, common: Align scm/meta size #15146

Merged
merged 3 commits into from
Sep 24, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions src/common/dav_v2/dav_iface.c
Original file line number Diff line number Diff line change
Expand Up @@ -470,3 +470,9 @@ dav_class_register_v2(dav_obj_t *pop, struct dav_alloc_class_desc *p)

return 0;
}

DAV_FUNC_EXPORT size_t
dav_obj_pgsz_v2()
{
return ZONE_MAX_SIZE;
}
6 changes: 6 additions & 0 deletions src/common/dav_v2/dav_v2.h
Original file line number Diff line number Diff line change
Expand Up @@ -313,4 +313,10 @@ dav_get_heap_mb_stats_v2(dav_obj_t *pop, uint32_t mb_id, struct dav_heap_mb_stat
uint32_t
dav_allot_mb_evictable_v2(dav_obj_t *pop, int flags);

/*
* Return the page size for dav_v2.
*/
size_t
dav_obj_pgsz_v2();

#endif /* __DAOS_COMMON_DAV_V2_H */
15 changes: 13 additions & 2 deletions src/common/mem.c
Original file line number Diff line number Diff line change
Expand Up @@ -85,12 +85,14 @@
return 0;
}

int umempobj_get_backend_type(void)
int
umempobj_get_backend_type(void)
{
return daos_md_backend;
}

int umempobj_backend_type2class_id(int backend)
int
umempobj_backend_type2class_id(int backend)
{
switch (backend) {
case DAOS_MD_PMEM:
Expand All @@ -108,6 +110,15 @@
}
}

size_t
umempobj_pgsz(int backend)
{
if (backend == DAOS_MD_BMEM_V2)
return dav_obj_pgsz_v2();
else
return (1UL << 12);
}

/** Define common slabs. We can refine this for 2.4 pools but that is for next patch */
static const int slab_map[] = {
0, /* 32 bytes */
Expand Down Expand Up @@ -3035,7 +3046,7 @@
VALGRIND_ENABLE_ADDR_ERROR_REPORTING_IN_RANGE((char *)pinfo->pi_addr, len);
pinfo->pi_io = 0;
if (rc) {
DL_ERROR(rc, "Read MD blob failed.\n");

Check warning on line 3049 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines
page_wakeup_io(cache, pinfo);
return rc;
} else if (cache->ca_evtcb_fn) {
Expand Down Expand Up @@ -3177,7 +3188,7 @@
if (is_page_dirty(pinfo)) {
rc = cache_flush_page(cache, pinfo);
if (rc) {
DL_ERROR(rc, "Flush page failed.\n");

Check warning on line 3191 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines
return rc;
}

Expand Down Expand Up @@ -3237,7 +3248,7 @@
while (need_evict(cache)) {
rc = cache_evict_page(cache, for_sys);
if (rc && rc != -DER_AGAIN && rc != -DER_BUSY) {
DL_ERROR(rc, "Evict page failed.\n");

Check warning on line 3251 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines
return rc;
}

Expand Down Expand Up @@ -3288,7 +3299,7 @@
if (is_id_evictable(cache, pg_id)) {
rc = cache_get_free_page(cache, &pinfo, 0, false);
if (rc) {
DL_ERROR(rc, "Failed to get free page.\n");

Check warning on line 3302 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines
break;
}
} else {
Expand Down Expand Up @@ -3471,7 +3482,7 @@

rc = cache_map_pages(cache, out_pages, page_nr);
if (rc)
DL_ERROR(rc, "Map page failed.\n");

Check warning on line 3485 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines

if (out_pages != &in_pages[0])
D_FREE(out_pages);
Expand All @@ -3494,7 +3505,7 @@

rc = cache_pin_pages(cache, out_pages, page_nr, for_sys);
if (rc) {
DL_ERROR(rc, "Load page failed.\n");

Check warning on line 3508 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines
} else {
for (i = 0; i < page_nr; i++) {
uint32_t pg_id = out_pages[i];
Expand Down Expand Up @@ -3531,7 +3542,7 @@

rc = cache_pin_pages(cache, out_pages, page_nr, for_sys);
if (rc) {
DL_ERROR(rc, "Load page failed.\n");

Check warning on line 3545 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines
goto out;
}

Expand Down Expand Up @@ -3594,7 +3605,7 @@
while (need_reserve(cache, 0)) {
rc = cache_evict_page(cache, false);
if (rc && rc != -DER_AGAIN && rc != -DER_BUSY) {
DL_ERROR(rc, "Evict page failed.\n");

Check warning on line 3608 in src/common/mem.c

View workflow job for this annotation

GitHub Actions / Logging macro checking

check-return, Line contains too many newlines
break;
}

Expand Down
4 changes: 4 additions & 0 deletions src/include/daos/mem.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,10 @@ int umempobj_settings_init(bool md_on_ssd);
/* convert backend type to umem class id */
int umempobj_backend_type2class_id(int backend);

/* get page size for the backend */
size_t
umempobj_pgsz(int backend);

/* umem persistent object property flags */
#define UMEMPOBJ_ENABLE_STATS 0x1

Expand Down
10 changes: 10 additions & 0 deletions src/include/daos_srv/vos.h
Original file line number Diff line number Diff line change
Expand Up @@ -519,6 +519,16 @@ int
vos_aggregate(daos_handle_t coh, daos_epoch_range_t *epr,
int (*yield_func)(void *arg), void *yield_arg, uint32_t flags);

/**
* Round up the scm and meta sizes to match the backend requirement.
* \param[in/out] scm_sz SCM size that needs to be aligned up
* \param[in/out] meta_sz META size that needs to be aligned up
*
* \return 0 on success, error otherwise.
*/
int
vos_pool_roundup_size(size_t *scm_sz, size_t *meta_sz);

/**
* Discards changes in all epochs with the epoch range \a epr
*
Expand Down
2 changes: 1 addition & 1 deletion src/mgmt/srv_drpc.c
Original file line number Diff line number Diff line change
Expand Up @@ -503,7 +503,7 @@ ds_mgmt_drpc_pool_create(Drpc__Call *drpc_req, Drpc__Response *drpc_resp)

scm_size = req->tier_bytes[DAOS_MEDIA_SCM];
if (req->mem_ratio)
scm_size *= req->mem_ratio;
scm_size *= (double)req->mem_ratio;

rc = ds_mgmt_create_pool(pool_uuid, req->sys, "pmem", targets, scm_size,
req->tier_bytes[DAOS_MEDIA_NVME], prop, &svc, req->n_fault_domains,
Expand Down
8 changes: 8 additions & 0 deletions src/mgmt/srv_target.c
Original file line number Diff line number Diff line change
Expand Up @@ -1083,6 +1083,8 @@ ds_mgmt_hdlr_tgt_create(crt_rpc_t *tc_req)
pthread_t thread;
bool canceled_thread = false;
int rc = 0;
size_t tgt_scm_sz;
size_t tgt_meta_sz;

/** incoming request buffer */
tc_in = crt_req_get(tc_req);
Expand Down Expand Up @@ -1119,6 +1121,12 @@ ds_mgmt_hdlr_tgt_create(crt_rpc_t *tc_req)
D_DEBUG(DB_MGMT, DF_UUID": record inserted to dpt_creates_ht\n",
DP_UUID(tca.tca_ptrec->dptr_uuid));

tgt_scm_sz = tc_in->tc_scm_size / dss_tgt_nr;
tgt_meta_sz = tc_in->tc_meta_size / dss_tgt_nr;
vos_pool_roundup_size(&tgt_scm_sz, &tgt_meta_sz);
tc_in->tc_scm_size = tgt_scm_sz * dss_tgt_nr;
tc_in->tc_meta_size = tgt_meta_sz * dss_tgt_nr;

tca.tca_scm_size = tc_in->tc_scm_size;
tca.tca_nvme_size = tc_in->tc_nvme_size;
tca.tca_dx = dss_current_xstream();
Expand Down
20 changes: 20 additions & 0 deletions src/vos/vos_pool.c
Original file line number Diff line number Diff line change
Expand Up @@ -1272,6 +1272,26 @@ vos_pool_create_ex(const char *path, uuid_t uuid, daos_size_t scm_sz, daos_size_
return rc;
}

int
vos_pool_roundup_size(daos_size_t *scm_sz, daos_size_t *meta_sz)
{
int backend;
size_t alignsz;

backend = umempobj_get_backend_type();
if ((*scm_sz != *meta_sz) && (backend == DAOS_MD_BMEM))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scm_sz > meta_sz is invalid, we should assert or return error. BTW, do we guarantee the size passed from control plane is always non-zero?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta and scm sz should always be non-zero as per src/mgmt/srv_drpc.c L504

backend = DAOS_MD_BMEM_V2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should return failure in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, this seems valid to me. It is the standard case for phase 2 where mem_ratio is not 100% and therefore we need to use V2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought that if users configured backend as DAOS_MD_BMEM using environment DAOS_MD_ON_SSD_MODE. I would think we don't support phase2 in this case? As phase1 and Phase2 will be totally different layout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangshilong see vos_pool_create() in vos_on_blob_p2 branch, now DAOS_MD_BMEM is the default backend, we use BMEM_V2 backend only when "meta size > scm size" or the DAOS_MD_ON_SSD_MODE is explicitly specified to BMEM_V2 by user.

@sherintg I think we need to make above code into a function to avoid the dup code in vos_pool_create(), so that when we change our backend selection strategy in the future, we don't have to update code here and there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit. Also, added a new tunable DAOS_MD_DISABLE_BMEM_V2 to disable creation of v2 pools.


/* Round up the size such that it is compatible with backend */
alignsz = umempobj_pgsz(backend);

*scm_sz = D_ALIGNUP(*scm_sz, alignsz);
if (*meta_sz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like control plane may pass zero meta_sz? Then we could mistakenly assume it's BMEM_V2 backend in above code?

BTW, I think we'd move the "max(tca->tca_scm_size / dss_tgt_nr, 1 << 24)" from tgt_create_preallocate() to here and apply it to both scm size and meta size, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta_sz check was added assuming that for pmem mode this may not be set. Verified that that both scm_sz and meta_sz is set by the control plane in all modes pmem, bmem_v1 and bmem_v2. Hence removed the checks and added asserts.

*meta_sz = D_ALIGNUP(*meta_sz, alignsz);

return 0;
}

int
vos_pool_create(const char *path, uuid_t uuid, daos_size_t scm_sz, daos_size_t data_sz,
daos_size_t meta_sz, unsigned int flags, uint32_t version, daos_handle_t *poh)
Expand Down
Loading