Test merge from release/2.4 [DO NOT MERGE] #14355

Closed
wants to merge 50 commits

Conversation

mlawsonca
Collaborator

Run-GHA: true

chowes and others added 30 commits April 10, 2024 13:30
…iptor

libfuse supports opening /dev/fuse and passing the file descriptor as the
mountpoint. In some cases, realpath may not work for these file descriptors,
and so we should ignore ENOENT errors and instead check that we can get file
descriptor attributes from the given path.
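
A minimal sketch of that check; check_fd_mountpoint() and the sscanf()/fcntl()
calls appear in this PR's diff, but the body below is illustrative rather than
the exact patch:

#include <fcntl.h>
#include <stdio.h>

/*
 * If the mountpoint looks like "/dev/fd/N", skip realpath() (which may fail
 * with ENOENT) and instead verify that N is a usable file descriptor.
 * Returns the descriptor on success, -1 otherwise.
 */
static int
check_fd_mountpoint(const char *mountpoint)
{
        unsigned int fd  = 0;
        int          len = 0;

        if (sscanf(mountpoint, "/dev/fd/%u%n", &fd, &len) != 1)
                return -1; /* not a file-descriptor mountpoint */

        if (mountpoint[len] != '\0')
                return -1; /* trailing characters after /dev/fd/N */

        if (fcntl(fd, F_GETFD) == -1)
                return -1; /* not a valid open descriptor */

        return (int)fd;
}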

Change-Id: I2e9aad0e11a4c6f27ec2c4b1aeb75fc651d2540d
The setuid, setgid, and sticky bits can cause fatal errors when the datamover
tool sets file permissions after copying a file, since these bits are not
supported by DFS.  We can simply ignore these bits when calling dfs_chmod.
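
A small illustrative helper for the masking described above (the dfs_chmod()
call itself is not shown; only the bit mask is the point):

#include <sys/stat.h>

/* Illustrative sketch: strip the bits DFS cannot store (setuid, setgid,
 * sticky) before handing the mode to dfs_chmod(), so the datamover does not
 * treat the resulting error as fatal. */
static mode_t
dfs_safe_mode(mode_t mode)
{
        return mode & ~(S_ISUID | S_ISGID | S_ISVTX);
}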

Change-Id: Ibf2b6d793f95dd59c902c8d847bc087fb479c5ea
To prevent known races caused by the lack of
locking in the Glibc environment APIs (getenv()/[uns]setenv()/
putenv()/clearenv()), they have been overloaded and
strengthened in Gurt with hooks that now all use a common
lock/mutex.

Libgurt is the preferred place for this as it is the lowest
layer in DAOS, so it is the earliest to be loaded; this
ensures the hooks are installed as early as possible and
avoids the need for LD_PRELOAD.

This addresses the main lack of multi-thread protection
in the Glibc APIs but does not handle all unsafe use-cases
(such as the change/removal of an env var whose value address
has already been grabbed by a previous getenv(), ...).
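
The gurt code itself is not shown here; the sketch below only illustrates the
pattern of serializing the Glibc environment APIs behind a single lock, with
hypothetical names:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; the real gurt hooks differ.  The point is simply that
 * every environment accessor funnels through one process-wide lock, and the
 * getter returns a copy so callers never hold a live environment pointer. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

static char *
locked_getenv(const char *name)
{
        char *copy = NULL;
        char *val;

        pthread_mutex_lock(&env_lock);
        val = getenv(name);
        if (val != NULL)
                copy = strdup(val);
        pthread_mutex_unlock(&env_lock);

        return copy; /* caller frees; NULL if unset or out of memory */
}

static int
locked_setenv(const char *name, const char *value, int overwrite)
{
        int rc;

        pthread_mutex_lock(&env_lock);
        rc = setenv(name, value, overwrite);
        pthread_mutex_unlock(&env_lock);

        return rc;
}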

Change-Id: I38cda09746ddb4e79f0297fee26c2a22e1cb881b
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ic0eeee9df2f0ef29f3f3f047080fdce109af71bf
TESTED=https://paste.googleplex.com/6208972604833792
BUG=311738671

Change-Id: Ia6658d7c99c8d21c35d724b86fa2c1c48b41069f
The upstream 2.4 release has support for storing engine
metadata outside of tmpfs, but it is tied to the new
MD-on-SSD feature preview. With some small adjustments
to the code, we can enable external metadata without
MD-on-SSD.

Required-githooks: true

Change-Id: If3e728a2db7a4994572bbe53c92654f2e9b01ee0
Signed-off-by: Michael MacDonald <mjmac@google.com>
- D_QUOTA_RPCS environment variable added. When set, it limits the number of in-flight RPCs being sent out by the process.
- RPCs that exceed the quota limit (if set) will now be queued by the sender (see the sketch below).
- Quota support code added to handle and track the resources.
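
A simplified sketch of the quota bookkeeping described above; the real CaRT
code differs, and every name except D_QUOTA_RPCS is illustrative:

#include <pthread.h>
#include <stdbool.h>

/* If D_QUOTA_RPCS is set, an RPC may be sent only while the in-flight count
 * is below the quota; otherwise the sender parks it on a queue and releases
 * it when an earlier RPC completes. */
struct rpc_quota {
        pthread_mutex_t q_lock;
        int             q_limit;    /* value of D_QUOTA_RPCS, 0 = unlimited */
        int             q_inflight; /* RPCs currently on the wire */
};

/* Returns true if the caller may send now, false if it must queue the RPC. */
static bool
quota_try_acquire(struct rpc_quota *q)
{
        bool ok;

        pthread_mutex_lock(&q->q_lock);
        ok = (q->q_limit == 0 || q->q_inflight < q->q_limit);
        if (ok)
                q->q_inflight++;
        pthread_mutex_unlock(&q->q_lock);
        return ok;
}

/* Called on RPC completion; the real code would also dispatch a queued RPC. */
static void
quota_release(struct rpc_quota *q)
{
        pthread_mutex_lock(&q->q_lock);
        if (q->q_inflight > 0)
                q->q_inflight--;
        pthread_mutex_unlock(&q->q_lock);
}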

Required-githooks: true

Signed-off-by: Alexander A Oganezov <alexander.a.oganezov@intel.com>
Adds some cart-level metrics for RPC quota
exceeded and RPC queue depth.

Required-githooks: true
Change-Id: I5760c255e13ca9a70d352017cae2f6bcee5a6959
Signed-off-by: Michael MacDonald <mjmac@google.com>
Matches new default in 2.6+; aligns default value with
standard tuning practices.

Required-githooks: true
Change-Id: I817927a160fc3dbb2c60a12107da668147e78706
Signed-off-by: Michael MacDonald <mjmac@google.com>
It should be part of server build, not tests

Required-githooks: true

Change-Id: I28b537e1ea7c32a323036c3ec935517ec97ad80c
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
This PR is a subset of PR #13250, which allows thread-safe management of environment variables; it has been split into smaller PRs to facilitate the review process.
This PR mainly adds thread-safe environment variable management functions.
It also removes and replaces the old non-thread-safe custom environment management functions.
Finally, it replaces the setenv() function with d_setenv().

Required-githooks: true

Change-Id: Ife6690e2c63dd6c47279a2ac8c3c5a3da5cf8213
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Fix a regression in the d_getenv_xxx() functions used to retrieve integer
environment variables: support strings representing signed integers.

Required-githooks: true

Change-Id: I7a7f84fe17378ffca1cc0179e1119c1f17a3c4da
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Replace getenv() function with d_agetenv_str() and d_freeenv_str()
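
A usage sketch; the signatures are assumed from the commit messages
(an allocate-and-copy getter returning 0 on success, plus a matching free),
not taken from the header:

#include <stdio.h>
#include <gurt/common.h> /* assumed location of the d_*env_str() helpers */

/* Usage sketch only: unlike getenv(), the value is an allocated copy, so it
 * stays valid even if another thread later changes the variable. */
static void
print_log_mask(void)
{
        char *mask = NULL;

        if (d_agetenv_str(&mask, "D_LOG_MASK") == 0 && mask != NULL) {
                printf("D_LOG_MASK=%s\n", mask);
                d_freeenv_str(&mask);
        }
}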

Required-githooks: true

Change-Id: I6a3e3fafc82327c091bfe96bea3e5f0ef5bece48
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Required-githooks: true

Change-Id: I886d130eb20194a1870579bd47ade2b6e4b3b35a
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
…13053)

Allow metadata caching even when the file is open. This was
initially disabled due to conflicts with the interception library;
however, dfuse now tracks interception-library use, so it is possible
to disable caching only when the interception library is in use rather
than all the time.

Required-githooks: true

Change-Id: Ida03a854030f6b9ded24c5465e0f1126fcba310e
Signed-off-by: Ashley Pittman ashley.m.pittman@intel.com
FUSE will call this often to read non-existent xattrs for every write request,
so short-circuit these to avoid server round-trips.
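
A sketch of the general short-circuit shape using the libfuse low-level API;
the specific prefix test below is an assumption for illustration, not
necessarily the dfuse change:

#define FUSE_USE_VERSION 35 /* any libfuse3 API level works for this sketch */

#include <errno.h>
#include <string.h>
#include <fuse3/fuse_lowlevel.h>

/* The kernel probes certain xattrs (e.g. "security.capability") on every
 * write; if the filesystem never stores such attributes it can answer
 * ENODATA locally instead of issuing a server round-trip. */
static void
sketch_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name, size_t size)
{
        (void)ino;
        (void)size;

        if (strncmp(name, "security.", 9) == 0) {
                fuse_reply_err(req, ENODATA);
                return;
        }
        /* ... otherwise fall through to the normal remote lookup ... */
        fuse_reply_err(req, ENOTSUP); /* placeholder for the real path */
}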

Required-githooks: true

Change-Id: I3337b1724f237cc50a5a537e0844f05f0ed9cc61
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
* DAOS-14981 gurt: restore d_getenv_int undefined symbol

Restore missing plain function d_getenv_int() to fix missing symbol with
libdaos.

Required-githooks: true

Change-Id: I86d5c2f5d4d8bbd3c4ab3fdef70ffc5b41ce0921
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Change-Id: I6bf8765142024e3fd404d51f186c830e8af4bca5
getlogin does not work on the GKE pods that host our presubmits.
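
The commit does not show the replacement; the snippet below is only a common
fallback pattern that works in such environments, not necessarily the patch
applied here:

#include <pwd.h>
#include <unistd.h>

/* getlogin() needs a controlling terminal / utmp entry, which GKE pods lack;
 * getpwuid(getuid()) only needs the passwd database. */
static const char *
current_user_name(void)
{
        struct passwd *pw = getpwuid(getuid());

        return pw != NULL ? pw->pw_name : NULL;
}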

BUG=318885377

Change-Id: If4175d8a19b0174d489754659f34d4237cab6e97
Add a STATIC_FUSE option, off by default.  When enabled,
DAOS will link statically with the fuse library.
Also add a developer build.  This needs some work on
the libfuse RPM side.

Change-Id: I976f135af29d4e3da61cad9129ee19cbb419cddb
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
This ensures the dfuse we ship uses the version of
libfuse we want.

Required-githooks: true

Change-Id: I5aca28fdcb0e678fbd19df94cbf7428f5b9d61d2
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
1. The target count calculation should not use pool_tree_count, which might
count targets under other domains and thus corrupt the pool map
during extending.

2. Return the correct error code in migrate_pool_tls_lookup_create() and
mrone_one_fetch.

3. Fix a missing free in regenerate_task_of_type.

Signed-off-by: Di Wang <di.wang@intel.com>
Adds a gauge to measure SWIM delay and a counter
for glitches (temporary network outages).

Change-Id: Ibd85c08ab3e3a38931d795d62270f3e4059d7c67
Required-githooks: true

Change-Id: I854937dd249ad9f7211a3b7d40d3365a3e2f79f2
Signed-off-by: Michael MacDonald <mjmac@google.com>
During migration, choose the minimum of the rebuild stable epoch and
the EC aggregation boundary to make sure the correct data is fetched
during recovery.
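
Stated as code, with illustrative names (a trivial sketch of the rule above):

#include <stdint.h>

typedef uint64_t daos_epoch_t; /* DAOS epochs are 64-bit values */

/* Fetch during recovery at the lower of the rebuild stable epoch and the EC
 * aggregation boundary, per the rule described above. */
static daos_epoch_t
migrate_fetch_epoch(daos_epoch_t rebuild_stable_epoch, daos_epoch_t ec_agg_boundary)
{
        return rebuild_stable_epoch < ec_agg_boundary ?
               rebuild_stable_epoch : ec_agg_boundary;
}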

Add tests to verify the process.

Signed-off-by: Di Wang <di.wang@intel.com>
Use the stable epoch for partial parity updates to make sure
these partial updates are not below the stable epoch boundary;
otherwise both EC and VOS aggregation might operate on
the same recxs at the same time, which can corrupt the data
during rebuild.

During EC aggregation, the un-aggregated epoch on non-leader
parity shards should be considered as well. Otherwise, if the leader
parity fails, it is excluded from the global EC stable epoch
calculation immediately; then, before the leader parity is rebuilt,
the global stable epoch might pass the un-aggregated epoch on the
failed target, and the partial updates on the data shards might be
aggregated by VOS aggregation before EC aggregation, which can
cause data corruption.

Also, choose a shard with a lower fail sequence (fseq) among all
parity shards as the aggregation leader, in case the last parity
cannot be rebuilt in time.

Signed-off-by: Di Wang <di.wang@intel.com>
Add missing properties to the check (for testing purposes) in ds_pool_query_handler.

Add missing DAOS_FAIL_ALWAYS to POOL10.

Clear fail_loc in the MGMT and POOL tests even if DAOS_FAIL_ONCE has been
requested. Other fail_loc-using tests will be cleaned up later.

Change-Id: Ied6c248763ec60fc722a1c636bad08ffff0cc58c
Signed-off-by: Li Wei <wei.g.li@intel.com>
Fix and clean up fail_loc usage in daos_test CONTAINER tests. Also, fix
bugs revealed by the fixed tests:

  - cont_iv_prop_l2g should set DAOS_CO_QUERY_PROP_SCRUB_DIS for
    DAOS_PROP_CO_SCRUBBER_DISABLED.

  - CONT_ACL_UPDATE should update the IV.

Change-Id: I1fa3a25d8283c9e5ef0b7ddaa76febd29b100cfb
Signed-off-by: Li Wei <wei.g.li@intel.com>
Correct some doxygen style formatting that was not valid doxygen.

Change-Id: If332fc006b7ed615903a19f1ee59337322a406c0
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Check RF and other conditions before the retry check, so
a non-allowed write returns failure immediately
instead of retrying endlessly.

Use rebuild/reintegrate_pool_rank in daos_container test
to avoid DER_BUSY failure.

Change-Id: I421defce185a928ebd3e52f59f1b19247d90f420
Signed-off-by: Di Wang <di.wang@intel.com>
* DAOS-14010 rebuild: add delay rebuild

Add "delay rebuild" healing mode, so the delay rebuild process is

1) SWIM detects dead ranks and report to the PS leader, which update
the pool map, i.e. marking the related targets as DOWN.
2) Though the rebuild job will not be scheduled, until there are further
manual pool operations, for example drain, extend, reintegration.
3) Then all these pool operations will be merged into one rebuild job,
then scheduled.

Update placement algothrim to be able to calculate the layout with
merged pool operation.

Abort the rebuild job immediately if it finds further pool map update,
so the current job will be merged to the following rebuild job. So
concurrent pool operation will be allowed, no EBUSY check anymore.

Add various tests to verify the delay rebuild process.

Change-Id: If6f163345938bb7e1ee7550124770babd815c695
Signed-off-by: Di Wang <di.wang@intel.com>
wangdi and others added 20 commits April 10, 2024 13:31
Fix a few typos for delay rebuild.

Change-Id: I9db5c2de7e2773da9dd0cc631f13ebec12fbb6c0
Signed-off-by: Di Wang <di.wang@intel.com>
Address CVEs found in these dependencies by updating to
the latest released versions.

Change-Id: I032403700d6ebb43ba6be519bf0d82cc5eb1ebfb
…ner (#13807)

- add new public function for dfs to set-owner
- add an NLT test for it

TESTED=https://paste.googleplex.com/6210316960006144
BUG=311736144

Required-githooks: true

Change-Id: I9191b09219fbd58de60b75a36eec2f51a2766260
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Test the provider we are deploying with.

Also fixes NLT after landing the cont chown patch.

Required-githooks: true
Change-Id: I3371be152d509cf1bb5f94cf85cc27b95fb108be
Signed-off-by: Michael MacDonald <mjmac@google.com>
Seeking to SEEK_END is not implemented in libioil.
This causes interception to be disabled with some Python frameworks.

Change-Id: I362d5d1d61449e7b03b2af21460512143547f99d
Signed-off-by: Johann Lombardi <johann.lombardi@gmail.com>
Change-Id: Ia4d10688686d992a706da725f7d15db45a418531
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
* DAOS-14845 object: retry migration for retriable failure

To avoid re-running rebuild and reclaim, retry the migration
until there are further pool map changes; in that case, fail
the current rebuild, and the following rebuild will resolve
the failure.

Various fixes for rebuild when the PS leader keeps changing
during rebuild.

Move migrate max ULT control to migrate_obj_iter_cb() to make
sure the max ULT count will not exceed the setting.

Change the yield freq from 128 to 16 to make sure the object

Optimize migrate memory usage (see the sketch below):
- Add max ULT control for all targets on the xstream, so
  the objects being migrated cannot exceed MIGRATE_MAX_ULT.

- Add per-target max ULT control, so each target's migrate
  ULTs cannot exceed MIGRATE_MAX_ULT/dss_tgt_nr.

- Add migrate_cont_open to avoid dsc_cont_open and dsc_pool_open
  for each object and dkey migration.
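
A simplified throttle check illustrating the two caps above; only
MIGRATE_MAX_ULT and dss_tgt_nr are taken from the commit message, everything
else (including the value 512) is illustrative:

#include <stdbool.h>

#define MIGRATE_MAX_ULT 512   /* assumed value, for illustration only */

extern int dss_tgt_nr;        /* targets per engine, named in the commit */

struct migrate_throttle {
        int inflight_total;   /* migration ULTs running on this xstream */
        int inflight_per_tgt; /* migration ULTs running for one target */
};

/* A new migration ULT may be created only if both caps are respected. */
static bool
migrate_ult_allowed(const struct migrate_throttle *t)
{
        return t->inflight_total < MIGRATE_MAX_ULT &&
               t->inflight_per_tgt < MIGRATE_MAX_ULT / dss_tgt_nr;
}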

Change-Id: I3b426542f6a5b196fc0e7cabb680d4ff9b1db65c
Signed-off-by: Di Wang <di.wang@intel.com>
When using server target, daos_metrics wasn't built
because it was buried under a check for client target.
I really need to figure out a better way to specify
targets, but this fixes the immediate issue.

Change-Id: Ifa3e49e42ad95fb96f246e723a5e4ec77f10e4d9
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Allow the suid and sgid bits to be stored via dfs_osetattr.
Even though libdfs does not act on those bits itself, storing them
allows dfuse to support them via the kernel.

The lack of sgid support causes spack to fail over dfuse, as
reported in the Jira ticket.
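
A usage sketch, assuming dfs_osetattr() takes a struct stat plus a
DFS_SET_ATTR_MODE flags mask as in the public DFS header; illustrative only:

#include <sys/stat.h>
#include <daos_fs.h> /* dfs_osetattr(), DFS_SET_ATTR_MODE (assumed here) */

/* With this change the setgid bit in st_mode is preserved rather than
 * rejected, which is what dfuse relies on (e.g. for spack stage dirs). */
static int
set_setgid_dir_mode(dfs_t *dfs, dfs_obj_t *dir)
{
        struct stat stbuf = {0};

        stbuf.st_mode = S_IFDIR | S_ISGID | 0775;
        return dfs_osetattr(dfs, dir, &stbuf, DFS_SET_ATTR_MODE);
}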

Change-Id: I76b41d9b231fa2b7f1d434d6ae06e6252cadc2b4
Signed-off-by: Johann Lombardi <johann.lombardi@gmail.com>
disable CODEOWNERS for google branch
disable upstream hardware tests on branch by default
remove bad merge block
fix ordering of imports
Rename google-changeId.py
set option for dynamic fuse

Backports included here for test fixes
DAOS-15429 test: Fix Go unit tests (#13981)
DAOS-13490 test: Update valgrind suppressions. (#13142)
DAOS-15159 test: add a supression for new valgrind warning in NLT (#13782)
DAOS-14669 test: switch tcp;ofi_rxm testing to tcp (#13365)
DAOS-15548 test: add new valgrind suppression for daos tool (#14081)

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Michael MacDonald <mjmac@google.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
Example usage:
// Original state
[juszhan_google_com@juszhan-dev daos]$ ls -la /tmp | grep dfuse0
drwxr-xr-x   1 juszhan_google_com juszhan_google_com   120 Apr 17 23:53 dfuse0
-rw-r--r--   1 juszhan_google_com juszhan_google_com  9146 Apr 17 23:53 dfuse0.log

// Change group to a known group id
[juszhan_google_com@juszhan-dev daos]$ getent group 1001
tmpuserjohn:x:1001:

[juszhan_google_com@juszhan-dev daos]$ run_cmd daos fs chown pool cont -g 1001 --dfs-path=/
Running DAOS_AGENT_DRPC_DIR=/tmp/agent daos fs chown pool cont -g 1001 --dfs-path=/

[juszhan_google_com@juszhan-dev daos]$ ls -la /tmp | grep dfuse0
drwxr-xr-x   1 juszhan_google_com tmpuserjohn          120 Apr 17 23:53 dfuse0
-rw-r--r--   1 juszhan_google_com juszhan_google_com  9146 Apr 17 23:53 dfuse0.log

// Change group to a nonexistent group id
[juszhan_google_com@juszhan-dev daos]$ getent group 1002

[juszhan_google_com@juszhan-dev daos]$ run_cmd daos fs chown pool cont -g 1002 --dfs-path=/
Running DAOS_AGENT_DRPC_DIR=/tmp/agent daos fs chown pool cont -g 1002 --dfs-path=/

[juszhan_google_com@juszhan-dev daos]$ ls -la /tmp | grep dfuse0
drwxr-xr-x   1 juszhan_google_com               1002   120 Apr 17 23:53 dfuse0
-rw-r--r--   1 juszhan_google_com juszhan_google_com  9146 Apr 17 23:53 dfuse0.log

Required-githooks: true

Signed-off-by: Justin Zhang <juszhan@google.com>
As requested by the Jira ticket, add a new I/O forwarding mechanism,
dss_chore, to avoid creating a ULT for every forwarding task.

  - Forwarding of object I/O and DTX RPCs is converted to chores.

  - Cancelation is not implemented, because the I/O forwarding tasks
    themselves do not support cancelation yet.

  - In certain engine configurations, some xstreams do not need to
    initialize dx_chore_queue. This is left to future work.

Signed-off-by: Li Wei <wei.g.li@intel.com>
When dss_chore.cho_func returns DSS_CHORE_DONE, the dss_chore object may
have been freed already. For instance, in the dtx_rpc_helper case,
dtx_check may have already returned, freeing (strictly speaking,
releasing) its stack frame that contains the dca.dca_chore object.
Hence, after calling chore->cho_func, dss_chore_queue_ult should only
dereference chore if the return value is DSS_CHORE_YIELD.
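
A minimal sketch of that rule, not the engine code; only DSS_CHORE_DONE and
DSS_CHORE_YIELD come from the commit text, and the types here are simplified:

#include <stddef.h>

enum dss_chore_status {
        DSS_CHORE_DONE,  /* finished; the chore may already be freed by its owner */
        DSS_CHORE_YIELD, /* wants to run again later */
};

struct dss_chore {
        struct dss_chore      *next;
        enum dss_chore_status (*cho_func)(struct dss_chore *chore);
};

static void
chore_queue_drain(struct dss_chore *head)
{
        struct dss_chore *chore = head;

        while (chore != NULL) {
                /* Save the link before running the chore: once cho_func()
                 * returns DSS_CHORE_DONE the object must not be touched. */
                struct dss_chore *next = chore->next;

                if (chore->cho_func(chore) == DSS_CHORE_YIELD) {
                        /* Safe to dereference only in this branch; the real
                         * code re-appends the chore to the queue here. */
                        chore->next = NULL;
                }
                chore = next;
        }
}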

Signed-off-by: Li Wei <wei.g.li@intel.com>
Updates control plane tools to set a context in a logger
for ease of debug/trace logging.

Signed-off-by: Michael MacDonald <mjmac@google.com>
This commit comprises two separate patches to enable optional
collection and export of client-side telemetry.

The daos_agent configuration file includes new parameters to control
collection and export of per-client telemetry. If the telemetry_port option
is set, then per-client telemetry will be published in Prometheus format
for real-time sampling of client processes. By default, the client telemetry
will be automatically cleaned up on client exit, but may be optionally
retained for some amount of time after client exit in order to allow for
a final sample to be read.

Example daos_agent.yml updates:
telemetry_port: 9192 # export on port 9192
telemetry_enable: true # enable client telemetry for all connected clients
telemetry_retain: 1m # retain metrics for 1 minute after client exit

If telemetry_enable is false (default), client telemetry may be enabled on
a per-process basis by setting D_CLIENT_METRICS_ENABLE=1 in the
environment for clients that should collect telemetry.

Notes from the first patch by Di:

Move TLS to common, so both client and server can have TLS
that metrics can be attached to.

Add object metrics on the client side, enabled by
exporting D_CLIENT_METRICS_ENABLE=1. Client metrics are organized
as "/jobid/pid/xxxxx".

During each DAOS thread's initialization, another shmem segment
(pid/xxx) is created, to which all metrics of the thread are attached.
This segment is destroyed once the thread exits, though
if D_CLIENT_METRICS_RETAIN is set, these client metrics are
retained and can be retrieved by
daos_metrics --jobid
Add D_CLIENT_METRICS_DUMP_PATH to dump metrics from the current thread
once it exits.

Some fixes in telemetry for conv_ptr when re-opening the
shared memory.

Add a daos_metrics --jobid XXX option to retrieve all metrics
of the job.

Includes some useful ftest updates from the following commit:
* DAOS-11626 test: Adding MD on SSD metrics tests (#13661)
Adding tests for WAL commit, reply, and checkpoint metrics.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Signed-off-by: Michael MacDonald <mjmac@google.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Co-authored-by: Phil Henderson <phillip.henderson@intel.com>
Co-authored-by: Di Wang <di.wang@intel.com>
Multiple cherry-picks to add daos fs query feature for fuse
statistics.  Helpful for understanding and tuning DAOS
performance when dfuse is used.

DAOS-13625 dfuse: Merge the info and projection_info structs. (#11881)
DAOS-13658 dfuse: Add filesystem query command. (#12367)
DAOS-12751 control: Add a daos filesystem evict command. (#12331)
DAOS-12751 dfuse: Improve evict command. (#12633)
DAOS-13625 dfuse: Remove dfuse_projection_info entirely. (#12796)
DAOS-13625 dfuse: Replace fs_handle with dfuse_info. (#12894)
DAOS-13625 dfuse: Add core inode_lookup() and inode_decref() functions. (#12573)
DAOS-14411 dfuse: Add per-container statistics. (#12819)
DAOS-14411 control: Expose dfuse statistics as yaml. (#13876)

Changed base branch to google/2.4 for daos_build test

Change-Id: I8ae3cc743697c2434ae0d54b382ee6c585a3b033

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
…4274)

Change-Id: Ia8452f68990f495e42e8af2e8a1eb7c951fbbdfa

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
…stem (#14318)

Improve the daos_init() and pool_connect() process to reuse the attach info
instead of making agent dRPC upcalls multiple times.

Also includes: DAOS-15655 control: fix support for non-default system name (#14170)

Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
The pipeline lib isn't reading any default that uses '-'
rather than '_'.  After talking to Intel,
changing to '_' is the best path forward.

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Backport for the following patches
DAOS-13380 engine: refine tgt_nr check (#12405)
DAOS-15739 engine: Add multi-socket support (#14234)
DAOS-623 engine: Fix a typo (#14329)

* DAOS-13380 engine: refine tgt_nr check

1. For the non-DAOS_TARGET_OVERSUBSCRIBE case,
   fail to start the engine if there are not enough cores.
2. For the DAOS_TARGET_OVERSUBSCRIBE case,
   allow forcing the engine to start.
The #nr_xs_helpers may be reduced in either case.

* DAOS-15739 engine: Add multi-socket support (#14234)

Add a simple multi-socket mode for use cases where a single
engine must be used. This avoids having all helper xstreams
automatically assigned to a single NUMA node, thus increasing
the efficiency of synchronization between I/O and helper xstreams.

It is the default behavior if all of the following are true:

- Neither pinned_numa_node nor first_core is used.
- No oversubscription is requested.
- Each NUMA node has a uniform number of cores.
- Targets and helpers divide evenly among NUMA nodes.
- There is more than one NUMA node.

Update the server config logic to ensure first_core is passed
on to the engine if it is set, while keeping the existing behavior
when both first_core: 0 and pinned_numa_node are set.

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>

Bug-tracker data:
Errors are: component not formatted correctly, ticket number prefix incorrect, PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments. Unable to load ticket data.
https://daosio.atlassian.net/browse/Test

Collaborator

@daosbuild1 daosbuild1 left a comment


@@ -16,6 +16,8 @@ run-parts() {
for i in $(LC_ALL=C; echo "${dir%/}"/*[^~,]); do
# don't run vim .swp files
[ "${i%.sw?}" != "${i}" ] && continue
# for new repo, skip old changeId script
[ $(basename "${i}") == "20-user-changeId" ] && continue

(lint) Quote this to prevent word splitting. [SC2046]

@@ -1167,6 +1212,7 @@ crt_context_req_track(struct crt_rpc_priv *rpc_priv)
d_list_t *rlink;
d_rank_t ep_rank;
int rc = 0;
int quota_rc = 0;

Suggested change
int quota_rc = 0;
int quota_rc = 0;

int len = 0;

int res = sscanf(mountpoint, "/dev/fd/%u%n", &fd, &len);
if (res != 1) {

Suggested change
if (res != 1) {
int res = sscanf(mountpoint, "/dev/fd/%u%n", &fd, &len);

}

int fd_flags = fcntl(fd, F_GETFD);
if (fd_flags == -1) {

Suggested change
if (fd_flags == -1) {
int fd_flags = fcntl(fd, F_GETFD);

* fail for these paths.
*/
int fd = check_fd_mountpoint(dfuse_info->di_mountpoint);
if (fd == -1) {

Suggested change
if (fd == -1) {
int fd = check_fd_mountpoint(dfuse_info->di_mountpoint);

rc = regenerate_task_of_type(pool, PO_COMP_ST_DOWN, RB_OP_EXCLUDE);
if (entry->dpe_val & (DAOS_SELF_HEAL_AUTO_REBUILD | DAOS_SELF_HEAL_DELAY_REBUILD)) {
rc = regenerate_task_of_type(pool, PO_COMP_ST_DOWN,
entry->dpe_val & DAOS_SELF_HEAL_DELAY_REBUILD ? -1 : 0);

Suggested change
entry->dpe_val & DAOS_SELF_HEAL_DELAY_REBUILD ? -1 : 0);
entry->dpe_val & DAOS_SELF_HEAL_DELAY_REBUILD ? -1 : 0);

Comment on lines +1378 to +1381
print_message("sleep 30 seconds for rebuild to be scheduled/delay \n");
sleep(30);
extend_single_pool_rank(arg, 6);
print_message("sleep 5 seconds for extend be scheduled/combined \n");

Suggested change
print_message("sleep 30 seconds for rebuild to be scheduled/delay \n");
sleep(30);
extend_single_pool_rank(arg, 6);
print_message("sleep 5 seconds for extend be scheduled/combined \n");
print_message("sleep 30 seconds for rebuild to be scheduled/delay\n");
sleep(30);
extend_single_pool_rank(arg, 6);
print_message("sleep 5 seconds for extend be scheduled/combined\n");

* If this is client rank 0, set fail_loc to \a fail_loc on \a engine_rank. The
* caller must eventually set fail_loc to 0 on these engines, even when using
* DAOS_FAIL_ONCE.
*

Suggested change
*
*


/**
* If this is client rank 0, set fail_value to \a fail_value on \a engine_rank.
*

Suggested change
*
*

Comment on lines +1447 to +1449

/**
* If this is client rank 0, set fail_num to \a fail_num on \a engine_rank.

Suggested change
/**
* If this is client rank 0, set fail_num to \a fail_num on \a engine_rank.
/**
* If this is client rank 0, set fail_num to \a fail_num on \a engine_rank.
*

@mlawsonca mlawsonca closed this May 13, 2024