DAOS-14408 common: enable NDCTL for DCPM #14371

Merged
merged 74 commits into from
Oct 9, 2024
Commits
ee6d615
DAOS-14408 common: ensure NDCTL not used for storage class `ram`
grom72 Aug 8, 2024
f74f025
Ensure ABT_THREAD_STACKSIZE>=18432 for dcpm storage class
grom72 Aug 27, 2024
ceccaca
Merge remote-tracking branch 'origin/master' into grom72/ndctl-valida…
grom72 Aug 28, 2024
12bed56
Test with actual PMDK pkg.
grom72 Aug 28, 2024
44c0346
Repeat DaosBuild test on Medium HW
grom72 Aug 29, 2024
9598f9a
tgt_vos_create_one() ULT requires bigger stack size in some configura…
grom72 Aug 29, 2024
fc2c933
Final verification on master
grom72 Aug 29, 2024
1f3f723
ABT_THREAD_STACKSIZE is automatically set during engine startup
grom72 Aug 30, 2024
a9a4a5c
Remove obsolete test config
grom72 Aug 30, 2024
8f6e286
Revert "Remove obsolete test config"
grom72 Aug 30, 2024
884f96d
Automatically set the PMEMOBJ_CONF env var
grom72 Aug 30, 2024
5ec658a
Exclude PMEMOBJ_CONF env var as it is automatically set
grom72 Aug 30, 2024
2ca8e35
Fix tests
grom72 Aug 30, 2024
4528923
Fix tests
grom72 Aug 30, 2024
648d374
Fix more tests
grom72 Aug 30, 2024
0976c04
Restore NLT original configuration.
grom72 Aug 30, 2024
dfb9bf0
Restore NLT original configuration.
grom72 Aug 30, 2024
c1dba55
Full final validation
grom72 Aug 30, 2024
d6fd878
Validation with legacy PMDK pkg (no NDCTL enabled)
grom72 Aug 31, 2024
a9a12d5
Unit tests for PMEMOBJ environment variables configuration validation
grom72 Sep 2, 2024
118ad26
Fix unit tests
grom72 Sep 2, 2024
a33e5c0
Fix unit tests (2)
grom72 Sep 2, 2024
f8c108b
Fix unit tests (3)
grom72 Sep 2, 2024
bcdced7
Final unit tests tuning (and final validation)
grom72 Sep 2, 2024
193f4e6
Final validation (2nd)
grom72 Sep 3, 2024
560d9de
Changelog update
grom72 Sep 3, 2024
3212708
Fix changelog
grom72 Sep 3, 2024
f2a865e
Final validation (3rd)
grom72 Sep 3, 2024
5d4aaf2
Code update based on reviewers feedback
grom72 Sep 3, 2024
9815445
Fix source code format
grom72 Sep 3, 2024
8bd6325
Update code style and extend tests scenarios
grom72 Sep 4, 2024
cc7000b
Remove underscores from go variables names
grom72 Sep 4, 2024
e80ae21
Align ULT default stack size with Linux page size
grom72 Sep 4, 2024
1ace438
Force build
grom72 Sep 5, 2024
05efd1e
Fix documentation 18KiB -> 20KiB
grom72 Sep 5, 2024
6e07425
Upgrade PMDK to version 2.1.0 to enable NDCTL for engines with DCPM
grom72 Sep 5, 2024
dbb626b
Fix: add dependencies required for PMDK w/ NDCTL
grom72 Sep 5, 2024
6dd7d34
Fix: add dependencies required for PMDK w/ NDCTL (2nd)
grom72 Sep 5, 2024
16c0519
Revert "Fix: add dependencies required for PMDK w/ NDCTL (2nd)"
grom72 Sep 5, 2024
ec653de
Revert "Fix: add dependencies required for PMDK w/ NDCTL"
grom72 Sep 5, 2024
b0842ed
Revert "Upgrade PMDK to version 2.1.0 to enable NDCTL for engines wit…
grom72 Sep 5, 2024
cf26dd1
Upgrade PMDK to version 2.1.0 to enable NDCTL for engines
grom72 Sep 5, 2024
8297d37
Fix typo
grom72 Sep 5, 2024
da701b4
Fix typo in pkg name
grom72 Sep 6, 2024
23f52e6
Fix GHA ARM build
grom72 Sep 6, 2024
83123e2
Force landing builds workflow when build.config is modified
grom72 Sep 6, 2024
88d5087
Revert "Fix typo in pkg name"
grom72 Sep 6, 2024
913b030
Reapply "Fix typo in pkg name"
grom72 Sep 6, 2024
ac3193a
Fix PMDK patch
grom72 Sep 6, 2024
4c5ed9d
Merge remote-tracking branch 'origin/master' into grom72/ndctl-valida…
grom72 Sep 6, 2024
eea85fb
Merge remote-tracking branch 'origin/master' into grom72/ndctl-valida…
grom72 Sep 9, 2024
ee99bf5
Changelog update
grom72 Sep 9, 2024
bd9100f
Adjust PMDK env var before engine run.
grom72 Sep 9, 2024
48f6285
UpdatePMDKEnvars() is moved to processConfig()
grom72 Sep 11, 2024
4f90e7a
Restore license date
grom72 Sep 11, 2024
f4c187d
Minor fixes to improve code clarity
grom72 Sep 11, 2024
8f8d846
Fix dfuse/build_test.py
grom72 Sep 12, 2024
c9f0c11
Fix dfuse/build_test.py (2nd)
grom72 Sep 12, 2024
e701bb8
Fix dfuse/build_test.py test (3rd)
grom72 Sep 13, 2024
e52972b
Fix: a simpler implementation of the unit tests setter.
grom72 Sep 13, 2024
e433d4c
Force full validation with the legacy PMDK
grom72 Sep 15, 2024
9175174
RPMs validation with PMDK 2.1.0 w NDCTL enabled
grom72 Sep 17, 2024
82e9e92
RPMs validation with PMDK 2.1.0 w NDCTL enabled (2nd)
grom72 Sep 24, 2024
bc08b29
RPMs validation with PMDK 2.1.0 w NDCTL enabled (2nd)
grom72 Sep 24, 2024
0565522
RPMs validation with PMDK 2.1.0 w NDCTL enabled (3rd)
grom72 Sep 25, 2024
c6f4853
Add libndctl-devel for leap test environment
grom72 Sep 25, 2024
817c616
Add libndctl-devel to Ubuntu pkg spec
grom72 Sep 25, 2024
e444b1f
Final validation.
grom72 Sep 26, 2024
8d7da45
Final validation (no NDCTL).
grom72 Sep 26, 2024
d13a99b
Merge remote-tracking branch 'origin/master' into grom72/ndctl-valida…
grom72 Oct 1, 2024
7c9d8ee
Merge remote-tracking branch 'origin/master' into grom72/ndctl-valida…
grom72 Oct 4, 2024
5b0183e
Force build and tests on various OSes
grom72 Oct 8, 2024
5061c98
Force build and tests on various OSes with legacy PMDK
grom72 Oct 8, 2024
4323936
Force build and tests on various OSes with legacy PMDK
grom72 Oct 9, 2024
1 change: 1 addition & 0 deletions .github/workflows/landing-builds.yml
@@ -16,6 +16,7 @@ on:
- ci/**
- requirements-build.txt
- requirements-utest.txt
- utils/build.config

permissions: {}

16 changes: 16 additions & 0 deletions debian/changelog
@@ -1,3 +1,19 @@
daos (2.7.100-7) unstable; urgency=medium
  [ Tomasz Gromadzki ]
  * Add support for the PMDK package 2.1.0 with NDCTL enabled.
  * Increase the default ULT stack size to 20KiB if the engine uses
    the DCPM storage class.
  * Prevent using the RAM storage class (simulated PMem) when
    the shutdown state (SDS) is active.
  * Automatically disable SDS for the RAM storage class on engine startup.
  * Require the PMEMOBJ_CONF='sds.at_create=0' environment variable
    to be set explicitly to deactivate SDS for the DAOS tools
    (ddb, daos_perf, vos_perf, etc.) when used WITHOUT DCPM.
    Otherwise, the tool is stopped by an error
    like: "Unsafe shutdown count is not supported for this source".

 -- Tomasz Gromadzki <tomasz.gromadzki@intel.com>  Tue, 02 Oct 2024 12:00:00 +0200

daos (2.7.100-6) unstable; urgency=medium
[ Kris Jacque ]
* Bump minimum golang-go version to 1.21
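
The last item of the 2.7.100-7 entry above asks users to set `PMEMOBJ_CONF='sds.at_create=0'` themselves when running the DAOS tools without DCPM. A minimal Go sketch of doing that from a wrapper program follows. `toolCommand` is a hypothetical helper and is not part of this PR; the sketch only prepares the command and never starts it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// sdsOff is the setting the changelog asks for on non-DCPM runs.
const sdsOff = "PMEMOBJ_CONF=sds.at_create=0"

// toolCommand (hypothetical helper) prepares a command whose environment
// disables SDS at pool creation. os/exec keeps only the last value of a
// duplicated key in Env, so the appended entry wins over an inherited one.
func toolCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), sdsOff)
	return cmd
}

func main() {
	cmd := toolCommand("ddb") // prepared only, not started
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```

Note that the appended entry replaces an inherited `PMEMOBJ_CONF` as a whole, so any other settings in that variable would have to be carried over by hand.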
6 changes: 4 additions & 2 deletions debian/control
@@ -18,7 +18,7 @@ Build-Depends: debhelper (>= 10),
python3-distro,
libabt-dev,
libucx-dev,
-libpmemobj-dev (>= 2.0.0),
+libpmemobj-dev (>= 2.1.0),
libfuse3-dev,
libprotobuf-c-dev,
libjson-c-dev,
@@ -118,7 +118,9 @@ Depends: python (>=3.8), python3, python-yaml, python3-yaml,
daos-client (= ${binary:Version}),
daos-admin (= ${binary:Version}),
golang-go (>= 2:1.21),
-libcapstone-dev
+libcapstone-dev,
+libndctl-dev,
+libdaxctl-dev
Description: The Distributed Asynchronous Object Storage (DAOS) is an open-source
software-defined object store designed from the ground up for
massively distributed Non Volatile Memory (NVM). DAOS takes advantage
1 change: 0 additions & 1 deletion site_scons/components/__init__.py
@@ -266,7 +266,6 @@ def define_components(reqs):
retriever=GitRepoRetriever(),
commands=[['make',
'all',
-'NDCTL_ENABLE=n',
'BUILD_EXAMPLES=n',
'BUILD_BENCHMARKS=n',
'DOC=n',
86 changes: 86 additions & 0 deletions src/control/server/engine/config.go
@@ -9,6 +9,7 @@ package engine
import (
	"fmt"
	"os"
	"strconv"
	"strings"

	"github.com/pkg/errors"
@@ -28,6 +29,8 @@ const (
	envLogMasks = "D_LOG_MASK"
	envLogDbgStreams = "DD_MASK"
	envLogSubsystems = "DD_SUBSYS"

	minABTThreadStackSizeDCPM = 20480
)

// FabricConfig encapsulates networking fabric configuration.
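
The new constant is 20 KiB, that is, five 4 KiB pages. The commit list (a first minimum of 18432, then the commit aligning the ULT default stack size with the Linux page size) suggests that the value comes from rounding 18 KiB up to a page boundary. That reading is an inference, not something the diff states. A small sketch of the arithmetic, assuming a 4096-byte page (`roundUpToPage` is a hypothetical name; `os.Getpagesize()` would return the real page size at run time):

```go
package main

import "fmt"

// roundUpToPage rounds size up to the next multiple of pageSize.
// pageSize must be a power of two for the bit mask to be valid.
func roundUpToPage(size, pageSize int) int {
	return (size + pageSize - 1) &^ (pageSize - 1)
}

func main() {
	const pageSize = 4096 // assumption: typical x86-64 Linux page size
	fmt.Println(roundUpToPage(18432, pageSize)) // prints 20480
}
```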
@@ -342,7 +345,80 @@ func (c *Config) Validate() error {
	if err := ValidateLogSubsystems(subsystems); err != nil {
		return errors.Wrap(err, "validate engine log subsystems")
	}
	return nil
}

// UpdatePMDKEnvarsStackSizeDCPM ensures an ABT stack size of at least 20KiB
// for an engine with the DCPM storage class.
func (c *Config) UpdatePMDKEnvarsStackSizeDCPM() error {
	stackSizeStr, err := c.GetEnvVar("ABT_THREAD_STACKSIZE")
	if err != nil {
		// The variable could not be found: set it to the DCPM minimum.
		c.EnvVars = append(c.EnvVars, fmt.Sprintf("ABT_THREAD_STACKSIZE=%d",
			minABTThreadStackSizeDCPM))
		return nil
	}
	// The variable is set: reject a non-numeric or too-small value.
	stackSizeValue, err := strconv.Atoi(stackSizeStr)
	if err != nil {
		return errors.Errorf("env_var ABT_THREAD_STACKSIZE has invalid value: %s",
			stackSizeStr)
	}
	if stackSizeValue < minABTThreadStackSizeDCPM {
		return errors.Errorf("env_var ABT_THREAD_STACKSIZE should be >= %d "+
			"for DCPM storage class, found %d", minABTThreadStackSizeDCPM,
			stackSizeValue)
	}
	return nil
}

// UpdatePMDKEnvarsPMemobjConf ensures proper configuration of the shutdown state (SDS).
func (c *Config) UpdatePMDKEnvarsPMemobjConf(isDCPM bool) error {
	pmemobjConfStr, pmemobjConfErr := c.GetEnvVar("PMEMOBJ_CONF")
	// strings.Contains also works for an empty string.
	hasSdsAtCreate := strings.Contains(pmemobjConfStr, "sds.at_create")
	if isDCPM {
		if !hasSdsAtCreate {
			return nil
		}
		// Confirm default handling of shutdown state (SDS) for DCPM storage class.
		return errors.New("env_var PMEMOBJ_CONF should NOT contain 'sds.at_create=?' " +
			"for DCPM storage class, found '" + pmemobjConfStr + "'")
	}

	// Disable shutdown state (SDS) (part of RAS) for RAM-based simulated SCM.
	if pmemobjConfErr != nil {
		c.EnvVars = append(c.EnvVars, "PMEMOBJ_CONF=sds.at_create=0")
		return nil
	}
	if !hasSdsAtCreate {
		// Keep the existing settings and append sds.at_create=0.
		envVars, _ := common.DeleteKeyValue(c.EnvVars, "PMEMOBJ_CONF")
		c.EnvVars = append(envVars, "PMEMOBJ_CONF="+pmemobjConfStr+
			";sds.at_create=0")
		return nil
	}
	if strings.Contains(pmemobjConfStr, "sds.at_create=1") {
		return errors.New("env_var PMEMOBJ_CONF should contain 'sds.at_create=0' " +
			"for non-DCPM storage class, found '" + pmemobjConfStr + "'")
	}
	return nil
}

// UpdatePMDKEnvars ensures proper environment variables for PMDK w/ NDCTL
// enabled, based on the actual configuration of the storage class.
func (c *Config) UpdatePMDKEnvars() error {
	if len(c.Storage.Tiers) == 0 {
		return errors.New("Invalid config - no tier 0 defined")
	}

	isDCPM := c.Storage.Tiers[0].Class == storage.ClassDcpm

	if err := c.UpdatePMDKEnvarsPMemobjConf(isDCPM); err != nil {
		return err
	}

	if isDCPM {
		return c.UpdatePMDKEnvarsStackSizeDCPM()
	}
	return nil
}
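
The `PMEMOBJ_CONF` handling above amounts to a small decision table. The following standalone sketch restates it on plain values so it can be run outside the DAOS tree. `adjustPMemobjConf` and its `set` parameter are hypothetical stand-ins for the method and for the error returned by `GetEnvVar`; the error texts are shortened and are not the ones used in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// adjustPMemobjConf (hypothetical) returns the PMEMOBJ_CONF value an engine
// should run with. set reports whether the variable was present at all.
func adjustPMemobjConf(value string, set, isDCPM bool) (string, error) {
	hasSds := strings.Contains(value, "sds.at_create")
	if isDCPM {
		// DCPM: SDS keeps its default handling, so any override is rejected.
		if hasSds {
			return "", fmt.Errorf("sds.at_create not allowed for DCPM, found %q", value)
		}
		return value, nil
	}
	switch {
	case !set:
		// Nothing configured: disable SDS for RAM-based simulated SCM.
		return "sds.at_create=0", nil
	case !hasSds:
		// Other settings present: keep them and append the SDS switch.
		return value + ";sds.at_create=0", nil
	case strings.Contains(value, "sds.at_create=1"):
		return "", fmt.Errorf("sds.at_create=0 required for non-DCPM, found %q", value)
	}
	return value, nil
}

func main() {
	for _, in := range []struct {
		value       string
		set, isDCPM bool
	}{
		{"", false, false},
		{"foo_bar", true, false},
		{"sds.at_create=1", true, false},
		{"sds.at_create=0", true, true},
	} {
		out, err := adjustPMemobjConf(in.value, in.set, in.isDCPM)
		fmt.Printf("value=%q set=%v dcpm=%v -> %q, err=%v\n",
			in.value, in.set, in.isDCPM, out, err)
	}
}
```

As in the PR code, a substring match is used, so a value that is set but empty ends up as `;sds.at_create=0`.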

@@ -690,3 +766,13 @@ func (c *Config) WithStorageIndex(i uint32) *Config {
	c.Storage.EngineIdx = uint(i)
	return c
}

// WithEnvVarAbtThreadStackSize sets environment variable ABT_THREAD_STACKSIZE.
func (c *Config) WithEnvVarAbtThreadStackSize(stackSize uint16) *Config {
	return c.WithEnvVars(fmt.Sprintf("ABT_THREAD_STACKSIZE=%d", stackSize))
}

// WithEnvVarPMemObjSdsAtCreate sets PMEMOBJ_CONF env. var. to sds.at_create=0/1 value
func (c *Config) WithEnvVarPMemObjSdsAtCreate(value uint8) *Config {
	return c.WithEnvVars(fmt.Sprintf("PMEMOBJ_CONF=sds.at_create=%d", value))
}

Review comment (Contributor), on WithEnvVarPMemObjSdsAtCreate:
If this is only used by tests, it could be a lot simpler. You can just set the value, rather than tampering with what's already there.

Reply (Contributor Author):
Done
207 changes: 207 additions & 0 deletions src/control/server/engine/config_test.go
@@ -1104,3 +1104,210 @@ func TestFabricConfig_Update(t *testing.T) {
		})
	}
}

func TestConfig_UpdatePMDKEnvarsStackSizeDCPM(t *testing.T) {
	validConfig := func() *Config {
		return MockConfig().WithStorage(
			storage.NewTierConfig().
				WithStorageClass("dcpm"))
	}

	for name, tc := range map[string]struct {
		cfg *Config
		expErr error
		expABTthreadStackSize int
	}{
		"empty config should not fail": {
			cfg: MockConfig(),
			expABTthreadStackSize: minABTThreadStackSizeDCPM,
		},
		"valid config for DCPM should not fail": {
			cfg: validConfig().WithEnvVarAbtThreadStackSize(minABTThreadStackSizeDCPM),
			expABTthreadStackSize: minABTThreadStackSizeDCPM,
		},
		"config for DCPM without thread size should not fail": {
			cfg: validConfig(),
			expABTthreadStackSize: minABTThreadStackSizeDCPM,
		},
		"config for DCPM with stack size big enough should not fail": {
			cfg: validConfig().
				WithEnvVarAbtThreadStackSize(minABTThreadStackSizeDCPM + 1),
			expABTthreadStackSize: minABTThreadStackSizeDCPM + 1,
		},
		"config for DCPM with stack size too small should fail": {
			cfg: validConfig().
				WithEnvVarAbtThreadStackSize(minABTThreadStackSizeDCPM - 1),
			expErr: errors.New(fmt.Sprintf("env_var ABT_THREAD_STACKSIZE "+
				"should be >= %d for DCPM storage class, found %d",
				minABTThreadStackSizeDCPM, minABTThreadStackSizeDCPM-1)),
		},
		"config for DCPM with invalid ABT_THREAD_STACKSIZE value should fail": {
			cfg: validConfig().WithEnvVars("ABT_THREAD_STACKSIZE=foo_bar"),
			expErr: errors.New("env_var ABT_THREAD_STACKSIZE has invalid value: foo_bar"),
		},
	} {
		t.Run(name, func(t *testing.T) {
			err := tc.cfg.UpdatePMDKEnvarsStackSizeDCPM()
			test.CmpErr(t, tc.expErr, err)
			if err == nil {
				stackSizeStr, err := tc.cfg.GetEnvVar("ABT_THREAD_STACKSIZE")
				test.AssertTrue(t, err == nil, "Missing env var ABT_THREAD_STACKSIZE")
				stackSizeVal, err := strconv.Atoi(stackSizeStr)
				test.AssertTrue(t, err == nil, "Invalid env var ABT_THREAD_STACKSIZE")
				test.AssertEqual(t, tc.expABTthreadStackSize, stackSizeVal,
					"Invalid ABT_THREAD_STACKSIZE value")
			}
		})
	}
}

func TestConfig_UpdatePMDKEnvarsPMemobjConfDCPM(t *testing.T) {
	validConfig := func() *Config {
		return MockConfig().WithStorage(
			storage.NewTierConfig().WithStorageClass("dcpm"))
	}

	for name, tc := range map[string]struct {
		cfg *Config
		expErr error
	}{
		"empty config should not fail": {
			cfg: MockConfig(),
		},
		"valid config for DCPM should not fail": {
			cfg: validConfig(),
		},
		"config for DCPM with forced sds.at_create (1) should fail": {
			cfg: validConfig().WithEnvVarPMemObjSdsAtCreate(1),
			expErr: errors.New("env_var PMEMOBJ_CONF should NOT contain " +
				"'sds.at_create=?' for DCPM storage class, found 'sds.at_create=1'"),
		},
		"config for DCPM with forced sds.at_create (0) should fail": {
			cfg: validConfig().WithEnvVarPMemObjSdsAtCreate(0),
			expErr: errors.New("env_var PMEMOBJ_CONF should NOT contain " +
				"'sds.at_create=?' for DCPM storage class, found 'sds.at_create=0'"),
		},
	} {
		t.Run(name, func(t *testing.T) {
			test.CmpErr(t, tc.expErr, tc.cfg.UpdatePMDKEnvarsPMemobjConf(true))
		})
	}
}

func TestConfig_UpdatePMDKEnvarsPMemobjConfNRam(t *testing.T) {
	validConfig := func() *Config {
		return MockConfig().WithStorage(
			storage.NewTierConfig().
				WithStorageClass("dcpm"))
	}

	for name, tc := range map[string]struct {
		cfg *Config
		expErr error
		expPMEMOBJ_CONF string
	}{
		"empty config should not fail": {
			cfg: validConfig(),
			expPMEMOBJ_CONF: "sds.at_create=0",
		},
		"config for ram without PMEMOBJ_CONF should not fail": {
			cfg: MockConfig(),
			expPMEMOBJ_CONF: "sds.at_create=0",
		},
		"valid config for ram should not fail": {
			cfg: validConfig().WithEnvVarPMemObjSdsAtCreate(0),
			expPMEMOBJ_CONF: "sds.at_create=0",
		},
		"config for ram w/ PMEMOBJ_CONF w/o sds.at_create should be updated": {
			cfg: validConfig().WithEnvVars("PMEMOBJ_CONF=foo_bar"),
			expPMEMOBJ_CONF: "foo_bar;sds.at_create=0",
		},
		"config for ram with sds.at_create set to 1 should fail": {
			cfg: validConfig().WithEnvVarPMemObjSdsAtCreate(1),
			expErr: errors.New("env_var PMEMOBJ_CONF should contain " +
				"'sds.at_create=0' for non-DCPM storage class" +
				", found 'sds.at_create=1'"),
		},
		"config for ram w/ PMEMOBJ_CONF w/ sds.at_create=1 should fail": {
			cfg: validConfig().
				WithEnvVars("PMEMOBJ_CONF=sds.at_create=1;foo-bar"),
			expErr: errors.New("env_var PMEMOBJ_CONF should contain " +
				"'sds.at_create=0' for non-DCPM storage class" +
				", found 'sds.at_create=1;foo-bar'"),
		},
	} {
		t.Run(name, func(t *testing.T) {
			test.CmpErr(t, tc.expErr, tc.cfg.UpdatePMDKEnvarsPMemobjConf(false))
			if len(tc.expPMEMOBJ_CONF) > 0 {
				sdsAtCreate, err := tc.cfg.GetEnvVar("PMEMOBJ_CONF")
				test.AssertTrue(t, err == nil, "Missing env var PMEMOBJ_CONF")
				test.AssertEqual(t, tc.expPMEMOBJ_CONF, sdsAtCreate,
					"Invalid PMEMOBJ_CONF")
			}
		})
	}
}

func TestConfig_UpdatePMDKEnvars(t *testing.T) {
	validConfig := func(storageClass string) *Config {
		return MockConfig().WithStorage(
			storage.NewTierConfig().
				WithStorageClass(storageClass))
	}
	for name, tc := range map[string]struct {
		cfg *Config
		expErr error
		expPMEMOBJ_CONF string
		expABTthreadStackSize int
	}{
		"empty config should fail": {
			cfg: MockConfig(),
			expErr: errors.New("Invalid config - no tier 0 defined"),
			expABTthreadStackSize: -1,
		},
		"valid config for RAM should not fail": {
			cfg: validConfig("ram").
				WithEnvVarAbtThreadStackSize(minABTThreadStackSizeDCPM - 1),
			expPMEMOBJ_CONF: "sds.at_create=0",
			expABTthreadStackSize: minABTThreadStackSizeDCPM - 1,
		},
		"invalid config for RAM should fail": {
			cfg: validConfig("ram").WithEnvVarPMemObjSdsAtCreate(1),
			expErr: errors.New("env_var PMEMOBJ_CONF should contain " +
				"'sds.at_create=0' for non-DCPM storage class, " +
				"found 'sds.at_create=1'"),
			expABTthreadStackSize: -1,
		},
		"valid config for DCPM should not fail": {
			cfg: validConfig("dcpm"),
			expABTthreadStackSize: minABTThreadStackSizeDCPM,
		},
		"invalid config for DCPM should fail": {
			cfg: validConfig("dcpm").
				WithEnvVarAbtThreadStackSize(minABTThreadStackSizeDCPM - 1),
			expErr: errors.New("env_var ABT_THREAD_STACKSIZE should be >= 20480 " +
				"for DCPM storage class, found 20479"),
			expABTthreadStackSize: minABTThreadStackSizeDCPM - 1,
		},
	} {
		t.Run(name, func(t *testing.T) {
			errTc := tc.cfg.UpdatePMDKEnvars()
			test.CmpErr(t, tc.expErr, errTc)
			if len(tc.expPMEMOBJ_CONF) > 0 {
				sdsAtCreate, err := tc.cfg.GetEnvVar("PMEMOBJ_CONF")
				test.AssertTrue(t, err == nil, "Missing env var PMEMOBJ_CONF")
				test.AssertEqual(t, tc.expPMEMOBJ_CONF, sdsAtCreate,
					"Invalid PMEMOBJ_CONF")
			}
			// A negative expected size means the variable is not checked.
			if tc.expABTthreadStackSize >= 0 {
				stackSizeStr, err := tc.cfg.GetEnvVar("ABT_THREAD_STACKSIZE")
				test.AssertTrue(t, err == nil, "Missing env var ABT_THREAD_STACKSIZE")
				stackSizeVal, err := strconv.Atoi(stackSizeStr)
				test.AssertTrue(t, err == nil, "Invalid env var ABT_THREAD_STACKSIZE")
				test.AssertEqual(t, tc.expABTthreadStackSize, stackSizeVal,
					"Invalid ABT_THREAD_STACKSIZE value")
			}
		})
	}
}
6 changes: 6 additions & 0 deletions src/control/server/server.go
@@ -104,6 +104,12 @@ func processConfig(log logging.Logger, cfg *config.Server, fis *hardware.FabricI
		return err
	}

	for _, ec := range cfg.Engines {
		if err := ec.UpdatePMDKEnvars(); err != nil {
			return err
		}
	}

	return nil
}
