Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Volume QA part II: fixes and tests #3133

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

apostasie
Copy link
Contributor

@apostasie apostasie commented Jun 20, 2024

This is part 2 of #3125 - same idea.

The original motivation was to significantly increase testing coverage & quality, convergence with docker behavior, and stability wrt concurrency for anything related to volume lifecycle.

In the process of doing so, several issues cropped-up, that this PR also fix.

Tests:

  • parallelized (which does also serve as a very good way to uncover raciness)
  • expanded
  • cleaned-up / normalized
  • added namespace testing
  • test more thoroughly, both in terms of cases and validating content of stdout / stderr
  • move tests from _linux to all platforms (not sure why it was linux only in the first place) - wish I had a windows env to test

Code changes:

  • docker align: "no labels" should be nil, not []string{}
  • docker align: --label "" should error
  • docker align: multi-inspect that is partly failing, and other partly failing conditions
  • remove useless / duplicated code (specifically around validation of identifiers that was performed multiple times)
  • make sure all methods of the volume store are locking (to avoid racy behavior)
  • replace broken/racy logic "get then create" by just create (since behavior has been recently changed and now always returns success)
  • make the underlying filesystem implementation private (eg: no longer expose Path) to prevent leaking implementation to the consumer (see for example cmd/nerdctl/namespace.go b/cmd/nerdctl/namespace.go that was reading the filesystem directly instead of calling List)
  • added a "persistent" Lock and Unlock methods on the store, for operations that need to call multiple methods successively or need guarantees that volumes cannot be modified at all until they are done
  • cleaned-up and expanded lockutil, so that we don't expose anymore Flock (unimplemented for non Unix), and instead expose new lock/unlock methods
  • reviewed all the codebase for any cases where that global lock call is necessary and added them
  • reviewed codebase to fix usedVolumes (called in volume remove and prune) can be raced by container creation, and other sources of racyness in volumes #3132 (and fix Volume code may be racy #3123 hopefully)

While this is a significant PR in term of apparent code change size, the added testing should give us a good confidence boost on what's happening wrt volumes.

@apostasie apostasie force-pushed the dev-qa-volume branch 2 times, most recently from bc9bcff to a197d56 Compare June 20, 2024 23:57
@apostasie
Copy link
Contributor Author

@AkihiroSuda what's the timeline for v2?

Do I still have time for more or are we frozen for v2 already?

I would have liked to cleanup / fix all bugs for login overall, and maybe some network work, but there is only so much I can do today - so, if you intend on freezing, lmk.

@apostasie apostasie marked this pull request as draft June 21, 2024 00:04
@AkihiroSuda
Copy link
Member

v2

Planned to be synced with containerd v2
https://github.com/orgs/containerd/projects/9

@apostasie
Copy link
Contributor Author

v2

Planned to be synced with containerd v2 https://github.com/orgs/containerd/projects/9

Cool.

So I guess we also need this in then: containerd/stargz-snapshotter#1545

And #2902 to pass.

Lmk if there is something I can do to help with ^.

@AkihiroSuda
Copy link
Member

nerdctl v2 will have containerd v2 binaries in the nerdctl-full bundle, but I guess it is not necessary to use Go packages of containerd v2

Lmk if there is something I can do to help with ^.

Thanks, I'd appreciate if you can help #2902 and its dependencies (stargz, soci, etc.) 🙇

@apostasie apostasie force-pushed the dev-qa-volume branch 6 times, most recently from a42487c to 7d995eb Compare July 4, 2024 03:15
@@ -45,6 +48,12 @@ func processVolumeCreateOptions(cmd *cobra.Command) (types.VolumeCreateOptions,
if err != nil {
return types.VolumeCreateOptions{}, err
}
for _, label := range labels {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align with docker behavior. Empty labels are not allowed.

if err != nil {
log.L.Warn(err)
} else {
volEnts, err := os.ReadDir(volStore)
volEnts, err := volStore.List(false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace direct access to underlying filesystem implementation by a proper api call.

@@ -151,7 +164,7 @@ func Create(ctx context.Context, client *containerd.Client, args []string, netMa
}

var mountOpts []oci.SpecOpts
mountOpts, internalLabels.anonVolumes, internalLabels.mountPoints, err = generateMountOpts(ctx, client, ensuredImage, options)
mountOpts, internalLabels.anonVolumes, internalLabels.mountPoints, err = generateMountOpts(ctx, client, ensuredImage, volStore, options)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are now locking the volume store, we need to pass it along.

@@ -32,9 +31,6 @@ func Create(name string, options types.VolumeCreateOptions) (*native.Volume, err
name = stringid.GenerateRandomID()
options.Labels = append(options.Labels, labels.AnonymousVolumes+"=")
}
if err := identifiers.Validate(name); err != nil {
return nil, fmt.Errorf("malformed name %s: %w", name, err)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless, as this is performed by the volume store.

var vol, err = volStore.Get(name, options.Size)
if err != nil {
return err
warns = append(warns, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align with docker behavior

containers, err := client.Containers(ctx)
if err != nil {
return err
}
usedVolumes, err := usedVolumes(ctx, containers)

usedVolumesList, err := usedVolumes(ctx, containers)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better variable name...

continue
}
if !options.All {
if volume.Labels == nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that labels can be nil, need to check for that.

for _, name := range volumes {
if _, ok := usedVolumes[name]; ok {
return fmt.Errorf("volume %q is in use", name)
if _, ok := usedVolumesList[name]; ok {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align with docker behavior

@@ -41,6 +41,9 @@ func (c *Composer) upVolume(ctx context.Context, shortName string) error {

// shortName is like "db_data", fullName is like "compose-wordpress_db_data"
fullName := vol.Name
// FIXME: this is racy. By the time we get below to creating the volume, there is no guarantee that things are still fine
// Furthermore, volStore.Get no longer errors if the volume already exists (docker behavior), so, the purpose of this
// call needs to be assessed (it might still error if the name is malformed, or if there is a filesystem error)
volExists, err := c.VolumeExists(fullName)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the compose code. Why do we shell out?
Leaving these untouched for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shell out

IIRC it was just for ease of quick implementation

log.L.WithError(err).Errorf("failed to unlock %q", dir)
}
}()
return fn()
}

func Flock(f *os.File, flags int) error {
func flock(f *os.File, flags int) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be public. It is unix only.

vol, err := volStore.Get(res.Name, false)

// Create returns an existing volume or creates a new one if necessary.
vol, err := volStore.Create(res.Name, nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason for any of that anymore.

@@ -106,16 +106,26 @@ func (m *MockVolumeStoreMockRecorder) Remove(names any) *gomock.Call {
return m.mock.ctrl.RecordCallWithMethodType(m.mock, "Remove", reflect.TypeOf((*MockVolumeStore)(nil).Remove), names)
}

// Dir mocks the Dir method of VolumeStore
func (m *MockVolumeStore) Dir() string {
// Lock mocks the Lock method of VolumeStore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mock stuff. Still have no idea what is this and what to do here.
Merely ensuring we fullfill the interface here, but if you have info on what I should do here, please speak.

@@ -31,21 +31,33 @@ import (
"github.com/containerd/nerdctl/v2/pkg/strutil"
)

// Path returns a string like `/var/lib/nerdctl/1935db59/volumes/default`.
func Path(dataStore, ns string) (string, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be public.
Consumer should only use the interface methods and not fiddle with the underlying (filesystem based) implementation.

}

func (vs *volumeStore) Dir() string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment on publicizing these.

@@ -32,6 +32,11 @@ import (

func TestSystemPrune(t *testing.T) {
testutil.RequiresBuild(t)
// FIXME: for some reason, using a dedicated namespace does work well with rootless, but fails with rootful... :/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what's going on here, and why this happens only with rootful.
Will investigate later when I get to system tests...

t.Parallel()
}

// FIXME: for an unknown reason, when testing ipv6, calling NewBaseWithNamespace per sub-test, in the tearDown/tearUp methods
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea.........
I guess it is another thing to add to the list for whomever is going to cleanup our testing helpers eventually.

@apostasie apostasie force-pushed the dev-qa-volume branch 2 times, most recently from a78e999 to 0aca67e Compare July 4, 2024 04:17
@apostasie apostasie changed the title Volume QA: inspect, prune, create (fixes and tests) Volume QA part II: fixes and tests Jul 4, 2024
@apostasie apostasie marked this pull request as ready for review July 4, 2024 04:46
@apostasie
Copy link
Contributor Author

@AkihiroSuda this is finally ready - sorry it took so long to get there.

Also it is bigger that I wanted it to be - sorry about that as well. I guess problems just kept on piling up as I parallelized more...

Pure code changes are relatively small and targeted.
The only really new / interesting bit is the introduction of the "durable" locking mechanism for VolumeStore that allow operations like container create or container remove to lock the store for the duration of their operation.

The rest is merely small bug fixes - and a boatload of new tests.

I have added a bunch of github comments alongside the code to facilitate review.

PTAL at your convenience.

Thanks!

@AkihiroSuda AkihiroSuda added this to the v2.0.0 milestone Jul 4, 2024
@AkihiroSuda AkihiroSuda requested review from a team July 4, 2024 09:47
@apostasie apostasie requested a review from djdongjin July 4, 2024 17:43
Signed-off-by: apostasie <spam_blackhole@farcloser.world>
@apostasie
Copy link
Contributor Author

Rebased.

@AkihiroSuda last tiny push does remove the spurious commented leftovers, extra line, and updated the comment on rootful/buidkit.

Just did another proofread of the diff - we should be good to go (let's make sure the CI is happy of course).

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda merged commit 456c666 into containerd:main Jul 8, 2024
26 checks passed
@apostasie
Copy link
Contributor Author

Thanks

Thanks a lot @AkihiroSuda
Appreciate your time and effort reviewing these.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants