Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime Assisted Mount and Management enhancements #526

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ddebroy
Copy link
Contributor

@ddebroy ddebroy commented Oct 12, 2022

CSI spec enhancements to support deferral of file system mount and management operations from CSI plugins to container runtime handlers (detailed in kubernetes/enhancements#2893)

  • Introduce capabilities for CSI plugins to report support to defer file system mount and management operations to a container runtime handler
  • Introduce fields for a CO to report file system operations should be deferred (once CO determines CSI plugin and container runtime handler of workload are compatible and should be able to coordinate).

This does not specify how CSI plugins defer file system operations to a container runtime handler. Please see https://github.com/kubernetes/enhancements/pull/2893/files#diff-d29b488a06a8e4aa819829e0634c0affc40f4c4a35fa22738a7d02039ca50128R1281 for an outline of a common protocol that will allow CSI plugins and container runtime handlers to coordinate mount and management operations for file systems.

The main use-case of this right now is microvm based runtimes like Kata which benefit (from a perf and security perspective) when controlling the mounting of file systems on volumes (backed by block devices) rather than having the mount managed in the host and projected to the sandbox environment (using 9p or virtio-fs)

Existing CSI plugins in Kubernetes environments are achieving the above by performing API server lookups of the pod spec and associated runtime class (e.g. https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/ddef5e99b24da6e07a21184803bd37c16ade5686/pkg/disk/nodeserver.go#L241). This enhancement will allow the relevant information to be passed to the plugin without requiring the API server lookup of pod details.

@ddebroy ddebroy force-pushed the ram1 branch 3 times, most recently from 9948b6e to 6f051a7 Compare October 12, 2022 20:14
@ddebroy ddebroy changed the title Runtime Assisted Mount and Manamgent enhancements Runtime Assisted Mount and Management enhancements Oct 31, 2022
Signed-off-by: Deep Debroy <ddebroy@gmail.com>
@saad-ali
Copy link
Member

@jdef @bswartz Can you help review this?

Copy link
Contributor

@bswartz bswartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding the wording of the spec change confusing based on my high-level understanding of how this is supposed to work. Please correct my understanding if this is wrong:

SPs may support the ability to short-circuit the node operations (just do the non-FS operations and then stop) by advertising some new capabilities and receiving instructions from the CO on a per-attachment basis. The CO handles the coordination between the container runtime and the storage plugin. At no time does the SP communicate with any CRI.

The wording in the PR suggests that the SP does communicate with the CRI and I think that's unworkable. Is this a misunderstanding on my part or unclear wording that needs to be cleaned up?

@ddebroy
Copy link
Contributor Author

ddebroy commented Jan 20, 2023

@bswartz the overall flow for this will involve:

  1. For NodePublish, SPs calling APIs on a common proxy (CRuSt proxy) to deposit metadata files about the file system operations for the volume in a known location on the host (that runtime handlers can process)
  2. For NodeGetVolumeStats and NodeExpandVolume, SPs calling APIs on the common proxy (CRuSt proxy) to retrieve the data from a runtime handler.

Does the common proxy model work? If so, should I clarify the details about that in the CSI spec? The specifics about the proxy and how CSI plugins and runtimes will interact with it is described in the KEP here. Happy to incorporate a summary of it here if it is acceptable.

@bswartz
Copy link
Contributor

bswartz commented Jan 20, 2023

@bswartz the overall flow for this will involve:

1. For NodePublish, SPs calling APIs on a common proxy ([CRuSt](https://github.com/kubernetes/enhancements/pull/2893/files#diff-d29b488a06a8e4aa819829e0634c0affc40f4c4a35fa22738a7d02039ca50128R1081) proxy) to deposit metadata files about the file system operations for the volume in a known location on the host (that runtime handlers can process)

2. For NodeGetVolumeStats and NodeExpandVolume, SPs calling APIs on the common proxy (CRuSt proxy) to retrieve the data from a runtime handler.

Does the common proxy model work? If so, should I clarify the details about that in the CSI spec? The specifics about the proxy and how CSI plugins and runtimes will interact with it is described in the KEP here. Happy to incorporate a summary of it here if it is acceptable.

Is it not possible to limit the communication that the SP has to do to just the CSI RPCs? I'm concerned that by mandating reliance on another standard, this feature ends up being limiting rather than empowering. For example, rather than the SP talking to CRuSt, couldn't the SP return the relevant info to the CO and then the CO could pass it to CRuSt? Such a model would enable other COs to do smarter things and possibly evolve beyond CRuSt. If we tie this feature to CRuSt and CRuSt stalls at some point in the future, then CSI is tied to a dead or dying thing. I'm thinking 5-10 years into the future here.

I realize that it might add to the complexity of the CSI RPCs to have special flavors of the relevant functions for the defered-FS calls, but extra RPCs are less of a liability than mandating usage of another standard. I've only started reading the KEP, but this is my main concern and I'm hoping there's a way to structure the layers so that CSI remains self-contained and the new interactions are pushed up into the CO (kubelet in this case).

@ddebroy
Copy link
Contributor Author

ddebroy commented Jan 20, 2023

rather than the SP talking to CRuSt, couldn't the SP return the relevant info to the CO and then the CO could pass it to CRuSt? Such a model would enable other COs to do smarter things and possibly evolve beyond CRuSt.

This is a great suggestion and makes complete sense! Thanks, @bswartz. Let me explore that and adapt this PR (and the KEP) to the suggested flow.

@ameade
Copy link

ameade commented Feb 17, 2023

I almost had the same question as bswartz but looks like yall are thinking about it. I was thinking instead of the SP "deferring" the FS operations by calling a proxy that does it, perhaps it could be more like "skip" FS operations, then have the CO do the FS operations, perhaps by way of CRuSt. That might work since I assume the FS operations would be the last step and any of these operations anyways (NodeExpandVolume, NodeStageVolume, etc).

@xing-yang
Copy link
Contributor

This file is missing: lib/go/csi/csi.pb.go
It should be generated automatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants