-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ON-15256: Report topology info from device plugin #119
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to see this exposing topology via the official Device Plugin API and querying sysfs directly rather than lshw. Embedding sfc assumptions into Onload won't make life simpler when implementing non-sfc features, but changing the data source can be done later, the output is more important.
Consider using ghw go library to avoid the mass of filesystem handling boilerplate -- it supports PCIe vendors, network devices, and topology.
solarflareVendor = "0x1924" | ||
) | ||
|
||
func isSFCNic(devicePath string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth using isOnloadableNic or isSupportedNic?
Just in case we are asked to support another vendor sometime in the future.
4fb3b06
to
d26fb55
Compare
I had a look into it, and it would work if we just cared about netdevices, but I don't think it will work for our purposes due to how it deals with pci devices. Rather than reading the files as I have done in this change, it wants to use a pci database (either from the file system or through the network). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
I would also suggest filing a Jira ticket for the queryNics()
unit test coverage. For instance, we could create a mock sysfs tree in the repo and feed it to the tests by replacing sysClassNetPath
(or similarly, whatever suits better).
I do want to increase test coverage for parts of the code that interact with the filesystem (this, |
Awesome! ON-15228 seems like the perfect fit. I bet you're already typing a comment over there with suggestions to improve the test coverage if anyone else wants to take a look at it unless you're planning to implement it in this PR (I assumed not, given it's ready for review)?! |
Thanks for taking a look. The missing PCI database is a logical consequence of a minimal container base image and easily surmountable. Just adding |
I'm not saying that it is an insurmountable problem, but I'm not really convinced whether it is worth it. If |
deviceplugin.Dockerfile
Outdated
FROM registry.access.redhat.com/ubi8/ubi-minimal:8.9 | ||
RUN microdnf install -y lshw-B.02.19.2 && microdnf clean all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're not using a package manager we can go distroless. Does ubi-micro
suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does! I did some testing initially using scratch
as the base image, but that would fail with the initContainer/lifecycle commands. I had forgotten about ubi-micro
🤦♂️
Rather than rely on an external tool (lshw) which requires extra dependencies in container images and doesn't provide an simple way of determining the vendor of a nic, it would be better to search through sysfs directly (since that is basically all lshw is doing). This approach also allows us to easily get other information from sysfs (such as numa nodes).
The device plugin will now try to read the numa information from sysfs while finding the nics present in the system. The device plugin will then add all of the present numa node information to the list of advertised devices. Because the device plugin is only advertising a pseudo-device it is unclear how to proceed when multiple nics are present on different nodes.
d26fb55
to
dee5d45
Compare
[1/2] ON-15180: Read sysfs to check for NICs
Rather than rely on an external tool (lshw) which requires extra
dependencies in container images and doesn't provide an simple way of
determining the vendor of a nic, it would be better to search through
sysfs directly (since that is basically all lshw is doing). This
approach also allows us to easily get other information from sysfs (such
as numa nodes).
[2/2] ON-15256: Report topology info from device plugin
The device plugin will now try to read the numa information from sysfs
while finding the nics present in the system. The device plugin will
then add all of the present numa node information to the list of
advertised devices. Because the device plugin is only advertising a
pseudo-device it is unclear how to proceed when multiple nics are
present on different nodes.
Testing done
Some manual testing in the cluster, no unit tests (yet) trying to think of a way of mocking sysfs files.