-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests_l2: Added qat workload #117
Conversation
150f817
to
1d041e1
Compare
@mregmi @uMartinXu please review |
What's the reason it needs to run as privileged? It voids the plugin usage completely |
Good point @mythi. Is there a way we can address this? Via another fine grain SELinux policy? |
test case is performing some vfio actions and fails with no device found. tried current container_device_t and combination of SCC's with volume mounts- don't seem to suffice either. right, @mregmi |
the workload needs access to vfio_device_t. So we have to add another permission to container_device_t. Until that is done either we have to create another custom policy for this workload or run selinux in permissive or use priviledge.
|
We should track this as an known issue in a separate issue and submit a PR to add another permission to |
@vbedida79 could you file a github issue to track this permission problem?
|
Is DPDK "crypto-perf" the same? It would indeed be important to know if it's just On 2., agree but not just qatlib. I think we need someone focused on this and peek into DSA/IAA flows as well so we get the right settings in place early on. |
tests/l2/qat/qatlib_build.yaml
Outdated
dockerfile: | | ||
|
||
FROM registry.access.redhat.com/ubi8/ubi AS builder | ||
ARG QATLIB_VERSION="23.02.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARG QATLIB_VERSION="23.02.0" | |
ARG QATLIB_VERSION="23.08.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not there yet but it would be good to move to it if it happens before your release date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
23.08.0 has released, updated it
autoconf \ | ||
libtool \ | ||
openssl-devel \ | ||
http://mirror.centos.org/centos/8-stream/PowerTools/x86_64/os/Packages/nasm-2.15.03-3.el8.x86_64.rpm \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the purpose of this build, I'd think we should go without nasm
. See https://github.com/intel/qatlib/blob/39e19d4fc09345bb857ee8745619fa0f83f6f824/INSTALL#L224-L225
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks. will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the purpose of this build, I'd think we should go without
nasm
. See https://github.com/intel/qatlib/blob/39e19d4fc09345bb857ee8745619fa0f83f6f824/INSTALL#L224-L225
Looks like the option is for the situation nasm compiler is not present in the build environment. If we have the nasm compiler, we should install the nasm and build the lib with fast-crc-in-assembler, otherwise, there might be some performance penalty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this "test application", it's acceptable but pulling rpms from random urls is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can look for official links of nasm. nasm & qatlib is on codeready-builder-for-rhel-8-x86_64-rpms repo for ubi images. With a dockerfile this repo can be accessed and built on RHEL systems enabled with subscription manager. Via buildconfigs, subscription manager and its supported repos can be enabled only via RH account personal keys. Plan to enable for future releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this "test application", it's acceptable but pulling rpms from random urls is not.
@mythi I agree with your point. For the qatlib, the user should use RH distributed qatblib rpm package and shoul not build the lib by themselves. However, currently, it is not very easily for user to do that.The enhancement is filed for Milestone 1.1.0. to address the gap.
All in all, for the testing of the first QAT release(1.0.1), this test case is good enough.
@mythi Thank you very much for your comments and help.
f07d2ce
to
7211bff
Compare
@uMartinXu @mythi @mregmi Updated PR. Please review. |
LGTM! |
8cf127d
to
416af6c
Compare
thanks for reviewing @mythi |
tests/l2/qat/qatlib_rbac.yaml
Outdated
metadata: | ||
annotations: | ||
kubernetes.io/description: 'SCC to use IPC_LOCK capability for qatlib pod' | ||
name: container-scc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are creating a user-defined SCC for qat workload using qatlib. We should be very careful to create, maintain, and use a user-defined SCC see https://docs.openshift.com/container-platform/4.13/authentication/managing-security-context-constraints.html#security-context-constraints-creating_configuring-internal-oauth.
If we can't use the predefined SCC by OCP https://docs.openshift.com/container-platform/4.13/authentication/managing-security-context-constraints.html#default-sccs_configuring-internal-oauth
and have to create our own SCC. I suggest starting from predefined "restricted" or "restricted-v2" SCC and only adding the "IPC_LOCK" capability. To make the user-defined SCC more easily maintained, I suggest having a single yaml file to maintain it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are creating a user-defined SCC for qat workload using qatlib.
It's for all containers using QAT through VFIO (also DPDK's libs and DPDK test apps like crypto/compress-perf
belong to this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are creating a user-defined SCC for qat workload using qatlib. We should be very careful to create, I suggest starting from predefined "restricted" or "restricted-v2" SCC and only adding the "IPC_LOCK" capability. To make the user-defined SCC more easily maintained, I suggest having a single yaml file to maintain it
The pod needs to run as root, via restricted it wont work. Default capability with IPC_LOCK defaults to anyuid . The SCC here does not give access to file system and only executes as root with IPC_LOCK. Also, if other workloads need any other permissions, we just need to update this yaml to add it and in the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#35 is about making it possible to run containers as non-root if it helps to avoid custom SCCs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, tried this. For qatlib, it has a prereq to run as root https://github.com/intel/qatlib/blob/7429ee2b7c837137ed11959a3c2cc3729dc15739/INSTALL#L60. Will be following for gpu and sgx in 1.1.0 release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For qatlib, it has a prereq to run as root
@vbedida79 it's not true. At least in our orchestration case we can run QAT as non-root just like with gpu/sgx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without root, sample code exits
qaeMemInit started
dma_map_slab:200 VFIO_IOMMU_MAP_DMA failed va=7ffab563c000 iova=200000 size=200000 -- errno=12
ADF_UIO_PROXY err: adf_init_ring: unable to get ringbuf(v:(nil),p:(nil)) for rings in bank(0)
ADF_UIO_PROXY err: icp_adf_transCreateHandle: adf_init_ring failed
https://github.com/intel/qatlib/blob/7429ee2b7c837137ed11959a3c2cc3729dc15739/INSTALL#L60 Any other setup/permission you suggest to deploy with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you wish to test it non-root, CRI-O must be configured with:
[crio.runtime]
device_ownership_from_security_context = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, on the host node then. this might be it. what do you think @mregmi and @uMartinXu
449b8f6
to
018afe7
Compare
LGTM! |
tests/l2/qat/qatlib_build.yaml
Outdated
|
||
RUN dnf -y update && \ | ||
dnf install -y gcc \ | ||
systemd-devel \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need systemd-devel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added from the dependencies according to qatlib doc https://github.com/intel/qatlib/blob/7429ee2b7c837137ed11959a3c2cc3729dc15739/INSTALL#L168
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--enable-systemd=no and without systemd-devel is the right choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vbedida79 can we remove the systemd-devel package and figure out whether we still can build it. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated it, thanks
tests/l2/qat/qatlib_build.yaml
Outdated
./autogen.sh && \ | ||
./configure \ | ||
--prefix=/usr \ | ||
--disable-fast-crc-in-assembler \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to disable-fast-crc-in-assembler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nasm is installed via rpmand not available via yum/dnf in ubi. used this option. as suggested here #117 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated back to nasm, as discussed. @mregmi @uMartinXu please review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the justification to add it back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mregmi @uMartinXu mentioned possibility of performance issues with nasm removed
@@ -0,0 +1,62 @@ | |||
# Copyright (c) 2023 Intel Corporation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change the commit log to:
tests_l2: Add qatlib based Intel QAT sample workload
qatlib is built from source in https://github.com/intel/qatlib
The qatlib version is specified by QATLIB_VERSION: 23.08.0
cpa_sample_code shipped with qatlib is used to verify the QAT provisioning on OCP
Resource:
qat.intel.com/dc is used for QAT compression/decompression
qat.intel.com/cy is used for QAT asymmetric/asymmetric cryptography
serviceAccount: intel-qat uses the user defined SCC intel-qat-scc which enables the IPC_LOCK capability for Qat-lib based workload see #122
Signed-off-by: veenadhari.bedida@intel.com
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also add the detailed commit log in the git. The PR description can only be accessed from the online GitHub service. Correct me @mythi @vbedida79
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can update commit message. do we plan to do this for every PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please. :-)
[IPC_LOCK] | ||
resources: | ||
requests: | ||
qat.intel.com/dc: '1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we tried the case when we have more than one qat resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. /dev/vfio will have 4 dev files instead of 2. since 2 of dc and 2 of cy resources are requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean have we ever tried to use multiple resources/VFs to do the dc or/and cy simultaneously in a single pod/container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes difference is what /dev/vfio would have. changed this job example, 2 cy and 2 dc. anything else to be checked here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provisioning of multiple resources works.
But we might need to figure out whether the multiple resources can actually be used by the test case to do dc or/and cy offloading to QAT device simultaneously in a single pod/container.
For this PR, I think it is good enough to merge.
fb21cbd
to
20ae597
Compare
qatlib is built from source in https://github.com/intel/qatlib The qatlib version is specified by QATLIB_VERSION: 23.08.0 cpa_sample_code shipped with qatlib is used to verify the QAT provisioning on OCP Resources: qat.intel.com/dc is used for QAT compression/decompression qat.intel.com/cy is used for QAT asymmetric/asymmetric cryptography serviceAccount: intel-qat uses the user defined SCC intel-qat-scc which enables the IPC_LOCK capability for Qat-lib based workload see intel#122 Signed-off-by: vbedida79 <veenadhari.bedida@intel.com>
20ae597
to
730ba89
Compare
tests_l2: Add qatlib based Intel QAT sample workload
Resources:
Signed-off-by: veenadhari.bedida@intel.com