tests_l2: Added qat workload #117

Merged on Sep 13, 2023 (1 commit)

@vbedida79 vbedida79 commented Aug 30, 2023

tests_l2: Add qatlib based Intel QAT sample workload

  • qatlib is built from source in https://github.com/intel/qatlib
  • The qatlib version is specified by QATLIB_VERSION: 23.08.0
  • cpa_sample_code shipped with qatlib is used to verify the QAT provisioning on OCP
  • Resources:
    • qat.intel.com/dc is used for QAT compression/decompression
    • qat.intel.com/cy is used for QAT symmetric/asymmetric cryptography
  • serviceAccount: intel-qat uses the user-defined SCC intel-qat-scc, which enables the IPC_LOCK capability for qatlib-based workloads; see issue #122 (P1-Blocker: Can not run QAT workload in the non-priviledged container)

Signed-off-by: veenadhari.bedida@intel.com

vbedida79 (Contributor Author) commented

@mregmi @uMartinXu please review

mythi commented Aug 30, 2023

What's the reason it needs to run as privileged? It voids the plugin usage completely

hershpa added the qat (QAT feature) label on Aug 30, 2023
hershpa added this to the v1.0.1 milestone on Aug 30, 2023

hershpa commented Aug 30, 2023

Good point @mythi. Is there a way we can address this, perhaps via another fine-grained SELinux policy?

hershpa added the enhancement (New feature or request) label on Aug 30, 2023

vbedida79 commented Aug 30, 2023

What's the reason it needs to run as privileged? It voids the plugin usage completely

The test case performs some VFIO operations and fails with "no device found". I tried the current container_device_t and combinations of SCCs with volume mounts, but they don't seem to suffice either. Right, @mregmi?


mregmi commented Aug 30, 2023

The workload needs access to vfio_device_t, so we have to add another permission to container_device_t. Until that is done, we have to either create another custom policy for this workload, run SELinux in permissive mode, or run the container as privileged.

type=AVC msg=audit(1693324314.875:9260): avc: denied { read write } for pid=3334307 comm="cpa_sample_code" name="526" dev="devtmpfs" ino=46020946 scontext=system_u:system_r:container_t:s0:c19,c27 tcontext=system_u:object_r:vfio_device_t:s0 tclass=chr_file permissive=0
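
To make a denial like the one above easier to read, here is a minimal Python sketch that pulls the permissions, command, source context, target context, and object class out of a single AVC record. The regular expression and the name parse_avc are illustrative assumptions written for this record's layout; they are not part of any SELinux tooling, and standard audit tools remain the right way to analyze denials in practice.

```python
import re

# Illustrative pattern for a single-line AVC denial. It only handles the
# fields that appear in the record quoted in this thread.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}"
    r".*?\bcomm=\"(?P<comm>[^\"]*)\""
    r".*?\bscontext=(?P<scontext>\S+)"
    r".*?\btcontext=(?P<tcontext>\S+)"
    r".*?\btclass=(?P<tclass>\S+)"
)


def parse_avc(line):
    """Return the key fields of an AVC denial, or None if the line does not match."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["perms"] = fields["perms"].split()
    # An SELinux context is user:role:type:level; the type is the third field.
    fields["source_type"] = fields["scontext"].split(":")[2]
    fields["target_type"] = fields["tcontext"].split(":")[2]
    return fields


if __name__ == "__main__":
    record = (
        'type=AVC msg=audit(1693324314.875:9260): avc: denied { read write } '
        'for pid=3334307 comm="cpa_sample_code" name="526" dev="devtmpfs" '
        'ino=46020946 scontext=system_u:system_r:container_t:s0:c19,c27 '
        'tcontext=system_u:object_r:vfio_device_t:s0 tclass=chr_file permissive=0'
    )
    print(parse_avc(record))
```

Read this way, the record says that a process of type container_t was refused read and write access to a character device labeled vfio_device_t, which is the gap discussed in this thread.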


hershpa commented Aug 30, 2023

We should track this as a known issue in a separate issue and submit a PR that adds another permission to container_device_t so the workload can access vfio_device_t.


uMartinXu commented Aug 31, 2023

cpa_sample_code

@vbedida79 could you file a GitHub issue to track this permission problem?
I think this issue might be a blocker for the release. We cannot simply treat it as a known issue and do the release.

  1. It looks like the cpa_sample_code app tries to access devtmpfs and gets denied. So we need to check whether it is the application that intends to access devtmpfs, or whether it is qatlib. If it is qatlib accessing devtmpfs, we have to update container_device_t.
  2. We have to do a full permission audit to check whether other permission issues still exist in qatlib, then evaluate and add all the necessary permissions to container_device_t. Of course, it might also be possible to fix this in qatlib. All in all, it should be transparent to our end users who use qatlib and QAT provisioning in containers on OCP; privileged mode is absolutely not acceptable for a user workload running in a container.
  3. If we do need to update vfio_device_t, then in order not to block the release I think we need to activate and use https://github.com/intel/user-container-selinux while we work with RH and upstream to merge the new container_device_t into OCP. I don't know what the earliest merge window for OCP is.
  4. @mregmi BTW, could you show us which permissions are defined for vfio_device_t? Judging from the name alone, it looks like we might need to add it for VFIO-related provisioning.

mythi commented Sep 1, 2023

  1. It looks like the cpa_sample_code app tries to access devtmpfs and gets denied. So we need to check whether it is the application that intends to access devtmpfs, or whether it is qatlib. If it is qatlib accessing devtmpfs, we have to update container_device_t.

Is DPDK "crypto-perf" the same? It would indeed be important to know if it's just cpa_sample_code doing something strange. I wish we had qatengine available for the builds. Testing openssl speed -t qatengine would also be a good data point.

On 2., agree but not just qatlib. I think we need someone focused on this and peek into DSA/IAA flows as well so we get the right settings in place early on.

dockerfile: |

FROM registry.access.redhat.com/ubi8/ubi AS builder
ARG QATLIB_VERSION="23.02.0"

Suggested change
- ARG QATLIB_VERSION="23.02.0"
+ ARG QATLIB_VERSION="23.08.0"


it's not there yet but it would be good to move to it if it happens before your release date

Contributor Author

23.08.0 has been released; updated it.

autoconf \
libtool \
openssl-devel \
http://mirror.centos.org/centos/8-stream/PowerTools/x86_64/os/Packages/nasm-2.15.03-3.el8.x86_64.rpm \

for the purpose of this build, I'd think we should go without nasm. See https://github.com/intel/qatlib/blob/39e19d4fc09345bb857ee8745619fa0f83f6f824/INSTALL#L224-L225

Contributor Author

got it, thanks. will update

Contributor

for the purpose of this build, I'd think we should go without nasm. See https://github.com/intel/qatlib/blob/39e19d4fc09345bb857ee8745619fa0f83f6f824/INSTALL#L224-L225

It looks like that option is for situations where the nasm assembler is not present in the build environment. If nasm is available, we should install it and build the library with fast-crc-in-assembler; otherwise there might be a performance penalty.


For this "test application", it's acceptable but pulling rpms from random urls is not.

Contributor Author

I can look for official nasm sources. nasm and qatlib are in the codeready-builder-for-rhel-8-x86_64-rpms repo for UBI images. With a Dockerfile, this repo can be accessed and the image built on RHEL systems that have subscription-manager enabled. Via BuildConfigs, subscription-manager and its supported repos can only be enabled with personal RH account keys. We plan to enable this in future releases.

uMartinXu (Contributor) commented Sep 14, 2023

For this "test application", it's acceptable but pulling rpms from random urls is not.

@mythi I agree with your point. For qatlib, users should use the qatlib rpm package distributed by RH and should not build the library themselves. However, it is currently not easy for users to do that. An enhancement has been filed for Milestone 1.1.0 to address the gap.

All in all, for testing the first QAT release (1.0.1), this test case is good enough.

@mythi Thank you very much for your comments and help.

vbedida79 force-pushed the patch-290823-1 branch 2 times, most recently from f07d2ce to 7211bff, on September 7, 2023 15:54
vbedida79 (Contributor Author) commented

@uMartinXu @mythi @mregmi Updated PR. Please review.

mythi commented Sep 7, 2023

LGTM!

vbedida79 force-pushed the patch-290823-1 branch 2 times, most recently from 8cf127d to 416af6c, on September 7, 2023 16:22

vbedida79 commented Sep 7, 2023

thanks for reviewing @mythi
@chaitanya1731 thanks for reviewing, made the changes to buildconfig. (remove ARG default, builder tag, 2 resources)

metadata:
  annotations:
    kubernetes.io/description: 'SCC to use IPC_LOCK capability for qatlib pod'
  name: container-scc
uMartinXu (Contributor) commented Sep 7, 2023

It looks like we are creating a user-defined SCC for QAT workloads that use qatlib. We should be very careful when creating, maintaining, and using a user-defined SCC; see https://docs.openshift.com/container-platform/4.13/authentication/managing-security-context-constraints.html#security-context-constraints-creating_configuring-internal-oauth.
If we can't use one of the SCCs predefined by OCP (https://docs.openshift.com/container-platform/4.13/authentication/managing-security-context-constraints.html#default-sccs_configuring-internal-oauth) and have to create our own, I suggest starting from the predefined "restricted" or "restricted-v2" SCC and adding only the "IPC_LOCK" capability. To make the user-defined SCC easier to maintain, I suggest keeping it in a single YAML file.


Looks like we are creating a user-defined SCC for qat workload using qatlib.

It's for all containers using QAT through VFIO (also DPDK's libs and DPDK test apps like crypto/compress-perf belong to this)

vbedida79 (Contributor Author) commented Sep 8, 2023

Looks like we are creating a user-defined SCC for qat workload using qatlib. We should be very careful to create, I suggest starting from predefined "restricted" or "restricted-v2" SCC and only adding the "IPC_LOCK" capability. To make the user-defined SCC more easily maintained, I suggest having a single yaml file to maintain it

The pod needs to run as root, so it won't work with restricted. With the default capabilities plus IPC_LOCK, the pod defaults to anyuid. The SCC here does not give access to the file system; it only allows running as root with IPC_LOCK. Also, if other workloads need additional permissions, we just need to update this YAML and the job.
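
As a rough illustration of the shape being discussed (a restricted-style SCC that additionally allows IPC_LOCK and running as root), the following Python sketch builds such a manifest as a dictionary and prints it as JSON. It is not the SCC from this PR: the name example-qat-scc, the function build_qat_scc, and every field value are assumptions chosen for illustration, and the result has not been validated against a cluster.

```python
import json


def build_qat_scc(name="example-qat-scc"):
    """Build a SecurityContextConstraints manifest as a plain dict.

    Assumption: keep a restricted-style baseline (no privileged mode, no
    host namespaces, all capabilities dropped) and relax only what the
    thread discusses: the IPC_LOCK capability and running as root.
    """
    return {
        "apiVersion": "security.openshift.io/v1",
        "kind": "SecurityContextConstraints",
        "metadata": {"name": name},
        "allowPrivilegedContainer": False,
        "allowPrivilegeEscalation": False,
        "allowHostDirVolumePlugin": False,
        "allowHostIPC": False,
        "allowHostNetwork": False,
        "allowHostPID": False,
        "allowHostPorts": False,
        "readOnlyRootFilesystem": False,
        "requiredDropCapabilities": ["ALL"],
        # The only capability a pod may request under this SCC.
        "allowedCapabilities": ["IPC_LOCK"],
        # RunAsAny permits UID 0, which the thread says qatlib currently needs.
        "runAsUser": {"type": "RunAsAny"},
        "seLinuxContext": {"type": "MustRunAs"},
        "fsGroup": {"type": "MustRunAs"},
        "supplementalGroups": {"type": "RunAsAny"},
        "volumes": ["configMap", "downwardAPI", "emptyDir", "projected", "secret"],
    }


if __name__ == "__main__":
    print(json.dumps(build_qat_scc(), indent=2))
```

Keeping the whole definition in one place, as suggested above, means a later change (for example, an extra capability for another workload) is a single reviewed edit.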


#35 is about making it possible to run containers as non-root if it helps to avoid custom SCCs

Contributor Author

Thanks, I tried this. qatlib has a prerequisite to run as root (https://github.com/intel/qatlib/blob/7429ee2b7c837137ed11959a3c2cc3729dc15739/INSTALL#L60). We will follow this for GPU and SGX in the 1.1.0 release.


For qatlib, it has a prereq to run as root

@vbedida79 it's not true. At least in our orchestration case we can run QAT as non-root just like with gpu/sgx

vbedida79 (Contributor Author) commented Sep 14, 2023

Without root, sample code exits

qaeMemInit started
dma_map_slab:200 VFIO_IOMMU_MAP_DMA failed va=7ffab563c000 iova=200000 size=200000 -- errno=12
ADF_UIO_PROXY err: adf_init_ring: unable to get ringbuf(v:(nil),p:(nil)) for rings in bank(0)
ADF_UIO_PROXY err: icp_adf_transCreateHandle: adf_init_ring failed

https://github.com/intel/qatlib/blob/7429ee2b7c837137ed11959a3c2cc3729dc15739/INSTALL#L60 Any other setup/permission you suggest to deploy with?
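
As a small diagnostic aid for the log above: errno=12 is ENOMEM on Linux. One common reason for a VFIO DMA mapping to fail with ENOMEM is that the process would exceed its locked-memory limit (RLIMIT_MEMLOCK) without holding CAP_IPC_LOCK; whether that is the cause here is an assumption, not something the log proves. The sketch below decodes the errno and prints the current limit so it can be compared inside and outside the container. The helper names are illustrative.

```python
import errno
import os
import resource


def describe_errno(code):
    """Map a numeric errno to its symbolic name and human-readable message."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK values for this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def format_limit(value):
    """Render a resource limit, showing 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    name, message = describe_errno(12)
    print(f"errno=12 is {name}: {message}")
    soft, hard = memlock_limit()
    print("RLIMIT_MEMLOCK soft:", format_limit(soft))
    print("RLIMIT_MEMLOCK hard:", format_limit(hard))
```

Running this in the failing pod and in a working one gives a quick data point on whether the locked-memory limit differs between the two setups.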


if you wish to test it non-root, CRI-O must be configured with:

[crio.runtime]
device_ownership_from_security_context = true
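
To check whether a node's CRI-O configuration already carries this setting, a simple line scanner is enough for a first look. The sketch below is an assumption-laden helper, not a TOML parser: it only recognizes table headers and plain key = value lines, and it takes the configuration text as a string because the file location on the node is not stated in this thread. The function name device_ownership_enabled is hypothetical.

```python
def device_ownership_enabled(config_text):
    """Return True if [crio.runtime] sets device_ownership_from_security_context = true.

    Simplified scanner: tracks the current table header and inspects
    key = value lines. Multi-line values and quoted '#' characters are
    not handled.
    """
    section = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line.strip("[]").strip()
            continue
        if section == "crio.runtime" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "device_ownership_from_security_context":
                return value.lower() == "true"
    return False


if __name__ == "__main__":
    sample = """
    [crio.runtime]
    device_ownership_from_security_context = true
    """
    print(device_ownership_enabled(sample))
```

The scanner returns on the first matching key it finds in the [crio.runtime] table, so it is meant for a single file or drop-in snippet rather than a merged view of several files.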

Contributor Author

got it, on the host node then. this might be it. what do you think @mregmi and @uMartinXu


hershpa commented Sep 12, 2023

LGTM!


RUN dnf -y update && \
dnf install -y gcc \
systemd-devel \
Contributor

do we really need systemd-devel?


--enable-systemd=no and without systemd-devel is the right choice

Contributor

@vbedida79 can we remove the systemd-devel package and check whether it still builds? :-)

Contributor Author

updated it, thanks

./autogen.sh && \
./configure \
--prefix=/usr \
--disable-fast-crc-in-assembler \
Contributor

Why do we need --disable-fast-crc-in-assembler?

Contributor Author

nasm is installed from an rpm and is not available via yum/dnf in UBI, so I used this option, as suggested here: #117 (comment)

Contributor Author

updated back to nasm, as discussed. @mregmi @uMartinXu please review


What was the justification to add it back?

Contributor Author

@mregmi @uMartinXu mentioned possibility of performance issues with nasm removed

@@ -0,0 +1,62 @@
# Copyright (c) 2023 Intel Corporation
Contributor

Could we change the commit log to:

tests_l2: Add qatlib based Intel QAT sample workload

qatlib is built from source in https://github.com/intel/qatlib
The qatlib version is specified by QATLIB_VERSION: 23.08.0
cpa_sample_code shipped with qatlib is used to verify the QAT provisioning on OCP

Resource:
qat.intel.com/dc is used for QAT compression/decompression
qat.intel.com/cy is used for QAT symmetric/asymmetric cryptography

serviceAccount: intel-qat uses the user defined SCC intel-qat-scc which enables the IPC_LOCK capability for Qat-lib based workload see #122

Signed-off-by: veenadhari.bedida@intel.com

Contributor Author

updated

Contributor

I think we should also put the detailed commit log in the git commit itself; the PR description can only be accessed through the online GitHub service. Correct me if I'm wrong, @mythi @vbedida79.

Contributor Author

can update commit message. do we plan to do this for every PR?

Contributor

Yes, please. :-)

[IPC_LOCK]
resources:
  requests:
    qat.intel.com/dc: '1'
Contributor

Have we tried the case when we have more than one qat resource?

Contributor Author

Yes. /dev/vfio will have 4 device files instead of 2, since 2 dc and 2 cy resources are requested.
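
A quick way to confirm how many VFIO nodes a container actually received is to list /dev/vfio from inside it. The sketch below assumes that each allocated VF shows up as a numerically named node (like the "526" in the AVC record earlier in this thread) and that any non-numeric entry, such as the shared control node, should be reported separately. The function name list_vfio_nodes is illustrative.

```python
import os


def list_vfio_nodes(vfio_dir="/dev/vfio"):
    """Split the entries of vfio_dir into numbered nodes and everything else."""
    entries = sorted(os.listdir(vfio_dir))
    groups = [name for name in entries if name.isdigit()]
    others = [name for name in entries if not name.isdigit()]
    return groups, others


if __name__ == "__main__":
    try:
        groups, others = list_vfio_nodes()
    except FileNotFoundError:
        print("/dev/vfio not found; no VFIO devices are exposed here")
    else:
        print(f"{len(groups)} numbered VFIO node(s): {groups}")
        print(f"other entries: {others}")
```

Comparing the count against the number of qat.intel.com/dc and qat.intel.com/cy resources requested by the job gives a simple provisioning check; it does not show whether the workload actually uses all of them.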

Contributor

I mean have we ever tried to use multiple resources/VFs to do the dc or/and cy simultaneously in a single pod/container?

vbedida79 (Contributor Author) commented Sep 13, 2023

Yes; the difference is what /dev/vfio would contain. I changed this job example to request 2 cy and 2 dc. Anything else to be checked here?

Contributor

The provisioning of multiple resources works.
But we might need to figure out whether the multiple resources can actually be used by the test case to do dc or/and cy offloading to QAT device simultaneously in a single pod/container.
For this PR, I think it is good enough to merge.

vbedida79 force-pushed the patch-290823-1 branch 2 times, most recently from fb21cbd to 20ae597, on September 13, 2023 21:01
qatlib is built from source in https://github.com/intel/qatlib
The qatlib version is specified by QATLIB_VERSION: 23.08.0
cpa_sample_code shipped with qatlib is used to verify the QAT provisioning on OCP
Resources:
qat.intel.com/dc is used for QAT compression/decompression
qat.intel.com/cy is used for QAT symmetric/asymmetric cryptography
serviceAccount: intel-qat uses the user defined SCC intel-qat-scc which enables the IPC_LOCK capability for Qat-lib based workload see intel#122

Signed-off-by: vbedida79 <veenadhari.bedida@intel.com>
uMartinXu merged commit 573e29e into intel:main on Sep 13, 2023
vbedida79 added a commit to vbedida79/intel-technology-enabling-for-openshift that referenced this pull request Sep 13, 2023
Resolves intel#122
qatlib scc for IPC_LOCK in security/
qatlib needs to run as root
SCC based on OCP default restricted-v2 SCC, with root permissions
Used for intel#117

Signed-off-by: vbedida79 <veenadhari.bedida@intel.com>
Labels: enhancement (New feature or request), qat (QAT feature)