-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove arrow dependency #16640
Remove arrow dependency #16640
Conversation
This reverts commit 556c2b5.
Waiting on some final confirmation that NVIDIA/spark-rapids-jni#2357 works for the plugin and then this should be good to go. |
/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, @vyasr. Approving, on behalf of CUDF JNI.
Contributes to rapidsai/build-planning#33. Proposes the following for `cuspatial` wheels: * add build and runtime dependencies on `libcudf` wheels * stop vendoring copies of `libcudf.so`, `libnvcomp.so`, `libnvcomp_bitcomp.so`, and `libnvcomp_gdeflate.so` - *(load `libcudf.so` dynamically at runtime instead)* And other related changes for development/CI: * combine all `pip install` calls into 1 in wheel-testing scripts - *like rapidsai/cudf#16575 - *to improve the chance that packaging issues are discovered in CI* * `dependencies.yaml` changes: - more use of YAML anchors = less duplication - use dedicated `depends_on_librmm` and `depends_on_libcudf` groups * explicitly pass a package type to `gha-tools` wheel uploading/downloading scripts ## Notes for Reviewers ### Benefits of these changes Unblocks CI in this repo (ref: #1444 (comment), #1441 (comment)). Reduces wheel sizes for `cuspatial` wheels by about 125MB 😁 | wheel | size (before) | size (this PR) | |:-----------:|-------------:|---------------:| | `cuspatial` | 146.0M | 21M | | `cuproj ` | 0.9M | 0.9M | |**TOTAL** | **146.9M** | **21.9M** | *NOTES: size = compressed, "before" = 2024-08-21 nightlies (c60bd4d), CUDA = 12, Python = 3.11* <details><summary>how I calculated those (click me)</summary> ```shell # note: 2024-08-21 because that was the most recent date with # successfully-built cuspatial nightlies # docker run \ --rm \ -v $(pwd):/opt/work:ro \ -w /opt/work \ --network host \ --env RAPIDS_NIGHTLY_DATE=2024-08-21 \ --env RAPIDS_NIGHTLY_SHA=c60bd4d \ --env RAPIDS_PR_NUMBER=1447 \ --env RAPIDS_PY_CUDA_SUFFIX=cu12 \ --env RAPIDS_REPOSITORY=rapidsai/cuspatial \ --env WHEEL_DIR_BEFORE=/tmp/wheels-before \ --env WHEEL_DIR_AFTER=/tmp/wheels-after \ -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \ bash mkdir -p "${WHEEL_DIR_BEFORE}" mkdir -p "${WHEEL_DIR_AFTER}" py_projects=( cuspatial cuproj ) for project in "${py_projects[@]}"; do # before RAPIDS_BUILD_TYPE=nightly \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="branch-24.10" \ RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}" # after RAPIDS_BUILD_TYPE=pull-request \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}" done du -sh ${WHEEL_DIR_BEFORE}/* du -sh ${WHEEL_DIR_BEFORE} du -sh ${WHEEL_DIR_AFTER}/* du -sh ${WHEEL_DIR_AFTER} ``` </details> Reduces the amount of additional work required to start shipping `libcuspatial` wheels. ### Background This is part of ongoing work towards packaging `libcuspatial` as a wheel. relevant prior work: * packaging `libcudf` wheels: rapidsai/cudf#15483 * consolidating `pip install` calls in CI scripts for `cudf`: rapidsai/cudf#16575 * `cudf` dropping its Arrow library dependency: rapidsai/cudf#16640 ### How I tested this Confirmed in local builds and CI logs that `cudf` is being *found*, not *built*, in `cuspatial` builds. ```text -- CPM: Using local package cudf@24.10.0 ``` ([build link](https://github.com/rapidsai/cuspatial/actions/runs/10602971716/job/29386288614?pr=1447#step:9:23472)) Built `cuspatial` wheels locally and ran all the unit tests, without issue. # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Roeschke (https://github.com/mroeschke) URL: #1447
Fixes #16678. Adds the boost-devel package to the Java CI Docker environment now that the Boost headers are not being picked up implicitly after libcudf dropped the Arrow dependency in #16640. libcudfjni still requires Arrow for now, and thus requires Boost headers. Authors: - Jason Lowe (https://github.com/jlowe) Approvers: - Alessandro Bellina (https://github.com/abellina) - Bradley Dice (https://github.com/bdice) URL: #16707
I still see |
No. We still use arrow in tests, examples, and the JNI, and that CMake code is used in all three. In all three cases we now build and statically link to Arrow as part of the build, though, so there is no longer any library dependency in those cases. The critical change in this PR is removing the dependency from libcudf and the cudf Python modules. |
There have been a number of changes to RAPIDS builds over the course of this release and not all changes were fully propagated to the devcontainers repo. This repo addresses the following: - As of rapidsai/kvikio#369, kvikio produces wheels, and rapidsai/kvikio#439 contains critical fixes that allow the kvikio Python wheel to use the C++ libkvikio wheel. In RAPIDS Python builds we have consistently removed support for the Python build triggering the C++ build as we have created C++ wheels since in both conda and pip environments we now expect the library to be found and we do not need to automatically support the more esoteric use case of someone turning off build isolation but not having the C++ library available (devs can handle this case themselves if they wish). As a result, once rapidsai/kvikio#466 is merged, the `FIND_KVIKIO_CPP` variable will be completely superfluous and we can remove that here. - As of rapidsai/cudf#16640 libcudf no longer links to libarrow and `USE_LIBARROW_FROM_PYARROW` is no longer used. - The libcudf and libcuspatial Python package builds in the devcontainer should (like all other Python packages) omit the CUDA version suffix. For that, they need to use the `rapids_build_backend_args`.
Description
This PR removes libarrow as a dependency of libcudf since we no longer use any of its APIs in our C++ code. The following places remain dependent on libarrow:
In all three cases above, we are now statically linking libarrow. We also always pull it in via CPM, which means that we never require libarrow to exist on the user's system anymore. Of the above three cases, we should expect the first two to persist indefinitely. The JNI could be updated to use nanoarrow instead if desired, but that is not critical since the primary benefit of removing libarrow as a direct dependency is to remove it as a constraint for package managers such as conda in environments where we must match the version of Arrow required by other dependencies.
pyarrow remains a dependency of the cudf Python packages. For now, this PR retains the tight pinning on 16.1 since we know that this version works. A future PR will loosen this pinning since we are no longer constrained to ABI-compatible versions and can support a range of pyarrow versions that support the necessary Python APIs (I believe pyarrow>=13 will work, but that remains to be tested).
Resolves #15193
Checklist