
Hide visibility of non-public symbols #1644

Merged 16 commits into rapidsai:branch-24.10 on Aug 15, 2024

Conversation

@jameslamb (Member) commented Aug 9, 2024

Description

Fixes #1645

Contributes to rapidsai/build-planning#33

Similar to rapidsai/cudf#15982

Proposes more tightly controlling the visibility of symbols in the shared libraries produced for the rmm Python library, via the following:

  • compiling with -fvisibility=hidden by default
  • marking intended-to-be-public parts of rmm (everything in the rmm:: namespace) with __attribute__((visibility("default")))

Benefits of this change

Reduces the risk of symbol conflicts when rmm is used alongside other libraries. For example, see this case in cudf where the spdlog:: symbols in rmm are conflicting with the spdlog:: symbols in nvcomp: rapidsai/cudf#15483 (comment)

Reduces library size slightly (around 0.3 MB uncompressed) by shrinking the symbol tables in the DSOs.

Notes for Reviewers

This is at the very edge of my C++ knowledge, apologies in advance if I've missed something obvious 😬

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

How I tested this

Re-used the wheels produced from this PR over in cudf and saw cudf's tests succeed where previously they had failed because of symbol conflicts: rapidsai/cudf#15483 (comment)

Also tested this locally, read on...

On an x86_64 machine with CUDA 12.2, checked out the latest 24.10 of rmm and the branch of cudf from rapidsai/cudf#15483 (which is up to date with the latest 24.10 of cudf).

Built librmm, rmm, libcudf, and cudf wheels from source, in the same CUDA 12, Python 3.11 image used for wheel building CI.

how I did that (click me)

Created a script, build-all.sh, with the following:

#!/bin/bash

set -e -u -o pipefail

mkdir -p "${WHEEL_DIR}"
rm -r ${WHEEL_DIR}/*.whl || true
echo "" > "${PIP_CONSTRAINT}"

rm -rf /usr/local/src/rmm/python/*/{build,dist,final_dist}
rm -rf /usr/local/src/cudf/python/*/{build,dist,final_dist}

# move sources to /tmp, so file permissions don't get messed up
cp -R /usr/local/src/cudf /tmp/cudf
cp -R /usr/local/src/rmm /tmp/rmm

# install wheel
python -m pip install wheel

# build rmm C++ wheel
echo ""
echo "--- building rmm C++ wheel ---"
echo ""

cd /tmp/rmm/python/librmm
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m wheel tags --platform any ./dist/librmm*.whl --remove
mv ./dist/librmm*.whl ${WHEEL_DIR}/

echo ""
echo "--- done building rmm C++ wheel ---"
echo ""

# build rmm Python wheel

# ensure 'rmm' wheel builds always use the 'librmm' just built in the same CI run
#
# using env variable PIP_CONSTRAINT is necessary to ensure the constraints
# are used when creating the isolated build environment
echo "librmm-cu12 @ file://$(echo ${WHEEL_DIR}/librmm_cu12*.whl)" >> "${PIP_CONSTRAINT}"

echo ""
echo "--- building rmm Python wheel ---"
echo ""

cd /tmp/rmm/python/rmm
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
mkdir -p ./final_dist
python -m auditwheel repair -w ./final_dist ./dist/rmm_*.whl
mv ./final_dist/rmm_*.whl ${WHEEL_DIR}/

echo "rmm-cu12 @ file://$(echo ${WHEEL_DIR}/rmm_cu12*.whl)" >> "${PIP_CONSTRAINT}"

echo ""
echo "--- done building rmm Python wheel ---"
echo ""

# build cudf C++ wheel
echo ""
echo "--- building cudf C++ wheel ---"
echo ""

cd /tmp/cudf/python/libcudf

mkdir -p ./final_dist
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m auditwheel repair \
    --exclude libarrow.so.1601 \
    -w ./final_dist \
    ./dist/libcudf_*.whl

mv ./final_dist/libcudf_*.whl ${WHEEL_DIR}/

echo "libcudf-cu12 @ file://$(echo ${WHEEL_DIR}/libcudf_cu12*.whl)" >> "${PIP_CONSTRAINT}"

echo ""
echo "--- done building cudf C++ wheel ---"
echo ""

# build cudf Python wheel
echo ""
echo "--- building cudf Python wheel ---"
echo ""

cd /tmp/cudf/python/cudf
export SKBUILD_CMAKE_ARGS="-DUSE_LIBARROW_FROM_PYARROW=ON"

mkdir -p ./final_dist
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m auditwheel repair \
    --exclude libcudf.so \
    --exclude libarrow.so.1601 \
    -w ./final_dist \
    ./dist/cudf_*.whl

mv ./final_dist/cudf_*.whl ${WHEEL_DIR}/

echo "cudf-cu12 @ file://$(echo ${WHEEL_DIR}/cudf_cu12*.whl)" >> "${PIP_CONSTRAINT}"

echo ""
echo "--- done building cudf Python wheel ---"
echo ""

Ran it like this:

docker run \
    --rm \
    -v $(pwd):/usr/local/src \
    -w /usr/local/src \
    --env PIP_CONSTRAINT=/usr/local/src/build-constraints.txt \
    --env WHEEL_DIR=/usr/local/src/dist \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \
    bash

# build all the wheels
./build-all.sh

Installed the wheels produced by that script, using one of the testing images used across RAPIDS CI.

how I did that (click me)
docker run \
    --rm \
    --gpus 1 \
    --env WHEEL_DIR=/usr/local/src/dist \
    -v $(pwd):/usr/local/src \
    -w /usr/local/src \
    -it rapidsai/ci-conda:cuda12.2.2-ubuntu22.04-py3.11 \
    bash

# install everything
pip install ${WHEEL_DIR}/*.whl
# run rmm's Python tests
pip install pytest
cd /usr/local/src/rmm
ci/run_pytests.sh

# try to reproduce the bug
cat > ./test2.py <<EOF
import cudf.pandas
cudf.pandas.install()
EOF

CUDF_PANDAS_RMM_MODE=pool python ./test2.py

Saw that fail with exactly the same symbol-conflict issue discussed in rapidsai/cudf#15483 (comment).

Repeated that process with the changes you see in this PR, and observed:

  • all of rmm's Python tests passing
  • no more symbol conflict issue in cudf 😁

Also inspected the symbol tables with readelf. For example, here are the counts of symbols by visibility in memory_resource.cpython-311-x86_64-linux-gnu.so:

# branch-24.10
GLOBAL: 431
WEAK: 1220
LOCAL: 1

# this PR
GLOBAL: 425
WEAK: 75
LOCAL: 1

It also reduced the sizes a bit:

# (these are the uncompressed sizes)

# branch-24.10
memory_resource.*.so: 1.5 MB
rmm_cu12*.whl: 5.4 MB

# this PR
memory_resource.*.so: 1.3 MB
rmm_cu12*.whl: 5.1 MB
how I did that (click me)

Unzipped the rmm wheel.

rm -rf ./delete-me
mkdir -p ./delete-me
unzip -d ./delete-me ./dist/rmm_cu12*.whl

Checked the symbol tables with readelf

readelf --symbols --wide \
    ./delete-me/rmm/_lib/memory_resource.cpython-311-x86_64-linux-gnu.so \
| c++filt \
> syms.txt

echo "GLOBAL: $(grep --count -E ' GLOBAL ' < ./syms.txt)"
echo "WEAK: $(grep --count -E '  WEAK ' < ./syms.txt)"
echo "LOCAL: $(grep --count -E '  LOCAL ' < ./syms.txt)"
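The three grep calls above can also be folded into a single awk pass over the readelf output. Here is a sketch of that idea; the here-doc stands in for real `readelf --symbols --wide <lib> | c++filt` output (the symbol lines are fabricated for illustration, not taken from the actual .so):

```shell
#!/bin/bash
# Count symbols by binding (GLOBAL / WEAK / LOCAL) in one awk pass.
# The sample below mimics the column layout of readelf's symbol table;
# field 5 is the binding.
cat > /tmp/syms-sample.txt <<'EOF'
     1: 0000000000004010    24 FUNC    GLOBAL DEFAULT   12 rmm::bad_alloc::what() const
     2: 0000000000004028     8 OBJECT  WEAK   DEFAULT   13 vtable for rmm::bad_alloc
     3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
     4: 0000000000004100    40 FUNC    GLOBAL DEFAULT   12 PyInit_memory_resource
EOF

awk '$5 ~ /^(GLOBAL|WEAK|LOCAL)$/ { counts[$5]++ }
     END { for (b in counts) print b ": " counts[b] }' /tmp/syms-sample.txt | sort
# prints:
#   GLOBAL: 2
#   LOCAL: 1
#   WEAK: 1
```

Matching on the binding field directly avoids depending on readelf's exact column spacing, which is what the ` GLOBAL ` / `  WEAK ` grep patterns above are sensitive to.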

Checked the sizes with pydistcheck

# and to get the sizes
pip install pydistcheck
pydistcheck --inspect ./dist/rmm_cu12*.whl

Where did RMM_EXPORT come from?

It's been in rmm for a few years:

// Macros used for defining symbol visibility, only GLIBC is supported
#if (defined(__GNUC__) && !defined(__MINGW32__) && !defined(__MINGW64__))
#define RMM_EXPORT __attribute__((visibility("default")))
#define RMM_HIDDEN __attribute__((visibility("hidden")))
#else
#define RMM_EXPORT
#define RMM_HIDDEN
#endif

But it was only used in one place, introduced in #833.

References

I found these helpful:

@jameslamb added the breaking (Breaking change) and improvement (Improvement / enhancement to an existing function) labels on Aug 9, 2024
@jameslamb jameslamb changed the title WIP: Hide visibility of non-public symbols. WIP: Hide visibility of non-public symbols Aug 9, 2024
@github-actions bot added the CMake, Python (Related to RMM Python API), and cpp (Pertains to C++ code) labels on Aug 9, 2024
jameslamb added a commit to msarahan/cudf that referenced this pull request Aug 9, 2024
@jameslamb jameslamb changed the title WIP: Hide visibility of non-public symbols Hide visibility of non-public symbols Aug 9, 2024
@jameslamb jameslamb marked this pull request as ready for review August 9, 2024 18:31
@jameslamb jameslamb requested review from a team as code owners August 9, 2024 18:31
@bdice (Contributor) left a comment

To the best of my knowledge, this is good! If you removed the RMM_EXPORT from a header, would the rmm Python builds or tests fail?

@jameslamb (Member, Author)

If you removed the RMM_EXPORT from a header, would the rmm Python builds or tests fail?

Good question!

I just tried removing the RMM_EXPORT from include/rmm/cuda_stream.hpp and repeated the testing from the PR description... building succeeded, ci/run_pytests.sh succeeded.

Then tried removing ALL of the RMM_EXPORT macros. Same thing... building succeeded, ci/run_pytests.sh succeeded.

Maybe this is because the Python package only calls into the DSOs through public symbols generated by Cython, and those are not affected by this CMake change?

I'm going to try pushing a commit with those removals here, let's see what other CI jobs say.

@bdice (Contributor) commented Aug 9, 2024

My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke.

@jameslamb (Member, Author)

My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke.

Right, totally understand and I agree.

Unfortunately it looks like even removing those, everything passed: https://github.com/rapidsai/rmm/actions/runs/10326031645?pr=1644. So I just pushed 1c39756 reverting this PR back to its original state.

Something about CI will have to change here if we want feedback on the exports working correctly within rmm's CI.

This is way at the edge of my understanding, so the only thing I can think of is to dump the symbol table for every DSO and pass it through some text processing (e.g. we shouldn't see any WEAK / GLOBAL spdlog:: symbols remaining in any DSOs, and I can envision how to check that with grep + the output of readelf --symbols).
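That grep-plus-readelf idea could be sketched roughly like the following. This is an illustrative sketch only, not the script this PR eventually added; the sample symbol lines are fabricated, and in real use they would come from `readelf --symbols --wide "${dso}" | c++filt`:

```shell
#!/bin/bash
# Deny-list check: count GLOBAL/WEAK symbols from vendored namespaces
# (spdlog::, fmt::) that leak out of a DSO. A clean build should report 0.
# The here-doc stands in for real demangled readelf output.
cat > /tmp/sample_syms.txt <<'EOF'
     7: 0000000000021040    96 FUNC    GLOBAL DEFAULT   12 spdlog::logger::flush()
     8: 0000000000022080    48 FUNC    WEAK   DEFAULT   12 fmt::vformat(...)
     9: 0000000000023000    24 FUNC    GLOBAL DEFAULT   12 rmm::mr::get_current_device_resource()
EOF

# keep only exported (GLOBAL/WEAK) symbols, then count the forbidden namespaces
leaked=$(awk '$5 == "GLOBAL" || $5 == "WEAK"' /tmp/sample_syms.txt \
    | grep -cE ' (spdlog|fmt)::')
echo "leaked third-party symbols: ${leaked}"
# prints: leaked third-party symbols: 2
```

A real version of this would loop over every `*.so` in the unpacked wheel and exit non-zero whenever the count is not 0.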

I'm certain someone else will have a better idea than that for how to express the behavior we want in CI here. cc @robertmaynard @KyleFromNVIDIA

@harrism (Member) commented Aug 12, 2024

Can you please open an RMM issue for this? Thanks.

@harrism (Member) commented Aug 12, 2024

As for testing, could add a separate test that links spdlog and RMM and make sure it compiles without symbol conflicts?

@vyasr (Contributor) commented Aug 12, 2024

I agree that we should use a per-target approach to setting the visibility. I would prefer if we use an explicit approach so that we don't affect other callers of rapids_cython_create_modules.

That works, so instead of changing the default behavior, we'd add an option DEFAULT_VISIBILITY or so and if it isn't set then don't specify any property on the target.

It isn't clear to me what kind of C++ types cross shared library boundaries via Python. If we have C++ objects being shunted via Python between rmm and cudf, or rmm and cuml, we will need the RMM symbols to be public (which they currently are without this patch) so that we have consistent types across multiple libraries.

We definitely need to be able to hand around e.g. rmm device buffers across library boundaries. I'm not sure how that's reflected in terms of structs (classes) at the DSO level, though. Does visibility have any impact on whether they're safe to hand around? Structs don't actually have any corresponding symbols, so there's nothing to check there. As long as they are ABI-stable between the different libraries, isn't that sufficient? Basic library inspection tools won't even give us enough information to study the stability of the objects since we can't check on things like member ordering/alignment or vtable contents.

@jameslamb jameslamb requested a review from a team as a code owner August 12, 2024 23:39
@github-actions bot added the ci label and removed the cpp (Pertains to C++ code) label on Aug 12, 2024
@jameslamb (Member, Author)

That works, so instead of changing the default behavior, we'd add an option DEFAULT_VISIBILITY or so and if it isn't set then don't specify any property on the target.

I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of rapids_cython_create_modules(). Had been waiting to push it until I'd tested locally. But that local testing looked ok, so I just pushed some commits with it.

rapids_cython_create_modules() populates a variable RAPIDS_CYTHON_CREATED_TARGETS with a list of the targets it created (code link).

So without any changes to rapids-cmake, it's possible to do this in a target-centered way specifically for the rmm Cython stuff.

Like this...

foreach(_cython_target IN LISTS RAPIDS_CYTHON_CREATED_TARGETS)
    set_target_properties(
        ${_cython_target}
        PROPERTIES
            C_VISIBILITY_PRESET hidden
            CXX_VISIBILITY_PRESET hidden
    )
endforeach()

... after each call to rapids_cython_create_modules().

Doing it that way here (and in other similar projects) would allow us to test this behavior without expanding rapids-cmake's public API. So if it turns out that this is wrong or insufficient, it could be reverted without breaking changes to rapids-cmake.

@harrism (Member) commented Aug 13, 2024

Can you please open an RMM issue for this? Thanks.

Is there a reason rapidsai/build-planning#33 isn't sufficient? The purpose of the build-planning repo is for precisely this sort of cross-repo task tracking that is easier to discuss in a centralized location.

Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv 's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report.

@robertmaynard (Contributor)

We definitely need to be able to hand around e.g. rmm device buffers across library boundaries. I'm not sure how that's reflected in terms of structs (classes) at the DSO level, though. Does visibility have any impact on whether they're safe to hand around? Structs don't actually have any corresponding symbols, so there's nothing to check there. As long as they are ABI-stable between the different libraries, isn't that sufficient?

If they are structs and not objects with vtables you are safe. The RTTI for hidden symbols can have the DSO embedded into it and therefore isn't safe across boundaries, as you won't be able to dynamic_cast it to the correct type. This primarily happens with libc++, which has more aggressive conformance around RTTI casting than libstdc++.

@jameslamb (Member, Author)

I just tested the cudf C++ wheels PR using artifacts produced from CI here and confirmed that this set of changes also resolves the symbol conflict issues.

ref: rapidsai/cudf#15483 (comment)

I've also proposed a new testing script implementing the idea from discussion above to try to add a CI-time check on symbol visibility. It shows up like this in wheel build logs:

checking exported symbols in '/tmp/tmp.6uVuClfluU/rmm/_cuda/stream.cpython-39-x86_64-linux-gnu.so'
symbol counts by type
  * GLOBAL: 305
  * WEAK: 5
  * LOCAL: 1
checking for 'fmt::' symbols...
checking for 'spdlog::' symbols...
No symbol visibility issues found

checking exported symbols in '/tmp/tmp.6uVuClfluU/rmm/_lib/memory_resource.cpython-39-x86_64-linux-gnu.so'
symbol counts by type
  * GLOBAL: 420
  * WEAK: 28
  * LOCAL: 1
checking for 'fmt::' symbols...
checking for 'spdlog::' symbols...
No symbol visibility issues found
...

(build link)

Tested that locally and found that it did successfully fail when I omitted the visibility attributes from some targets. I've hard-coded fmt:: and spdlog:: specifically into there... not sure if there's a more generic way to write that. But this is an improvement over the current state and at least offers some protection against the specific source of conflicts this PR sought to address.
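One more generic alternative would be to invert the check into an allow-list: instead of naming known-bad namespaces, flag any exported symbol whose demangled name is outside the namespaces a Python extension is expected to expose. This is an untested sketch (not what this PR ships), with fabricated sample input and an assumed allow-list of `rmm::` and `PyInit_*`:

```shell
#!/bin/bash
# Allow-list sketch: flag GLOBAL/WEAK symbols NOT in an expected namespace.
# Sample stands in for 'readelf --symbols --wide <lib> | c++filt' output.
cat > /tmp/allowlist_syms.txt <<'EOF'
     1: 0000000000004100    40 FUNC    GLOBAL DEFAULT   12 PyInit_memory_resource
     2: 0000000000021040    96 FUNC    GLOBAL DEFAULT   12 spdlog::logger::flush()
     3: 0000000000023000    24 FUNC    WEAK   DEFAULT   12 rmm::mr::get_current_device_resource()
EOF

# keep exported symbols (field 5 is the binding, field 8 starts the name),
# then drop everything matching the allow-list; whatever remains is unexpected
unexpected=$(awk '$5 == "GLOBAL" || $5 == "WEAK" { print $8 }' /tmp/allowlist_syms.txt \
    | grep -vE '^(rmm::|PyInit_)' || true)
echo "unexpected exported symbols:"
echo "${unexpected}"
# prints: spdlog::logger::flush()
```

An allow-list catches future vendored dependencies automatically, at the cost of needing updates whenever a legitimately new public namespace appears; printing only field 8 also truncates names containing spaces, so a production version would need slightly more careful parsing.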

I think this is ready for another review.

@robertmaynard (Contributor)

That works, so instead of changing the default behavior, we'd add an option DEFAULT_VISIBILITY or so and if it isn't set then don't specify any property on the target.

I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of rapids_cython_create_modules(). Had been waiting to push it until I'd tested locally. But that local testing looked ok, so I just pushed some commits with it.

rapids_cython_create_modules() populates a variable RAPIDS_CYTHON_CREATED_TARGETS with a list of the targets it created (code link).

So without any changes to rapids-cmake, it's possible to do this in a target-centered way specifically for the rmm Cython stuff.

Like this...

foreach(_cython_target IN LISTS RAPIDS_CYTHON_CREATED_TARGETS)
    set_target_properties(
        ${_cython_target}
        PROPERTIES
            C_VISIBILITY_PRESET hidden
            CXX_VISIBILITY_PRESET hidden
    )
endforeach()

... after each call to rapids_cython_create_modules().

Doing it that way here (and in other similar projects) would allow us to test this behavior without expanding rapids-cmake's public API. So if it turns out that this is wrong or insufficient, it could be reverted without breaking changes to rapids-cmake.

@jameslamb Could you open a rapids-cmake issue to track this feature request? I'm happy with using the current foreach approach in rmm for now so we can get the libcudf wheel PR merged, and we can iterate on a full design of the API in parallel/after.

@jameslamb (Member, Author)

Could you open a rapids-cmake issue to track this feature request?

Sure! here you go: rapidsai/rapids-cmake#672

happy with using the current foreach approach in rmm for now so we can get the libcudf wheel PR merged and we can iterate on a full design of the API in parallel/after.

Thanks! I agree.

@vyasr (Contributor) commented Aug 14, 2024

Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv 's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report.

When we created build-planning we (including you and @jarmak-nv) specifically discussed this use case as one where the fragmentation associated with creating per-repo issues wasn't worth the cost. Without a centralized point of discussion (e.g. a build-planning issue) we end up with a comment on a cugraph issue pointing to a solution discussed in a combination of posts on a cuml issue and a raft PR (not an exaggeration, this kind of cross-repo discussion is fairly common without a centralized place for discussion). If an rmm issue is a must-have for you, could we compromise on keeping those issues as near-empty issues that just link to a build-planning issue? I realize that creating the issues on other repos isn't hard, especially with rapids-reviser, but that doesn't address the IMO much larger concern of fragmentation.

@vyasr (Contributor) commented Aug 14, 2024

rapids_cython_create_modules() populates a variable RAPIDS_CYTHON_CREATED_TARGETS with a list of the targets it created (code link).

I'm fine with doing it this way for now, but I think that in the long run we should upstream the functionality. The purpose of RAPIDS_CYTHON_CREATED_TARGETS is to allow projects to make modifications that are specific to that project, but the visibility changes we're proposing here are presumably something that is going to be generally desirable in the long run. It sounds like that's what @robertmaynard is proposing as well, so I'm good with moving this PR forward in parallel with the rapids-cmake changes.

If they are structs and not objects with vtables you are safe. The RTTI for hidden symbols can have the DSO embedded into it and therefore isn't safe across boundaries as you won't be able to dynamic_cast it the correct type. This primarily happens with libc++ as it has more aggressive conformance around RTTI casting unlike libstdc++.

Hmm yeah I thought briefly about the vtables but didn't get far enough as to consider how they would be affected by crossing ABI boundaries. How is this made safe in general? Are vtables always embedded as unique symbols so that they can be deduplicated across DSOs? What about class methods? If rmm symbols corresponding to its classes are made public, and if one of those classes has some virtual function, could you end up with the same function defined in multiple DSOs (say libcuml and libcuvs) such that the two end up causing runtime lookup conflicts due to the symbols being identical?

@vyasr (Contributor) left a comment

Some small suggestions, but nothing blocking. Address as you see fit. I discussed some things with Robert offline and am happy with the current state, recognizing that we may have to make more changes in the future.

Review threads:
  • ci/build_wheel_python.sh (outdated, resolved)
  • ci/check_symbols.sh (2 threads)
  • python/rmm/rmm/_cuda/CMakeLists.txt
  • python/rmm/rmm/_lib/CMakeLists.txt
@jameslamb (Member, Author)

@harrism could you please review one more time?

I significantly changed the approach after your first approval and don't want you to feel like I'm sneaking something in here.

@harrism (Member) left a comment

Much better. I like that it's not touching the C++ code.

@jameslamb (Member, Author)

Ok thanks for the reviews! I'm going to merge this so we can continue with rapidsai/cudf#15483.

@ me any time if something related to this breaks.

@jameslamb (Member, Author)

/merge

@rapids-bot rapids-bot bot merged commit 43d01db into rapidsai:branch-24.10 Aug 15, 2024
49 checks passed
@jameslamb jameslamb deleted the symbol-visibility branch August 15, 2024 03:06
Merging this pull request closed: [FEA] limit symbol visibility in DSOs in the Python package (#1645)