Expand and add more CUDA/HIP documentation #1309

msimberg · 2024-11-01T12:13:24Z

Adds documentation for classes that weren't yet documented and updates the existing documentation. Adds a section for pika/cuda.hpp to the API documentation.

Early stages, far from complete, but may require some discussion before continuing.

codacy-production · 2024-11-01T12:18:22Z

Coverage summary from Codacy


Coverage variation	Diff coverage
✅ +0.01% (target: -1.00%)	✅ ∅ (target: 90.00%)

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`bfa0be5`)	18282	13802	75.50%
Head commit (`8af4738`)	18282 (+0)	13804 (+2)	75.51% (+0.01%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#1309)	0	0	∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Codacy stopped sending the deprecated coverage status on June 5th, 2024.

msimberg · 2024-11-01T12:18:49Z

docs/api.rst

+:tocdepth: 3
+


I'm setting this explicitly so that the table of contents on the right-hand side doesn't list everything.

level 1 is basically only the page title "API reference"

level 2 adds the header sections (pika/execution.hpp etc.)

level 3 adds the functions/classes in each header

level 4 adds member functions of classes (or parameters of functions, only with sphinx-immaterial)

level 5 adds pre/post conditions of member functions (only with sphinx-immaterial)

I'm leaning towards 3 or 4 with a slight preference for 3, which is what I've currently chosen.

Stick to level 3; don't put too much information in the table of contents.


msimberg · 2024-11-01T12:19:23Z

docs/api.rst

+TODO: Note that while cuda_pool gives direct access to streams and handles, the intended usage is to
+access them via the scheduler and sender adaptors.


msimberg · 2024-11-01T12:21:10Z

docs/api.rst

+.. doxygenclass:: pika::cuda::experimental::cuda_pool
+.. doxygenclass:: pika::cuda::experimental::cuda_stream
+.. doxygenclass:: pika::cuda::experimental::cublas_handle
+.. doxygenclass:: pika::cuda::experimental::locked_cublas_handle
+.. doxygenclass:: pika::cuda::experimental::cusolver_handle
+.. doxygenclass:: pika::cuda::experimental::locked_cusolver_handle


At the moment I'm not planning to provide individual examples for these, for two reasons:

cuda_pool and friends are covered quite well in the overview example (i.e. create it and pass it to a cuda_scheduler)

most of these should generally not be used directly; the streams and handles should instead be accessed through the sender adaptors below

Do you think this is ok (leaving out examples here)? If yes, perhaps I should move them to the bottom of the section?

Keep this, but explicitly state that these are not recommended to be used directly.
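One way to state this explicitly is a `\note` in the class docstring. The sketch below uses a hypothetical stand-in class, `stream_pool_sketch`, rather than the real cuda_pool; only the wording of the docstring is the point, and the round-robin body is just there so it compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// \brief A pool of streams (hypothetical stand-in for cuda_pool).
///
/// \note The pool gives direct access to its streams, but it is not intended
/// to be used directly. Create it, pass it to a scheduler, and access streams
/// through the sender adaptors instead.
class stream_pool_sketch {
public:
    explicit stream_pool_sketch(std::size_t num_streams) : streams_(num_streams) {
        assert(num_streams > 0);  // round-robin below needs at least one stream
        for (std::size_t i = 0; i < num_streams; ++i) {
            streams_[i] = static_cast<int>(i);  // placeholder stream ids
        }
    }

    /// \brief Get the next stream in round-robin order.
    int next_stream() {
        int s = streams_[current_];
        current_ = (current_ + 1) % streams_.size();
        return s;
    }

private:
    std::vector<int> streams_;
    std::size_t current_ = 0;
};
```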

msimberg · 2024-11-01T12:24:31Z

docs/api.rst

+      :language: c++
+      :start-at: #include
+
+.. doxygenvariable:: pika::cuda::experimental::then_with_stream


This one is documented a bit oddly at the moment: we document the variable, but not the call operators of the then_with_stream_t class. I'm torn between adding an example, which is usually clear enough on its own, and documenting the call operators explicitly. Currently I refer to e.g. \p f as the callable passed to the adaptor, but the user has no information on how it is passed to the adaptor. That's not much of a problem for then_with_stream, but a bigger one for then_with_cublas, since that also takes a pointer mode.

Should I include both an example and document the call operators? If we document the call operators should we do the same for drop_value, drop_operation_state, etc.?

Explore separately from this PR what the documentation would look like with, e.g., the call operator documented on the CPO type.
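A rough sketch of what "documenting the call operator on the CPO type" could look like, so that \p f has a home in the docs. The adaptor `then_twice` and its type `then_twice_t` are hypothetical and unrelated to pika's real adaptors; they only illustrate where the parameter documentation would live relative to the `inline constexpr` variable.

```cpp
#include <cassert>
#include <utility>

namespace sketch {
    /// \brief Function object type of the hypothetical \ref then_twice adaptor.
    struct then_twice_t {
        /// \brief Applies \p f to \p value twice.
        ///
        /// \param value the value passed to \p f on the first call.
        /// \param f a callable invoked as `f(value)` and then on that result.
        /// \return the result of the second call to \p f.
        template <typename T, typename F>
        constexpr auto operator()(T value, F&& f) const {
            return f(f(std::move(value)));
        }
    };

    /// \brief Applies a callable twice. The parameters are documented on
    /// \ref then_twice_t::operator().
    inline constexpr then_twice_t then_twice{};
}  // namespace sketch
```

With this layout the variable's docstring stays short and points to the call operator, which is where Doxygen would list \p f and any extra arguments (such as a pointer mode).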

msimberg · 2024-11-01T12:24:45Z

examples/documentation/CMakeLists.txt

+    # cuda_overview_documentation # TODO
    drop_operation_state_documentation
    drop_value_documentation
    hello_world_documentation
    init_hpp_documentation
    split_tuple_documentation
+    # then_with_stream_documentation # TODO


To do. These currently don't compile.

msimberg · 2024-11-01T12:25:31Z

libs/pika/async_cuda/include/pika/async_cuda/cuda_pool.hpp

+    /// the original pool of streams. A moved-from pool can't be used, except to check if it is
+    /// valid with \ref valid().
+    ///
+    /// The pool is equality comparable and formattable.


Should we keep it simple and document this like above, or should we explicitly add operator== etc. to the documentation? I'm leaning towards keeping it simple, but I'm curious to hear what others think.

Keep it simple. Unless there's special behaviour for special member functions, just mention here that they exist/don't exist.

msimberg · 2024-11-01T12:26:12Z

libs/pika/async_cuda/include/pika/async_cuda/cuda_pool.hpp

+        /// \brief Move constructor.
        PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
        cuda_pool(cuda_pool&&) = default;
+        /// \brief Copy constructor.
        PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
        cuda_pool(cuda_pool const&) = default;
+        /// \brief Move assignment operator.
        PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
        cuda_pool& operator=(cuda_pool&&) = default;
+        /// \brief Copy assignment operator.


I'm not sure there's any value in documenting these like above. Should we just leave the docstrings out and say it's copyable/movable etc. and describe the semantics in the class docstring?

Remove these, unless there's special behaviour to document.
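Taken together with the "keep it simple" answer above, the class docstring carries the copy/move, moved-from, and equality semantics, and the defaulted special member functions get no docstrings of their own. A sketch with a hypothetical `shared_pool_sketch` class; holding the state in a `std::shared_ptr` is an assumption made here so that a defaulted move leaves the source empty (a moved-from `std::shared_ptr` is guaranteed to be null).

```cpp
#include <cassert>
#include <memory>
#include <utility>

/// \brief A reference-counted pool (hypothetical stand-in for cuda_pool).
///
/// The pool is copyable and movable. Copies refer to the same underlying pool.
/// A moved-from pool can't be used, except to check if it is valid with
/// \ref valid(). The pool is equality comparable: two pools compare equal if
/// they refer to the same underlying pool.
class shared_pool_sketch {
    struct data { int num_streams; };
    std::shared_ptr<data> data_;

public:
    explicit shared_pool_sketch(int num_streams)
      : data_(std::make_shared<data>(data{num_streams})) {}

    // No docstrings: the class docstring already describes these.
    shared_pool_sketch(shared_pool_sketch&&) = default;
    shared_pool_sketch(shared_pool_sketch const&) = default;
    shared_pool_sketch& operator=(shared_pool_sketch&&) = default;
    shared_pool_sketch& operator=(shared_pool_sketch const&) = default;

    /// \brief Check if the pool is valid.
    bool valid() const noexcept { return data_ != nullptr; }

    friend bool operator==(shared_pool_sketch const& a, shared_pool_sketch const& b) noexcept {
        return a.data_ == b.data_;
    }
    friend bool operator!=(shared_pool_sketch const& a, shared_pool_sketch const& b) noexcept {
        return !(a == b);
    }
};
```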

msimberg · 2024-11-01T12:27:33Z

libs/pika/async_cuda_base/include/pika/async_cuda_base/cublas_handle.hpp

@@ -27,6 +32,7 @@ namespace pika::cuda::experimental {
        static PIKA_EXPORT cublasHandle_t create_handle(int device, whip::stream_t stream);

    public:
+        /// TODO: How to best document constructor and other special member functions.
        PIKA_EXPORT cublas_handle();


To do. Document that this is default constructible and what the state of a default-constructed handle is.

Document that a default-constructed handle is an invalid handle.
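A possible shape for that docstring, shown on a hypothetical `handle_sketch` class rather than the real cublas_handle. Using `-1` as the "invalid" device and exposing a `valid()` member are assumptions of this sketch only, not claims about cublas_handle's interface.

```cpp
#include <cassert>

/// \brief A library handle (hypothetical stand-in for cublas_handle).
class handle_sketch {
    int device_ = -1;  // -1 marks an invalid handle in this sketch

public:
    /// \brief Constructs an invalid handle.
    ///
    /// A default-constructed handle does not refer to any library handle and
    /// must be assigned a valid handle before use.
    handle_sketch() = default;

    /// \brief Constructs a handle on the given device.
    explicit handle_sketch(int device) : device_(device) {}

    /// \brief Get the device of the handle.
    int get_device() const noexcept { return device_; }

    /// \brief Check if the handle is valid.
    bool valid() const noexcept { return device_ >= 0; }
};
```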

msimberg · 2024-11-01T12:28:50Z

libs/pika/async_cuda/include/pika/async_cuda/then_on_host.hpp

+    };
+
+    /// NOTE: This is not a customization of pika::execution::experimental::then.
+    /// It retains the cuda_scheduler execution context from the predecessor
+    /// sender, but does not run the continuation on a CUDA device. Instead, it
+    /// runs the continuation in the polling thread used by the cuda_scheduler on
+    /// the CPU. The continuation is run only after synchronizing all previous
+    /// events scheduled on the cuda_scheduler. Blocking in the callable given to
+    /// then_on_host blocks other work scheduled on cuda_scheduler from
+    /// completing. Heavier work should be transferred to a host scheduler as
+    /// soon as possible.
+    inline constexpr then_on_host_t then_on_host{};


I'm considering removing this function as we've so far not had a use for it, and the safer option is to explicitly transfer to a new task if one wants to run something on the host. What do you think about removing it?

Remove then_on_host.

libs/pika/async_cuda/include/pika/async_cuda/then_with_stream.hpp

msimberg · 2024-11-01T12:29:40Z

libs/pika/async_cuda_base/include/pika/async_cuda_base/cuda_stream.hpp

+        /// \brief Get the priority of the stream.
+        ///
+        /// \return the priority of the stream.


Should we just do:

Suggested change

-/// \brief Get the priority of the stream.
-///
-/// \return the priority of the stream.
+/// \brief Get the priority of the stream.

for simple functions like these?

Simplify for getters and setters.
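For reference, the simplified style on a hypothetical `stream_sketch` class (not the real cuda_stream): a single `\brief` line per getter, with no `\return` that would only repeat it. A setter, where one exists, would be documented the same way.

```cpp
#include <cassert>

/// \brief A stream wrapper (hypothetical stand-in for cuda_stream).
class stream_sketch {
    int device_;
    int priority_;

public:
    stream_sketch(int device, int priority) noexcept
      : device_(device), priority_(priority) {}

    /// \brief Get the device of the stream.
    int get_device() const noexcept { return device_; }

    /// \brief Get the priority of the stream.
    int get_priority() const noexcept { return priority_; }
};
```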

Documents cuda_pool, cuda_scheduler, cuda_stream, cublas_handle, and cusolver_handle, and exposes them together with the CUDA sender adaptors in the documentation. Adds a high-level example of using the CUDA functionality.

msimberg · 2024-11-01T13:12:08Z

docs/api.rst

@@ -104,3 +105,98 @@ The ``pika/execution.hpp`` header provides functionality related to ``std::execu
 .. literalinclude:: ../examples/documentation/when_all_vector_documentation.cpp
   :language: c++
   :start-at: #include
+


The API reference page is starting to get quite long. We could consider splitting it up to have one header per page. I'm still slightly in favour of keeping it on one page for Ctrl-F-ability and clicking on references not changing the page, but wouldn't object to splitting it up either.

Revisit this once the page is longer. Consider splitting up the API page by topics/high-level categories rather than one page per header.

msimberg self-assigned this Nov 1, 2024