Expand and add more CUDA/HIP documentation #1309

Draft: wants to merge 1 commit into main

@msimberg commented Nov 1, 2024

Adds documentation for classes that weren't yet documented and updates the existing documentation. Adds a section for pika/cuda.hpp to the API documentation.

Early stages, far from complete, but may require some discussion before continuing.

@msimberg self-assigned this Nov 1, 2024

codacy-production bot commented Nov 1, 2024

Coverage summary from Codacy


Coverage variation        Diff coverage
+0.01% (target: -1.00%)   (target: 90.00%)

Coverage variation details

                                    Coverable lines   Covered lines   Coverage
Common ancestor commit (bfa0be5)    18282             13802           75.50%
Head commit (8af4738)               18282 (+0)        13804 (+2)      75.51% (+0.01%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

                        Coverable lines   Covered lines   Diff coverage
Pull request (#1309)    0                 0               ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Comment on lines +8 to +9
:tocdepth: 3

Contributor Author

@msimberg commented Nov 1, 2024

I'm setting this explicitly to avoid having everything show up in the table of contents on the right-hand side.

  • level 1 is basically only the page title "API reference"
  • level 2 adds the header sections (pika/execution.hpp etc.)
  • level 3 adds the functions/classes in each header
  • level 4 adds member functions of classes (or parameters of functions, only with sphinx-immaterial)
  • level 5 adds pre/post conditions of member functions (only with sphinx-immaterial)

I'm leaning towards 3 or 4 with a slight preference for 3, which is what I've currently chosen.

Contributor Author

Stick to level 3; do not add too much information to the table of contents.

docs/api.rst (resolved)
Comment on lines +151 to +152
TODO: Note that while cuda_pool gives direct access to streams and handles, the intended usage is to
access them via the scheduler and sender adaptors.

Contributor Author

To do.

Comment on lines +154 to +159
.. doxygenclass:: pika::cuda::experimental::cuda_pool
.. doxygenclass:: pika::cuda::experimental::cuda_stream
.. doxygenclass:: pika::cuda::experimental::cublas_handle
.. doxygenclass:: pika::cuda::experimental::locked_cublas_handle
.. doxygenclass:: pika::cuda::experimental::cusolver_handle
.. doxygenclass:: pika::cuda::experimental::locked_cusolver_handle

Contributor Author

At the moment I'm not planning to provide individual examples for these, for two reasons:

  • cuda_pool and friends are covered quite well in the overview example (i.e. create it and pass it to a cuda_scheduler)
  • most of these should generally not be used directly, but the streams and handles should be accessed with the sender adaptors below

Do you think this is ok (leaving out examples here)? If yes, perhaps I should move them to the bottom of the section?

Contributor Author

Keep this, but explicitly state that using these directly is not recommended.

:language: c++
:start-at: #include

.. doxygenvariable:: pika::cuda::experimental::then_with_stream

Contributor Author

This one is documented a bit oddly at the moment: we document the variable, but not the call operators of the then_with_stream_t class. I'm torn between having an example, which is usually clear enough on its own, and documenting the call operators explicitly. Currently I refer to e.g. \p f as the callable passed to the adaptor, but the user has no information on how it is passed to the adaptor (not much of a problem for then_with_stream, but a bigger problem for then_with_cublas, since that also takes a pointer mode).

Should I include both an example and document the call operators? If we document the call operators should we do the same for drop_value, drop_operation_state, etc.?

Contributor Author

Explore separately from this PR what the documentation would look like with, e.g., documentation for the call operator on the CPO type.
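
As a rough sketch of what documenting the call operator on a CPO type could look like: the names `stream_tag`, `with_stream_t`, `with_stream`, and `add_stream_id` below are hypothetical and are not part of pika. This is a plain function object rather than a sender adaptor, so it only illustrates where the docstrings would live, not how then_with_stream is implemented.

```cpp
#include <utility>

namespace demo {
    /// \brief Hypothetical stand-in for a stream handle (not a pika or CUDA type).
    struct stream_tag
    {
        int id;
    };

    /// \brief Type of the \ref with_stream function object.
    struct with_stream_t
    {
        /// \brief Call \p f with \p args followed by \p stream.
        ///
        /// \param f callable invoked as `f(args..., stream)`.
        /// \param stream stream passed to \p f as the last argument.
        /// \param args values passed to \p f before \p stream.
        /// \return the result of calling \p f.
        template <typename F, typename... Args>
        constexpr decltype(auto) operator()(F&& f, stream_tag stream, Args&&... args) const
        {
            return std::forward<F>(f)(std::forward<Args>(args)..., stream);
        }
    };

    /// \brief Function object of type \ref with_stream_t.
    inline constexpr with_stream_t with_stream{};

    /// \brief Usage: the callable receives its own argument first, the stream last.
    constexpr int add_stream_id(int x)
    {
        return with_stream([](int v, stream_tag s) { return v + s.id; }, stream_tag{3}, x);
    }
}
```

With the parameters documented on `operator()`, the docstring of the variable itself can stay short and simply refer to the type.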

Comment on lines +8 to +14
# cuda_overview_documentation # TODO
drop_operation_state_documentation
drop_value_documentation
hello_world_documentation
init_hpp_documentation
split_tuple_documentation
# then_with_stream_documentation # TODO

Contributor Author

To do. These currently don't compile.

/// the original pool of streams. A moved-from pool can't be used, except to check if it is
/// valid with \ref valid().
///
/// The pool is equality comparable and formattable.

Contributor Author

Should we keep it simple and document this like above, or should we explicitly add operator== etc. to the documentation? I'm leaning towards keeping it simple, but curious to hear what others think.

Contributor Author

Keep it simple. Unless there's special behaviour for special member functions, just mention here that they exist/don't exist.
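
To make the "keep it simple" option concrete, here is a minimal model of the semantics that the class docstring describes. `toy_pool` and `check_semantics` are hypothetical, and the shared-ownership member is an assumption about how "copies refer to the original pool" might be modelled; it is not pika's cuda_pool implementation. Formatting support is left out of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace demo {
    /// \brief Hypothetical pool of stream ids; a model, not pika's cuda_pool.
    ///
    /// Copies share the streams of the original pool. A moved-from pool can't
    /// be used, except to check if it is valid with \ref valid(). The pool is
    /// equality comparable.
    class toy_pool
    {
    public:
        explicit toy_pool(std::size_t num_streams)
          : streams_(std::make_shared<std::vector<int>>(num_streams))
        {
        }

        /// \brief Check if the pool still refers to a set of streams.
        bool valid() const noexcept { return static_cast<bool>(streams_); }

        friend bool operator==(toy_pool const& lhs, toy_pool const& rhs) noexcept
        {
            return lhs.streams_ == rhs.streams_;
        }

    private:
        // Assumption: shared ownership models "copies refer to the original pool".
        std::shared_ptr<std::vector<int>> streams_;
    };

    // Returns true if copying and moving behave as the class docstring says.
    inline bool check_semantics()
    {
        toy_pool original(4);
        assert(original.valid());

        toy_pool copy = original;                   // shares the same streams
        toy_pool moved_to = std::move(original);    // original is now moved-from
        return copy == moved_to && copy.valid() && !original.valid();
    }
}
```

The special member functions and `operator==` carry no docstrings of their own here; the class docstring states that they exist and what they mean.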

Comment on lines +174 to +183
/// \brief Move constructor.
PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
cuda_pool(cuda_pool&&) = default;
/// \brief Copy constructor.
PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
cuda_pool(cuda_pool const&) = default;
/// \brief Move assignment operator.
PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
cuda_pool& operator=(cuda_pool&&) = default;
/// \brief Copy assignment operator.

Contributor Author

I'm not sure there's any value in documenting these like above. Should we just leave the docstrings out and say it's copyable/movable etc. and describe the semantics in the class docstring?

Contributor Author

Remove these, unless there's special behaviour to document.
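
One way this could look, sketched with a hypothetical `toy_handle` (not a pika type): the defaulted special member functions have no docstrings, the class docstring states the properties, and `static_assert`s check that the prose and the code agree.

```cpp
#include <type_traits>

namespace demo {
    /// \brief Hypothetical handle type, used only to illustrate docstring style.
    ///
    /// The handle is default constructible, copyable, and movable. The special
    /// member functions have no behaviour beyond the defaults, so they are not
    /// documented individually.
    class toy_handle
    {
    public:
        toy_handle() = default;
        toy_handle(toy_handle&&) = default;
        toy_handle(toy_handle const&) = default;
        toy_handle& operator=(toy_handle&&) = default;
        toy_handle& operator=(toy_handle const&) = default;

    private:
        int id_ = 0;
    };

    // Compile-time checks that the class docstring matches the code.
    static_assert(std::is_default_constructible_v<toy_handle>);
    static_assert(std::is_copy_constructible_v<toy_handle>);
    static_assert(std::is_move_constructible_v<toy_handle>);
    static_assert(std::is_copy_assignable_v<toy_handle>);
    static_assert(std::is_move_assignable_v<toy_handle>);
}
```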

@@ -27,6 +32,7 @@ namespace pika::cuda::experimental {
static PIKA_EXPORT cublasHandle_t create_handle(int device, whip::stream_t stream);

public:
/// TODO: How to best document constructor and other special member functions.
PIKA_EXPORT cublas_handle();

Contributor Author

To do. Document that this is default constructible and what the state of a default-constructed handle is.

Contributor Author

Document this, saying that a default-constructed handle is an invalid handle.

Comment on lines +238 to +249
};

/// NOTE: This is not a customization of pika::execution::experimental::then.
/// It retains the cuda_scheduler execution context from the predecessor
/// sender, but does not run the continuation on a CUDA device. Instead, it
/// runs the continuation in the polling thread used by the cuda_scheduler on
/// the CPU. The continuation is run only after synchronizing all previous
/// events scheduled on the cuda_scheduler. Blocking in the callable given to
/// then_on_host blocks other work scheduled on cuda_scheduler from
/// completing. Heavier work should be transferred to a host scheduler as
/// soon as possible.
inline constexpr then_on_host_t then_on_host{};

Contributor Author

I'm considering removing this function as we've so far not had a use for it, and the safer option is to explicitly transfer to a new task if one wants to run something on the host. What do you think about removing it?

Contributor Author

Remove then_on_host.

Comment on lines +74 to +76
/// \brief Get the priority of the stream.
///
/// \return the priority of the stream.

Contributor Author

Should we just do:

Suggested change
- /// \brief Get the priority of the stream.
- ///
- /// \return the priority of the stream.
+ /// \brief Get the priority of the stream.

for simple functions like these?

Contributor Author

Simplify for getters and setters.
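
For reference, a simplified getter could look like the sketch below. `toy_stream` is a hypothetical class used only to show the docstring style; it is not pika's cuda_stream, and the constructor signature is an assumption.

```cpp
namespace demo {
    /// \brief Hypothetical stream wrapper; not pika's cuda_stream.
    class toy_stream
    {
    public:
        constexpr explicit toy_stream(int priority) noexcept
          : priority_(priority)
        {
        }

        /// \brief Get the priority of the stream.
        constexpr int get_priority() const noexcept { return priority_; }

    private:
        int priority_;
    };
}
```

The `\brief` line already says everything a separate `\return` line would repeat.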

Document cuda_pool, cuda_scheduler, cuda_stream, cublas_handle, and cusolver_handle, and expose
these with CUDA sender adaptors in the documentation. Add a high-level example of using CUDA functionality.
@@ -104,3 +105,98 @@ The ``pika/execution.hpp`` header provides functionality related to ``std::execu
.. literalinclude:: ../examples/documentation/when_all_vector_documentation.cpp
:language: c++
:start-at: #include

Contributor Author

The API reference page is starting to get quite long. We could consider splitting it up to have one header per page. I'm still slightly in favour of keeping it on one page for Ctrl-F-ability and clicking on references not changing the page, but wouldn't object to splitting it up either.

Contributor Author

Revisit this once the page is longer. Consider splitting up the API page by topics/high-level categories rather than one page per header.

Labels: None yet
Projects: Status: In Progress
1 participant