Expand and add more CUDA/HIP documentation #1309
base: main
:tocdepth: 3
I'm setting this explicitly to avoid having everything show up in the table of contents on the right hand side.
- level 1 is basically only the page title "API reference"
- level 2 adds the header sections (pika/execution.hpp etc.)
- level 3 adds the functions/classes in each header
- level 4 adds member functions of classes (or parameters of functions, only with sphinx-immaterial)
- level 5 adds pre/post conditions of member functions (only with sphinx-immaterial)
I'm leaning towards 3 or 4 with a slight preference for 3, which is what I've currently chosen.
Stick to level 3; don't add too much information to the table of contents.
TODO: Note that while cuda_pool gives direct access to streams and handles, the intended usage is to
access them via the scheduler and sender adaptors.
To do.
.. doxygenclass:: pika::cuda::experimental::cuda_pool
.. doxygenclass:: pika::cuda::experimental::cuda_stream
.. doxygenclass:: pika::cuda::experimental::cublas_handle
.. doxygenclass:: pika::cuda::experimental::locked_cublas_handle
.. doxygenclass:: pika::cuda::experimental::cusolver_handle
.. doxygenclass:: pika::cuda::experimental::locked_cusolver_handle
I'm not currently planning to provide individual examples for these, for two reasons:
- cuda_pool and friends are covered quite well in the overview example (i.e. create it and pass it to a cuda_scheduler)
- most of these should generally not be used directly; the streams and handles should instead be accessed with the sender adaptors below
Do you think it's ok to leave out examples here? If so, perhaps I should move these classes to the bottom of the section?
Keep this, but explicitly state that these are not recommended to be used directly.
:language: c++
:start-at: #include

.. doxygenvariable:: pika::cuda::experimental::then_with_stream
This one is a bit oddly documented at the moment, since we document the variable but not the call operators of the then_with_stream_t class. I'm torn between adding an example, which is usually clear enough on its own, and documenting the call operators explicitly. Currently I refer to, e.g., \p f as the callable passed to the adaptor, but the user has no information on how it is passed to the adaptor. That's not much of a problem for then_with_stream, but a bigger one for then_with_cublas, since that also takes a pointer mode.
Should I include both an example and documentation of the call operators? If we document the call operators, should we do the same for drop_value, drop_operation_state, etc.?
Explore separately from this PR what the documentation would look like with, e.g., the call operator documented on the CPO type.
# cuda_overview_documentation # TODO
drop_operation_state_documentation
drop_value_documentation
hello_world_documentation
init_hpp_documentation
split_tuple_documentation
# then_with_stream_documentation # TODO
To do. These currently don't compile.
/// the original pool of streams. A moved-from pool can't be used, except to check if it is
/// valid with \ref valid().
///
/// The pool is equality comparable and formattable.
Should we keep it simple and document this like above, or should we explicitly add operator== etc. to the documentation? I'm leaning towards keeping it simple, but I'm curious to hear what others think.
Keep it simple. Unless there's special behaviour for special member functions, just mention here that they exist/don't exist.
/// \brief Move constructor.
PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
cuda_pool(cuda_pool&&) = default;
/// \brief Copy constructor.
PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
cuda_pool(cuda_pool const&) = default;
/// \brief Move assignment operator.
PIKA_NVCC_PRAGMA_HD_WARNING_DISABLE
cuda_pool& operator=(cuda_pool&&) = default;
/// \brief Copy assignment operator.
I'm not sure there's any value in documenting these like above. Should we just leave the docstrings out and say it's copyable/movable etc. and describe the semantics in the class docstring?
Remove these, unless there's special behaviour to document.
@@ -27,6 +32,7 @@ namespace pika::cuda::experimental {
static PIKA_EXPORT cublasHandle_t create_handle(int device, whip::stream_t stream);

public:
/// TODO: How to best document constructor and other special member functions.
PIKA_EXPORT cublas_handle();
To do. Document that this is default constructible and what the state of a default-constructed handle is.
Document this saying it's an invalid handle.
};

/// NOTE: This is not a customization of pika::execution::experimental::then.
/// It retains the cuda_scheduler execution context from the predecessor
/// sender, but does not run the continuation on a CUDA device. Instead, it
/// runs the continuation in the polling thread used by the cuda_scheduler on
/// the CPU. The continuation is run only after synchronizing all previous
/// events scheduled on the cuda_scheduler. Blocking in the callable given to
/// then_on_host blocks other work scheduled on cuda_scheduler from
/// completing. Heavier work should be transferred to a host scheduler as
/// soon as possible.
inline constexpr then_on_host_t then_on_host{};
I'm considering removing this function as we've so far not had a use for it, and the safer option is to explicitly transfer to a new task if one wants to run something on the host. What do you think about removing it?
Remove then_on_host.
/// \brief Get the priority of the stream.
///
/// \return the priority of the stream.
Should we just replace

/// \brief Get the priority of the stream.
///
/// \return the priority of the stream.

with

/// \brief Get the priority of the stream.

for simple functions like these?
Simplify for getters and setters.
Documents cuda_pool, cuda_scheduler, cuda_stream, cublas_handle and cusolver_handle, and exposes them together with the CUDA sender adaptors in the documentation. Adds a high-level example of using CUDA functionality.
@@ -104,3 +105,98 @@ The ``pika/execution.hpp`` header provides functionality related to ``std::execu
.. literalinclude:: ../examples/documentation/when_all_vector_documentation.cpp
:language: c++
:start-at: #include
The API reference page is starting to get quite long. We could consider splitting it up to have one header per page. I'm still slightly in favour of keeping it on one page for Ctrl-F-ability and clicking on references not changing the page, but wouldn't object to splitting it up either.
Revisit this once page is longer. Consider splitting up the API page by topics/high-level categories rather than one page per header.
Adds documentation for classes that weren't yet documented and updates the existing documentation. Adds a section for pika/cuda.hpp to the API documentation. Early stages, far from complete, but may require some discussion before continuing.