Various changes driven by ISO C++ proposal P0443.
- `bulk_sync_execute` has been eliminated
- `is_bulk_synchronous_executor` has been eliminated
- `bulk_async_execute` has been eliminated
- `is_bulk_asynchronous_executor` has been eliminated
- `async_execute` has been eliminated
- `is_asynchronous_executor` has been eliminated
- `sync_execute` has been eliminated
- `is_synchronous_executor` has been eliminated
- `bulk_then_execute` has been eliminated
- `is_bulk_continuation_executor` has been eliminated
- `then_execute` has been eliminated
- `is_continuation_executor` has been eliminated
- `is_bulk_executor` has been eliminated
- `is_simple_executor` has been eliminated
- `future_value` and `future_value_t` have been renamed `future_result` and `future_result_t`, respectively.
- `executor_execution_category` and `executor_execution_category_t` have been replaced with the `bulk_guarantee` executor property
- `execution_categories.hpp` and the functionality therein have been eliminated
- `execution_agent_traits<A>::execution_category` has been replaced with `execution_agent_traits<A>::execution_requirement`
- `cuda::deferred_future` has been eliminated
- `require` and properties:
  - `bulk`
  - `single`
  - `then`
  - `always_blocking`
  - `bulk_guarantee`
- `basic_span`
- `cuda::vector`
- Various executors now have equality operations.
- `pointer_adaptor`
- `cuda::device_ptr`
- `cuda::scoped_device`
- `experimental::domain`
- `overload`
- #437 nvcc emits warnings regarding standard library functions calling `__host__`-only functions
- #428 Warnings regarding ignored CUDA annotations have been eliminated
Agency 0.2.0 introduces new components for creating parallel C++ programs. A suite of new containers allows easy management of collections of objects in parallel programs. `agency::array` and `agency::vector` provide familiar C++ components in CUDA codes, while components like `shared` and `shared_vector` allow groups of concurrent execution agents to cooperatively own an object. New executors target CUDA cooperative kernels, OpenMP, loop unrolling, and polymorphism. Finally, new speculative features allow programmers to experiment with interoperating with native CUDA APIs, multidimensional arrays, and range-based programming.
- `cuda::split_allocator` has been renamed `cuda::heterogeneous_allocator`
- `async`: Composes with an executor to create a single asynchronous function invocation.
- `invoke`: Composes with an executor to create a single synchronous function invocation.
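The sketch below gives a rough, standard-library-only picture of what "composes with an executor" means for `invoke` and `async`. `toy_inline_executor`, `toy_invoke`, and `toy_async` are hypothetical names, not Agency's API. The sketch assumes an executor is simply an object with an `execute` member that runs a nullary function. In this toy the asynchrony comes from `std::async`, and the executor only runs the work on that new thread. Agency's real executors decide for themselves where and how an invocation runs.

```cpp
#include <future>

// Hypothetical toy executor (not an Agency type): runs the given work inline
// on whichever thread calls execute().
struct toy_inline_executor
{
  template<class Function>
  auto execute(Function f) const -> decltype(f())
  {
    return f();
  }
};

// Hypothetical analogue of invoke: a single synchronous invocation of
// f(args...) routed through the executor; the caller gets the result directly.
template<class Executor, class Function, class... Args>
auto toy_invoke(const Executor& exec, Function f, Args... args) -> decltype(f(args...))
{
  return exec.execute([=] { return f(args...); });
}

// Hypothetical analogue of async: a single asynchronous invocation.
// std::async starts a new thread that calls exec.execute(...), and the caller
// immediately receives a std::future for the eventual result.
template<class Executor, class Function, class... Args>
auto toy_async(const Executor& exec, Function f, Args... args)
  -> std::future<decltype(f(args...))>
{
  return std::async(std::launch::async, [=] {
    return exec.execute([=] { return f(args...); });
  });
}
```

For example, `toy_invoke(exec, add, 2, 3)` returns `5` directly. `toy_async(exec, times_ten, 4)` returns a `std::future<int>` whose `get()` yields `40`.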
- `array`: Statically-sized object container based on `std::array`.
- `vector`: Dynamically-sized object container based on `std::vector`.
- `shared`: Container for a single object shared by a group of concurrent execution agents.
- `shared_array`: Container for a statically-sized collection of objects shared by a group of concurrent execution agents.
- `shared_vector`: Container for a dynamically-sized collection of objects shared by a group of concurrent execution agents.
- `tuple`: A product type based on `std::tuple`.
- `concurrent_execution_policy_2d`: Induces a two-dimensional group of concurrent execution agents.
- `sequenced_execution_policy_2d`: Induces a two-dimensional group of sequenced execution agents.
- `parallel_execution_policy_2d`: Induces a two-dimensional group of parallel execution agents.
- `unsequenced_execution_policy_2d`: Induces a two-dimensional group of unsequenced execution agents.
- OpenMP-specific execution policies
  - `omp::parallel_execution_policy`: Induces a group of parallel execution agents using an OpenMP executor.
  - `omp::unsequenced_execution_policy`: Induces a group of unsequenced execution agents using an OpenMP executor.
- `cuda::concurrent_grid_executor`: Creates concurrent-concurrent execution agents using CUDA 9's cooperative grid launch.
- `omp::parallel_for_executor` AKA `omp::parallel_executor`: Creates parallel execution agents using OpenMP's parallel for loop.
- `omp::simd_executor` AKA `omp::unsequenced_executor`: Creates unsequenced execution agents using OpenMP's SIMD for loop.
- `experimental::unrolling_executor`: Creates sequenced execution agents using an unrolled for loop.
- `variant_executor`: Creates execution agents with polymorphic execution guarantees using a dynamic underlying executor.
- `cuda::device`: Creates a `cuda::device_id` from a device enumerant.
- `cuda::devices`: Creates a collection of `cuda::device_id` from a sequence of device enumerants.
- `cuda::all_devices`: Creates a collection of `cuda::device_id` corresponding to all devices in the system.
- `experimental::ndarray`: Dynamically-sized multidimensional object container.
- `experimental::ndarray_ref`: View of a multidimensional object container.
- `cuda::experimental::static_grid`: Induces a statically-sized group of parallel-concurrent execution agents using CUDA grid launch.
- `cuda::experimental::static_con`: Induces a statically-sized group of concurrent execution agents using CUDA grid launch.
- `cuda::experimental::make_async_future`: Creates a `cuda::async_future` from underlying CUDA resources.
- `cuda::experimental::make_dependent_stream`: Creates a `cudaStream_t` from a `cuda::async_future`.
- Fancy ranges
  - `experimental::interval()`: Creates a range of integers specified by two end points.
  - `experimental::iota_view`: A range of increasing integers.
  - `experimental::transformed_view`: A transformed view of an underlying range.
  - `experimental::zip_with_view`: A zipped-and-then-transformed view of multiple underlying ranges.
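The fancy ranges above are lazy views: their elements are computed on access rather than stored. The sketch below is a minimal, standard-library-only model of two of them, an integer interval and a transformed view. The names `toy_interval`, `toy_make_interval`, `toy_transformed_view`, and `toy_transform` are hypothetical. The interval is assumed to be half-open, since the entry above only says "two end points". Agency's own views follow range-v3, whereas this toy exposes only `size()` and `operator[]`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy (not an Agency type): the half-open integer interval [first, last).
struct toy_interval
{
  int first;
  int last;

  std::size_t size() const { return static_cast<std::size_t>(last - first); }

  // The i-th element is computed on demand; no elements are stored.
  int operator[](std::size_t i) const { return first + static_cast<int>(i); }
};

inline toy_interval toy_make_interval(int first, int last)
{
  assert(first <= last);  // assumed precondition for a half-open interval
  return toy_interval{first, last};
}

// Hypothetical toy transformed view: applies f to an element of the
// underlying range only when that element is accessed.
template<class Range, class Function>
struct toy_transformed_view
{
  Range base;
  Function f;

  std::size_t size() const { return base.size(); }

  // C++14 return type deduction: the element type is whatever f returns.
  auto operator[](std::size_t i) const { return f(base[i]); }
};

template<class Range, class Function>
toy_transformed_view<Range, Function> toy_transform(Range r, Function f)
{
  return toy_transformed_view<Range, Function>{r, f};
}
```

For example, transforming `toy_make_interval(2, 5)` with a squaring lambda yields a three-element view whose elements are `4`, `9`, and `16`. Each square is computed only when it is indexed.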
- #428 nvcc 9.0 emits spurious warnings regarding ignored annotations
- #347 Various warnings at aggressive reporting levels have been eliminated
- #289 `async_future::bulk_then()` needs to schedule the `outer_arg`'s destruction
- #352 `.rank()` generates results in the wrong order
- Thanks to Steven Dalton and other GitHub users for submitting bug reports.
- Thanks to Steven Dalton, Michael Garland, Mike Bauer, Isaac Gelado, Saurav Muralidharan, and Cris Cecka for continued input into Agency's overall design and implementation.
Agency 0.1.0 introduces new control structures such as `bulk_invoke()` for creating parallel tasks. A suite of new execution policies composes with these control structures to require different kinds of semantic guarantees from the created tasks. A new library of executors controls the mapping of tasks onto underlying execution resources such as CPUs, GPUs, and collections of multiple GPUs. In addition to these basic components, this release also introduces experimental support for a collection of utility types useful for creating Agency programs.
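The sketch below is a sequential, standard-library-only toy model of how a control structure and an execution policy relate. In Agency, a call such as `agency::bulk_invoke(agency::seq(10), f)` requests ten sequenced execution agents and passes each one an agent object. The names `toy_sequenced_policy`, `toy_seq`, and `toy_bulk_invoke` below are hypothetical. Each toy agent receives only its integer index, and the toy collects every agent's return value into a `std::vector` so the outcome is easy to inspect.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy policy (not Agency's sequenced_execution_policy): it only
// records how many execution agents the control structure should create.
struct toy_sequenced_policy
{
  std::size_t size;
};

inline toy_sequenced_policy toy_seq(std::size_t n)
{
  return toy_sequenced_policy{n};
}

// Hypothetical toy control structure: creates policy.size agents and runs them
// one after another in index order, which trivially satisfies a sequenced
// guarantee. Assumes f returns a non-void value; agent idx's result is stored
// at results[idx].
template<class Function>
auto toy_bulk_invoke(toy_sequenced_policy policy, Function f)
  -> std::vector<decltype(f(std::size_t{}))>
{
  std::vector<decltype(f(std::size_t{}))> results;
  results.reserve(policy.size);
  for (std::size_t idx = 0; idx < policy.size; ++idx)
  {
    results.push_back(f(idx));
  }
  return results;
}
```

For example, `toy_bulk_invoke(toy_seq(4), [](std::size_t i) { return i * i; })` returns `{0, 1, 4, 9}`. A parallel or concurrent policy would relax the one-after-another ordering that this toy hard-codes.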
- `bulk_invoke`
- `bulk_async`
- `bulk_then`
- `concurrent_execution_policy`
- `sequenced_execution_policy`
- `parallel_execution_policy`
- `unsequenced_execution_policy`
- CUDA-specific execution policies
  - `cuda::concurrent_execution_policy`
  - `cuda::parallel_execution_policy`
  - `cuda::grid`
- `concurrent_executor`
- `executor_array`
- `flattened_executor`
- `parallel_executor`
- `scoped_executor`
- `sequenced_executor`
- `unsequenced_executor`
- `vector_executor`
- CUDA-specific executors
  - `cuda::block_executor`
  - `cuda::concurrent_executor`
  - `cuda::grid_executor`
  - `cuda::grid_executor_2d`
  - `cuda::multidevice_executor`
  - `cuda::parallel_executor`
- `experimental::array`
- `experimental::bounded_integer`
- `experimental::optional`
- `experimental::short_vector`
- `experimental::span`
- Fancy ranges based on the range-v3 library
  - `experimental::chunk_view`
  - `experimental::counted_view`
  - `experimental::stride_view`
  - `experimental::zip_view`
- `concurrent_ping_pong.cpp`
- `concurrent_sum.cpp`
- `fill.cpp`
- `hello_async.cpp`
- `hello_lambda.cpp`
- `hello_then.cpp`
- `hello_world.cpp`
- `ping_pong_tournament.cpp`
- `saxpy.cpp`
- `version.cpp`
- CUDA-specific example programs
- #255 Agency is not known to work with any version of the Microsoft Compiler
- #256 Agency is not known to work with NVIDIA Compiler versions prior to 8.0
- #257 Agency is not known to work with NVIDIA GPU architectures prior to `sm_3x`
- Thanks to Michael Garland for significant input into Agency's overall design.
- Thanks to the NVIDIA compiler team, especially Jaydeep Marathe, for enhancements to `nvcc`'s C++ support.
- Thanks to Steven Dalton, Mark Harris, and Evghenni Gaburov for testing this release during development.
- Thanks to Duane Merrill and Sean Baxter for design feedback.
- Thanks to Olivier Giroux for contributing an implementation of synchronic.
This version of Agency was not released.