The number of flows of a device task is capped by the compile-time constant PARSEC_MAX_PARAM_COUNT, which sizes several member arrays in parsec_task_class_t, parsec_task_t, and parsec_gpu_task_t. The last one is relevant for TTG because it's the only place we actually use flows, and it limits the number of inputs we can handle per task. PARSEC_MAX_PARAM_COUNT can be controlled through a CMake variable, but we typically do not know the count up front. Even if we did, different tasks have different numbers of inputs, so raising PARSEC_MAX_PARAM_COUNT enlarges the arrays of every task type, even those that do not need it.
We have not hit this limitation yet, but there is a good chance that we will. For example, some users of MADNESS compute in 6 dimensions, which would need 64+ inputs. We will also look into batching of tasks in 3D, which would quickly exceed the current default of 20 (batching just two levels again needs upwards of 64 inputs).
Describe the solution you'd like
Replace the fixed arrays with pointers to arrays. In the case of parsec_gpu_task_t, we would turn the struct of arrays into an array of structs (bundling flow, flow_nb_elts, flow_dc, and sources) and create one array of them. For task classes, the extra dynamic allocation should not matter.
We should not use flexible array members because they make it impossible to embed a parsec_task_t into another structure and are generally not supported by C++. Instead, adding a pointer that can point to extra memory at the end of the task structure (or even be NULL for zero flows, like regular tasks in TTG) would be preferable. This would also shrink the footprint of tasks in general, since most applications in PTG use small numbers of inputs.
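To make the proposal concrete, here is a minimal sketch of the layout I have in mind. All names (sketch_flow_info_t, sketch_task_t, sketch_task_new) and field types are hypothetical placeholders, not the actual PaRSEC definitions. The task stays a fixed-size struct that can be embedded elsewhere, and its pointer refers to per-flow records allocated in the same block right behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-flow record: one entry replaces one slot in each of the
 * fixed-size arrays. Field types are placeholders, not PaRSEC's types. */
typedef struct sketch_flow_info_s {
    const void *flow;         /* stands in for the flow descriptor */
    size_t      flow_nb_elts; /* number of elements for this flow */
    void       *flow_dc;      /* stands in for the data collection */
    void       *source;       /* stands in for the source data copy */
} sketch_flow_info_t;

/* Hypothetical task: fixed size, no flexible array member, so it can still
 * be embedded into another structure and used from C++. */
typedef struct sketch_task_s {
    int32_t             nb_flows;
    sketch_flow_info_t *flows; /* NULL when nb_flows == 0 */
} sketch_task_t;

/* Helper type used only to compute a correctly aligned offset for the
 * trailing array of flow records. */
typedef struct sketch_layout_s {
    sketch_task_t      task;
    sketch_flow_info_t first;
} sketch_layout_t;

/* Allocate the task and its flow records in a single zeroed block. */
static sketch_task_t *sketch_task_new(int32_t nb_flows)
{
    assert(nb_flows >= 0);
    size_t header = offsetof(sketch_layout_t, first);
    size_t bytes  = header + (size_t)nb_flows * sizeof(sketch_flow_info_t);
    sketch_task_t *task = calloc(1, bytes);
    if (NULL == task) return NULL;
    task->nb_flows = nb_flows;
    task->flows = (nb_flows > 0)
                ? (sketch_flow_info_t *)((char *)task + header)
                : NULL;
    return task;
}

/* The flow records live in the same block, so one free() releases both. */
static void sketch_task_free(sketch_task_t *task)
{
    free(task);
}
```

A task without flows pays only for the counter and one pointer, while a batched task with 64 or more inputs gets exactly as many records as it asks for.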
I understand that bitmaps are used to encode what flows are used so that would have to change as well. We would have to introduce more flexible bitmaps.
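As a rough illustration of what a more flexible bitmap could look like, the sketch below stores the flow mask in a dynamically sized array of 64-bit words instead of a single integer. The names (sketch_flow_mask_t and its functions) are made up for this example and do not correspond to any existing PaRSEC API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_WORD (sizeof(uint64_t) * CHAR_BIT)

/* Hypothetical flow mask: as many 64-bit words as needed for nb_bits flows. */
typedef struct sketch_flow_mask_s {
    size_t    nb_bits;
    uint64_t *words; /* NULL when nb_bits == 0 */
} sketch_flow_mask_t;

/* Returns 0 on success, -1 if the allocation failed. All bits start cleared. */
static int sketch_flow_mask_init(sketch_flow_mask_t *m, size_t nb_bits)
{
    size_t nb_words = (nb_bits + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD;
    m->nb_bits = nb_bits;
    m->words = (nb_words > 0) ? calloc(nb_words, sizeof(uint64_t)) : NULL;
    return (0 == nb_words || NULL != m->words) ? 0 : -1;
}

/* Mark flow `bit` as used. */
static void sketch_flow_mask_set(sketch_flow_mask_t *m, size_t bit)
{
    assert(bit < m->nb_bits);
    m->words[bit / SKETCH_BITS_PER_WORD] |=
        UINT64_C(1) << (bit % SKETCH_BITS_PER_WORD);
}

/* Returns 1 if flow `bit` is marked as used, 0 otherwise. */
static int sketch_flow_mask_test(const sketch_flow_mask_t *m, size_t bit)
{
    assert(bit < m->nb_bits);
    return (int)((m->words[bit / SKETCH_BITS_PER_WORD]
                  >> (bit % SKETCH_BITS_PER_WORD)) & 1u);
}

/* Release the word array and reset the mask to empty. */
static void sketch_flow_mask_fini(sketch_flow_mask_t *m)
{
    free(m->words);
    m->words = NULL;
    m->nb_bits = 0;
}
```

For masks of up to 64 flows this costs one word plus a pointer and a length, so an inline fast path for small counts might be worth considering if the extra allocation turns out to matter.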
Describe alternatives you've considered
The current state would work if, in MRA, we chose high compile-time defaults once we start batching, but that is wasteful for most tasks in the system. We'd also still be limited by the width of whatever integer type PaRSEC uses for the bitmap.