Unlimited param count (device inputs) #691

Open
devreal opened this issue Nov 1, 2024 · 0 comments
Labels
enhancement New feature or request

devreal commented Nov 1, 2024

Description

The number of flows of a device task is capped by the compile-time constant PARSEC_MAX_PARAM_COUNT, which sizes several member arrays in parsec_task_class_t, parsec_task_t, and parsec_gpu_task_t. The last one is relevant for TTG because it is the only place where we actually use flows, and it limits the number of inputs we can handle per task. PARSEC_MAX_PARAM_COUNT can be set through a CMake variable, but we typically do not know the required count up front. Even if we did, different tasks have different numbers of inputs, so raising PARSEC_MAX_PARAM_COUNT enlarges every task type, including those that do not need it.

We have not hit this limitation yet, but there is a good chance that we will. For example, some users of MADNESS compute in 6 dimensions, which would need 64+ inputs. We will also look into batching of tasks in 3D, which would quickly exceed the current default of 20 (batching just two levels already needs upwards of 64 inputs).

Describe the solution you'd like

Replace the fixed-size arrays with pointers to dynamically sized arrays. In the case of parsec_gpu_task_t we would turn the struct of arrays into an array of structs (bundling the per-flow members such as flow_nb_elts, flow_dc, and sources) and allocate a single array of them. For task classes, the extra dynamic allocation should not matter.
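
A minimal sketch of what the array-of-structs layout could look like. All names prefixed with `sketch_` are hypothetical, and the member types are placeholders (`void *` stands in for the real PaRSEC pointer types); this is not the actual parsec_gpu_task_t definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-flow record bundling what are currently parallel
 * fixed-size arrays. Member types are placeholders, not PaRSEC's. */
typedef struct sketch_flow_info_s {
    size_t  nb_elts;  /* stand-in for flow_nb_elts[i] */
    void   *dc;       /* stand-in for flow_dc[i] */
    void   *source;   /* stand-in for sources[i] */
} sketch_flow_info_t;

/* Hypothetical device task holding one dynamically sized array. */
typedef struct sketch_gpu_task_s {
    size_t              nb_flows;
    sketch_flow_info_t *flows;    /* NULL when nb_flows == 0 */
} sketch_gpu_task_t;

/* Allocate zero-initialized per-flow records; returns 0 on success. */
static int sketch_gpu_task_init(sketch_gpu_task_t *t, size_t nb_flows)
{
    t->nb_flows = 0;
    t->flows    = NULL;
    if (nb_flows > 0) {
        t->flows = calloc(nb_flows, sizeof *t->flows);
        if (t->flows == NULL) {
            return -1;
        }
    }
    t->nb_flows = nb_flows;
    return 0;
}

/* Release the per-flow records and reset the task. */
static void sketch_gpu_task_fini(sketch_gpu_task_t *t)
{
    free(t->flows);
    t->flows    = NULL;
    t->nb_flows = 0;
}
```

With this layout the flow count becomes a per-task runtime value, so a task with 64 inputs and a task with none can coexist without a global compile-time maximum.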

We should not use flexible array members, because they make it impossible to embed a parsec_task_t into another structure and are generally not supported by C++. Instead, it would be preferable to add a pointer that can point to extra memory at the end of the task structure (or be NULL for zero flows, as for regular tasks in TTG). This would also shrink the footprint of tasks in general, since most applications in PTG use small numbers of inputs.
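
The pointer-to-trailing-memory idea could be sketched as follows. The `sketch_` types and functions are hypothetical and much simpler than parsec_task_t; the point is only that the header stays a fixed-size struct that can be embedded elsewhere, while a single allocation can still carry the per-flow storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-flow data slot. */
typedef struct sketch_data_ref_s {
    void *data_in;
    void *data_out;
} sketch_data_ref_t;

/* Hypothetical task header: fixed size, no flexible array member. */
typedef struct sketch_task_s {
    int32_t            nb_flows;
    sketch_data_ref_t *data;   /* trailing storage, or NULL for zero flows */
} sketch_task_t;

/* One allocation holds the header followed by nb_flows slots.
 * (t + 1) is suitably aligned here because both structs have pointer
 * alignment; a general version would round the offset up. */
static sketch_task_t *sketch_task_alloc(int32_t nb_flows)
{
    assert(nb_flows >= 0);
    size_t total = sizeof(sketch_task_t)
                 + (size_t)nb_flows * sizeof(sketch_data_ref_t);
    sketch_task_t *t = calloc(1, total);
    if (t == NULL) {
        return NULL;
    }
    t->nb_flows = nb_flows;
    t->data     = (nb_flows > 0) ? (sketch_data_ref_t *)(t + 1) : NULL;
    return t;
}

/* Embedding still works because the header has a fixed size: the
 * pointer simply refers to storage owned by the enclosing struct. */
typedef struct sketch_wrapper_s {
    sketch_task_t     task;
    sketch_data_ref_t storage[4];
} sketch_wrapper_t;

static void sketch_wrapper_init(sketch_wrapper_t *w)
{
    w->task.nb_flows = 4;
    w->task.data     = w->storage;
}
```

A task without flows pays only for the header plus one NULL pointer, which is where the footprint reduction for small tasks would come from.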

I understand that bitmaps are used to encode which flows are in use, so that would have to change as well. We would have to introduce more flexible bitmaps.
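
One possible shape for such a bitmap, sketched under the assumption that a heap-allocated array of 64-bit words is acceptable. The `sketch_bitmap_*` names are hypothetical and not part of PaRSEC.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_WORD_BITS (sizeof(uint64_t) * CHAR_BIT)

/* Hypothetical bitmap whose width is chosen at runtime. */
typedef struct sketch_bitmap_s {
    size_t    nbits;
    uint64_t *words;   /* NULL when nbits == 0 */
} sketch_bitmap_t;

/* Number of 64-bit words needed to hold nbits bits. */
static size_t sketch_bitmap_nwords(size_t nbits)
{
    return (nbits + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS;
}

/* Allocate an all-zero bitmap; returns 0 on success. */
static int sketch_bitmap_init(sketch_bitmap_t *b, size_t nbits)
{
    b->nbits = 0;
    b->words = NULL;
    if (nbits > 0) {
        b->words = calloc(sketch_bitmap_nwords(nbits), sizeof(uint64_t));
        if (b->words == NULL) {
            return -1;
        }
    }
    b->nbits = nbits;
    return 0;
}

/* Mark flow i as used. */
static void sketch_bitmap_set(sketch_bitmap_t *b, size_t i)
{
    assert(i < b->nbits);
    b->words[i / SKETCH_WORD_BITS] |= UINT64_C(1) << (i % SKETCH_WORD_BITS);
}

/* Return 1 if flow i is marked as used, 0 otherwise. */
static int sketch_bitmap_test(const sketch_bitmap_t *b, size_t i)
{
    assert(i < b->nbits);
    return (int)((b->words[i / SKETCH_WORD_BITS] >> (i % SKETCH_WORD_BITS)) & 1u);
}

static void sketch_bitmap_fini(sketch_bitmap_t *b)
{
    free(b->words);
    b->words = NULL;
    b->nbits = 0;
}
```

For flow counts that fit in one word, an implementation could keep the word inline and only allocate beyond that, so the common small-task case would cost no more than a single integer does today.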

Describe alternatives you've considered

The current state would work if we chose high compile-time defaults in MRA once we start batching, but that is wasteful for most tasks in the system. We would also still be limited by whatever integer type is used for the bitmap in PaRSEC.

