The default layout for basic_ndarray is the typical row-major order. That is, the neighbor of an element in the next column is more local than the neighbor of an element in the next row.
On the other hand, multidimensional CUDA thread indices seem to be in column-major order. That is, the neighbor of a CUDA thread with the next .x index is more local than the neighbor of a CUDA thread with the next .y index.
Multidimensional CUDA kernel launches are described with a leading .x dimension, then .y, then finally .z.
It seems like it would be more convenient for multi-dimensional arrays and multi-dimensional kernels to have the same default layout, because it wouldn't require the user to explicitly transpose an agent's index when accessing an array.
On the other hand, this is not the way CUDA kernel launches work, so it might be considered surprising instead of convenient.
Another way to state this is that the elements of threadIdx and blockIdx are in colexicographic order and the elements of N-dimensional array indices are in lexicographic order.
Anyway, if we wished to make this change, it would require transposing the elements of threadIndex and blockIndex when calling this_cuda_index() and this_block_index(). Likewise, it would require transposing the elements of the executor shape when creating the dim3s used in the CUDA triple chevrons.
jaredhoberock changed the title from "Consider generating grid_executor's multidimensional indices in 'row-major' order" to "Consider generating grid_executor's multidimensional indices in lexicographic order" on Jun 23, 2017.