Return device total global memory for MaxAllocSize
The SYCL specification inherits an OpenCL query for the maximum single
allocation that can be made, and requires that at least one-quarter of
the device's reported global memory size can be allocated.

CUDA doesn't really have such a limitation as far as I can tell, and
will happily allocate anything up to the total size of global memory
in one go. This means returning that size is a reasonable guess for
MaxAllocSize.
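
The reasoning above can be checked with a little arithmetic. The following is a minimal sketch, not part of the commit: it encodes the lower bound quoted in the new code comment (one quarter of global memory, or 128 MiB, whichever is larger) and tests whether reporting the full global size meets it. The helper names `syclMinMaxAlloc`, `totalMemMeetsMinimum`, and `checkExamples` are hypothetical, and the device sizes are illustrative values rather than measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Lower bound quoted in this commit's new code comment:
// max(1/4 of global memory, 128 MiB) for non-custom devices.
// (Hypothetical helper, not part of the adapter.)
constexpr std::uint64_t syclMinMaxAlloc(std::uint64_t GlobalBytes) {
  constexpr std::uint64_t Floor = 128ull * 1024 * 1024;
  return std::max(GlobalBytes / 4, Floor);
}

// Hypothetical helper: does reporting the whole of global memory as the
// maximum allocation size satisfy that lower bound?
constexpr bool totalMemMeetsMinimum(std::uint64_t GlobalBytes) {
  return GlobalBytes >= syclMinMaxAlloc(GlobalBytes);
}

// Small self-check with illustrative device sizes.
inline void checkExamples() {
  // 16 GiB device: the bound is 4 GiB, so reporting 16 GiB is sufficient.
  assert(syclMinMaxAlloc(16ull << 30) == (4ull << 30));
  assert(totalMemMeetsMinimum(16ull << 30));
  // Below 128 MiB of global memory the 128 MiB floor dominates.
  assert(syclMinMaxAlloc(64ull << 20) == (128ull << 20));
}
```

Under these assumptions, the total global size meets the bound for any device with at least 128 MiB of global memory. Whether an allocation of that size succeeds at run time is a separate question that depends on how much of the device is already in use.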
DuncanMcBain committed Dec 12, 2023
1 parent 20b9a83 commit bcb682e
Showing 1 changed file with 5 additions and 11 deletions.
16 changes: 5 additions & 11 deletions source/adapters/cuda/device.hpp
@@ -64,17 +64,11 @@ struct ur_device_handle_t_ {
}

// Max size of memory object allocation in bytes.
-// The minimum value is max(min(1024 × 1024 ×
-// 1024, 1/4th of CL_DEVICE_GLOBAL_MEM_SIZE),
-// 32 × 1024 × 1024) for devices that are not of type
-// CL_DEVICE_TYPE_CUSTOM.
-size_t Global = 0;
-UR_CHECK_ERROR(cuDeviceTotalMem(&Global, cuDevice));
-
-auto QuarterGlobal = static_cast<uint32_t>(Global / 4u);
-
-MaxAllocSize = std::max(std::min(1024u * 1024u * 1024u, QuarterGlobal),
-32u * 1024u * 1024u);
+// The minimum value is max (1/4th of info::device::global_mem_size,
+// 128*1024*1024) if this SYCL device is not device_type::custom.
+// CUDA doesn't really have this concept, and could allow almost 100% of
+// global memory in one allocation, but is dependent on device usage.
+UR_CHECK_ERROR(cuDeviceTotalMem(&MaxAllocSize, cuDevice));
}

~ur_device_handle_t_() { cuDevicePrimaryCtxRelease(CuDevice); }
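
To see what the change does to the reported value, the sketch below mirrors the removed arithmetic with the device total passed in as a parameter instead of being queried through `cuDeviceTotalMem`. The function names `oldMaxAllocSize`, `newMaxAllocSize`, and `checkOldVsNew` are hypothetical, and the device sizes are illustrative. One observation that follows from the removed lines as shown: the old value was capped at 1 GiB, and the cast to `uint32_t` keeps only the low 32 bits of the quarter size, so a quarter that is an exact multiple of 4 GiB becomes zero and the result falls to the 32 MiB floor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Mirrors the removed lines; Global stands in for the cuDeviceTotalMem result.
// (Hypothetical helper for illustration only.)
inline std::uint64_t oldMaxAllocSize(std::uint64_t Global) {
  // The cast keeps only the low 32 bits of Global / 4.
  auto QuarterGlobal = static_cast<std::uint32_t>(Global / 4u);
  const std::uint32_t OneGiB = 1024u * 1024u * 1024u;
  const std::uint32_t Floor = 32u * 1024u * 1024u;
  return std::max(std::min(OneGiB, QuarterGlobal), Floor);
}

// After the change, the reported value is simply the device total.
inline std::uint64_t newMaxAllocSize(std::uint64_t Global) { return Global; }

// Self-check with illustrative device sizes.
inline void checkOldVsNew() {
  // 8 GiB device: a quarter is 2 GiB, capped at 1 GiB by the old code.
  assert(oldMaxAllocSize(8ull << 30) == (1ull << 30));
  // 16 GiB device: a quarter is exactly 4 GiB, which truncates to 0,
  // so the old code reported only the 32 MiB floor.
  assert(oldMaxAllocSize(16ull << 30) == (32ull << 20));
  // The new code reports the full 16 GiB.
  assert(newMaxAllocSize(16ull << 30) == (16ull << 30));
}
```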