
Limit buffer sizes to leave some memory for the platform #1172

Merged: 1 commit, Aug 6, 2024

Conversation

@ouakheli (Contributor):

Some conformance tests directly use the size reported by the runtime as the maximum memory size when allocating buffers. This doesn't leave enough memory for the system to run the tests.
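
For context, a minimal sketch of the pattern described above, assuming a plain OpenCL host program; the function name and error handling are illustrative, not taken from the CTS:

    #include <CL/cl.h>

    /* Query the device's reported maximum allocation size and try to
       allocate exactly that much, which is the pattern this PR changes. */
    cl_int allocate_reported_max(cl_device_id device, cl_context context)
    {
        cl_ulong maxAllocSize = 0;
        cl_int error = clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                                       sizeof(maxAllocSize), &maxAllocSize,
                                       NULL);
        if (error != CL_SUCCESS) return error;

        /* If the runtime reports essentially all memory in the system,
           this call can succeed while leaving the platform too little
           memory to run the rest of the test. */
        cl_mem buf = clCreateBuffer(context, CL_MEM_READ_ONLY,
                                    (size_t)maxAllocSize, NULL, &error);
        if (buf != NULL) clReleaseMemObject(buf);
        return error;
    }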

@jlewis-austin (Contributor) left a comment:

This looks like a pretty fundamental change to code that other implementations pass without a problem.
I'd suggest starting with a detailed analysis of an individual test to show why the current value is unreasonable. Most tests already reduce the allocation size, so the question should be whether those reductions are correct, rather than arbitrarily cutting everything in half.

Review thread on test_conformance/api/test_api_min_max.cpp (resolved).
@ouakheli force-pushed the max_size_change branch 8 times, most recently from 743a633 to 4859c47 on May 24, 2021.
@ouakheli (Contributor, Author):

@jlewis-austin It really depends on what the implementation returns as the max size. If the implementation returns all of the memory available in the system, creating a buffer of that size will succeed, but the system won't have enough memory left.

@jlewis-austin (Contributor) left a comment:

There are some good fixes in here, but it also introduces functional changes that could impact existing implementations. I think we need to address the fundamental question first: should an implementation report a buffer size that can't be created due to platform memory usage? I tend to think that the allocation tests should be able to pass as-is without adjusting the reported numbers, but I welcome other opinions.


    currentSize = maxAllocSize;
    while (currentSize >= maxAllocSize / PASSING_FRACTION)
    {
        log_info("Trying to create a buffer of size of %lld bytes (%gMB).\n",
                 maxAllocSize, (double)maxAllocSize / (1024.0 * 1024.0));
        memHdl = clCreateBuffer(context, CL_MEM_READ_ONLY,
                                (size_t)maxAllocSize, NULL, &error);
        if (error == CL_MEM_OBJECT_ALLOCATION_FAILURE
            || error == CL_OUT_OF_RESOURCES
            || error == CL_OUT_OF_HOST_MEMORY)
        {
@jlewis-austin (Contributor) commented:

The original version doesn't make much sense, but I'm guessing the intent was to stop retrying at maxAllocSize/4. In that case, shouldn't currentSize be used in clCreateBuffer (and decremented) instead of maxAllocSize?

@ouakheli (Contributor, Author) replied:

Changed to currentSize.

    -    while (maxAllocSize > (maxAllocSize / 4)) {
    +    currentSize = maxAllocSize;
    +    while (currentSize >= maxAllocSize / PASSING_FRACTION)
@jlewis-austin (Contributor) commented:

It took me a while to figure out the name "PASSING_FRACTION", but I'm having trouble thinking of anything better. Maybe something like MAX_REDUCTION_FACTOR?

@ouakheli (Contributor, Author) replied:

Changed to MAX_REDUCTION_FACTOR.
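
Putting both review suggestions together, a sketch of the fixed retry loop might look like the following fragment. Only the identifiers come from the diffs above; the decrement step and comments are illustrative:

    #define MAX_REDUCTION_FACTOR 4

    cl_ulong currentSize = maxAllocSize;
    while (currentSize >= maxAllocSize / MAX_REDUCTION_FACTOR)
    {
        log_info("Trying to create a buffer of %llu bytes (%gMB).\n",
                 (unsigned long long)currentSize,
                 (double)currentSize / (1024.0 * 1024.0));
        memHdl = clCreateBuffer(context, CL_MEM_READ_ONLY,
                                (size_t)currentSize, NULL, &error);
        if (error == CL_MEM_OBJECT_ALLOCATION_FAILURE
            || error == CL_OUT_OF_RESOURCES
            || error == CL_OUT_OF_HOST_MEMORY)
        {
            /* Back off and retry with a smaller request; the loop bound
               stops us once we drop below maxAllocSize / 4. */
            currentSize -= currentSize / 8;
            continue;
        }
        break; /* success, or a hard error for the caller to report */
    }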

Comment on lines 514 to 515:

    if ((0 == gIsEmbedded && maxAllocSize < 128 * 1024 * 1024)
        || maxAllocSize < 1 * 1024 * 1024)
@jlewis-austin (Contributor) commented:
These checks can introduce new failures, since maxAllocSize has already been reduced. There should be a clear distinction between the reported max alloc size and any modifications we make to it. I still have the same objection about reducing the allocation size in two different places. Applying this factor up front reduces coverage for implementations that currently pass the test with their reported size; after this change they would be testing only half the size they're capable of.

@ouakheli (Contributor, Author) replied:

This value is 1M for embedded profiles and 128M otherwise.
I changed the check to use requiredAllocSize for consistency, but I don't see an implementation reporting close to 2M or 256M, which are the only cases where the reduction would make it fail.
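
A sketch of the separation being asked for, keeping the device's reported value distinct from the size the test allocates; requiredAllocSize follows the author's naming, the rest is illustrative:

    /* Minimums from the check above: 1 MiB on the embedded profile,
       128 MiB otherwise. */
    cl_ulong requiredAllocSize =
        gIsEmbedded ? 1 * 1024 * 1024 : 128 * 1024 * 1024;

    /* Validate the device's *reported* value against the minimum... */
    if (maxAllocSize < requiredAllocSize)
    {
        log_error("Reported max alloc size is below the required minimum.\n");
        return -1;
    }

    /* ...and derive the reduced size the test actually allocates
       separately, so the validation is unaffected by the reduction. */
    cl_ulong testAllocSize = maxAllocSize / 2;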

    if (clGetDeviceInfo(device, info, sizeof(max_size), &max_size, NULL)
        != CL_SUCCESS)
    {
        throw std::runtime_error("clGetDeviceInfo failed\n");
@jlewis-austin (Contributor) commented:

Is this caught somewhere? It looks like a change in functionality, since the existing test_error_abort macro just returns an error from the caller and allows further testing to continue.

@ouakheli (Contributor, Author) replied:

I tried to harmonize this function with the get_max_param_size function.
Does it really make sense to continue testing if clGetDeviceInfo returns an error?
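
For illustration, a sketch of the harmonized helper as described, where a query failure aborts by throwing rather than returning an error for the caller to propagate; the function name is hypothetical:

    #include <stdexcept>

    static cl_ulong get_max_alloc_size(cl_device_id device, cl_device_info info)
    {
        cl_ulong max_size = 0;
        if (clGetDeviceInfo(device, info, sizeof(max_size), &max_size, NULL)
            != CL_SUCCESS)
        {
            /* Unlike the test_error_abort macro, which returns an error and
               lets the caller decide, throwing stops the test outright. */
            throw std::runtime_error("clGetDeviceInfo failed\n");
        }
        return max_size;
    }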

@ouakheli (Contributor, Author) commented on Jun 10, 2021:

> There are some good fixes in here, but it also introduces functional changes that could impact existing implementations. I think we need to address the fundamental question first: should an implementation report a buffer size that can't be created due to platform memory usage? I tend to think that the allocation tests should be able to pass as-is without adjusting the reported numbers, but I welcome other opinions.

Agreed, we need more opinions on that before proceeding with the other comments, as this is fundamental to this change.
I don't think any implementation should return anything other than the maximum memory available for buffer creation.
Only the application knows how much memory the platform needs and what operations will be performed on the platform with the buffer.
The implementation can't know what the platform needs, and that varies from platform to platform, so if it reported less than the whole memory it would just be applying an arbitrary factor that happens to work on most platforms.

What about passing a parameter to the test, as we do with the reduction% in test_allocation?

@jlewis-austin (Contributor):

I should clarify that most of the reductions look fine, for the tests that aren't specifically testing max alloc size functionality. The main one I'm concerned about is test_api_min_max, where there's no "app" to speak of beyond spinning up the platform and doing a single allocation.

That being said, I think you're right about not estimating platform usage, and I got two trains of thought mixed up in the problem statement. The real question should have been whether the CTS lower limits on retries (i.e., your PASSING_FRACTION) serve as a "reasonable" minimum that should succeed, or whether the tests should remove all limits and simply keep retrying down to size 1. On the max allocation tests, I think it makes sense in either case to always attempt the max size, to verify that it either fails gracefully or succeeds.

@ahesham-arm force-pushed the max_size_change branch 5 times, most recently from d4b02e8 to a6bf03c on June 17, 2024.
@bashbaug (Contributor):

Discussed in the June 18th teleconference. This is an older PR that has been rebased and updated and is now ready for review.

@bashbaug (Contributor) left a comment:

Changes look good to me technically. I do wonder whether we're losing too much coverage reducing allocation size by 50% in many cases, but we can try this and see.

    @@ -24,8 +24,10 @@

     typedef long long unsigned llu;

    +#define REDUCTION_PERCENTAGE_DEFAULT 50
@bashbaug (Contributor) commented:

I'm not necessarily opposed to this, but it does change the default behavior of the test pretty significantly, since a "reduction percentage" less than 100% also affects the number of work-items executed.

Should we just change the number of work-items executed unconditionally? I'd feel more comfortable if this were just changing the max allocation percentage.
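
A sketch of the coupling being pointed out; apart from REDUCTION_PERCENTAGE_DEFAULT, the names and the work-item derivation are illustrative:

    #define REDUCTION_PERCENTAGE_DEFAULT 50

    /* One knob drives both quantities: shrinking the allocation also
       shrinks the number of work-items launched over it. */
    cl_ulong effectiveSize = maxAllocSize * REDUCTION_PERCENTAGE_DEFAULT / 100;
    size_t numWorkItems = (size_t)(effectiveSize / sizeof(cl_uint));

    /* The suggested alternative: reduce the work-item count
       unconditionally and keep testing allocation at the reported size. */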

@bashbaug (Contributor) commented on Aug 6, 2024:

Merging as discussed in the August 6th teleconference.

@bashbaug merged commit f473546 into KhronosGroup:main on Aug 6, 2024. 7 checks passed.
@ahesham-arm deleted the max_size_change branch on August 6, 2024.
6 participants