Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Negative Tests for clDevice Functions #1148

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

chemis01
Copy link
Contributor

@chemis01 chemis01 commented Feb 5, 2021

Signed-off-by: Chetankumar Mistry chetan.mistry@arm.com

test_failure_error_ret(
err, CL_INVALID_PLATFORM,
"clGetDeviceIDs should return CL_INVALID_PLATFORM when: \"platform is "
"not a valid platform\" using a nullptr",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing nullptr for the platform is allowed and the behaviour is implementation-defined:

platform refers to the platform ID returned by clGetPlatformIDs or can be NULL. If platform is
NULL, the behavior is implementation-defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing nullptr for the platform is allowed and the behaviour is implementation-defined:

platform refers to the platform ID returned by clGetPlatformIDs or can be NULL. If platform is
NULL, the behavior is implementation-defined.

Then we have merged incorrect test case here ? Related with question #1222

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that code is incorrect. I've replied on the issue and asked @chemis01 to prepare a fix.

{
size_t number_of_properties = 0;
int err = clGetDeviceInfo(deviceID, CL_DEVICE_PARTITION_PROPERTIES, 0,
nullptr, &number_of_properties);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the total size of the data that's returned, you need to divide by sizeof(cl_device_partition_property) to get the number of properties.

std::vector<cl_device_partition_property> supported_properties(
number_of_properties);
err = clGetDeviceInfo(deviceID, CL_DEVICE_PARTITION_PROPERTIES,
supported_properties.size(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supported_properties.size() gives the number of elements, not the total size of the data. You'll need to multiply by sizeof(cl_device_partition_property)

if (supported_properties == SupportedPartitionSchemes::None)
{
printf("Device does not support creating subdevices... Skipping\n");
return TEST_SKIP;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return TEST_SKIP;
return TEST_SKIPPED_ITSELF;

TEST_FAIL);

err =
clGetDeviceInfo(reinterpret_cast<cl_device_id>(context), CL_DEVICE_TYPE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd been debating these types of tests internally, so I just want to confirm that this level of checking is expected. I thought there had been some suggestion in the past of simple null checks being adequate wrt determining object validity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec issue is related, I think: KhronosGroup/OpenCL-Docs#482

Device IDs are slightly different since you "get" them rather than "create" them and they aren't associated with a context, but there are a lot of similarities to the objects discussed on the issue as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's what I was thinking of. I guess our own discussion has broadened to the more generic question of whether error-checking should be required to distinguish between any two resource objects (specifically those implementing the icd cl layout), so it's a question for negative testing on almost every API.

test_error(err, "clGetPlatformInfo");
}

if (timer_resolution != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work correctly for 2.0 devices?

Copy link
Contributor Author

@chemis01 chemis01 Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I missed that

test_error(err, "clGetPlatformInfo");
}

if (timer_resolution != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, it seems 2.0 devices will fall in here

@chemis01 chemis01 force-pushed the negative_device_tests branch 4 times, most recently from 2a77710 to 652455a Compare March 25, 2021 09:20
Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, but can you please check the device partitioning cases?

test_conformance/api/negative_device.cpp Outdated Show resolved Hide resolved
test_conformance/api/negative_device.cpp Outdated Show resolved Hide resolved
test_conformance/api/negative_device.cpp Outdated Show resolved Hide resolved
test_conformance/api/negative_device.cpp Show resolved Hide resolved
test_conformance/api/negative_device.cpp Outdated Show resolved Hide resolved
@jlewis-austin
Copy link
Contributor

@chemis01 do you mind if I split off the infrastructure required for the #1318 fix into a separate PR (with attribution of course)? Or you can do it if you'd rather. That would also simplify this one a bit and maybe help get it pushed through faster.

@chemis01
Copy link
Contributor Author

@chemis01 do you mind if I split off the infrastructure required for the #1318 fix into a separate PR (with attribution of course)? Or you can do it if you'd rather. That would also simplify this one a bit and maybe help get it pushed through faster.

@jlewis-austin I am completely fine with you splitting it off into a separate PR.

@jlewis-austin
Copy link
Contributor

@chemis01 do you mind if I split off the infrastructure required for the #1318 fix into a separate PR (with attribution of course)? Or you can do it if you'd rather. That would also simplify this one a bit and maybe help get it pushed through faster.

@jlewis-austin I am completely fine with you splitting it off into a separate PR.

We ended up going with a simpler fix in #1374 to expedite it and keep it separate from the refactor work.

Signed-off-by: Chetankumar Mistry <chetan.mistry@arm.com>
This commit addresses upstream comments by:
    1) Remove the test for CL_INVALID_PLATFORM when
       using a nullptr for clGetDeviceIDs
    2) Ensure that the number of returned
       CL_DEVICE_PARTITION_PROPERTIES is handled correctly
    3) Replace TEST_SKIP with TEST_SKIPPED_ITSELF

Signed-off-by: Chetankumar Mistry <chetan.mistry@arm.com>
Add functionality in the harness to allow negative
tests to optionally test Object Validity by passing in
a valid object which is not the correct object.
This option can be selected by passing
--invalid-object-scenarios as a command-line argument.

Additionally this change updates the negative platform
tests so that they use this feature

Signed-off-by: Chetankumar Mistry <chetan.mistry@arm.com>
This change updates the negative device tests so that
they can use the optional argument to test for object
validity. E.g. optionally use a nullptr or a valid object
which is not a device when a devices is needed

Signed-off-by: Chetankumar Mistry <chetan.mistry@arm.com>
When testing optional "invalid_object" scenarios, we want
to omit the case when a nullptr is used when the function
asks for a cl_platform_id. This is because this scenario is
implementation defined and so a nullptr is valid.

Signed-off-by: Chetankumar Mistry <chetan.mistry@arm.com>
This commit addresses upstream comments by ensuring that
CL_DEVICE_PARTITION_BY_COUNTS_LIST_END is present when
partioning by Counts

Signed-off-by: Chetankumar Mistry <chetan.mistry@arm.com>
This commit addresses upstream comments where the number of
out devices and the number of subdevices targeted by the
properties variable were creating issues with testing.

Signed-off-by: Chetankumar Mistry <chetan.mistry@arm.com>
@ahesham-arm
Copy link
Contributor

@kpet Rebased and CI checks are now passing.

Comment on lines +422 to +423
err = clCreateSubDevices(deviceID, properties, max_for_partition + 1,
out_devices.data(), nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is still a problem here. On the Intel CPU OpenCL implementation, testing on a CPU with 24 cores, I am seeing this call being generated:

>>>> clCreateSubDevices: in_device = 12th Gen Intel(R) Core(TM) i9-12900K (CL_DEVICE_TYPE_CPU) (0x5e7b00958778), properties = [ CL_DEVICE_PARTITION_EQUALLY = 1 ], num_devices = 25
<<<< clCreateSubDevices -> CL_SUCCESS

This call is returning CL_SUCCESS, and I don't see any reason why it should not return CL_SUCCESS: it is only creating 24 sub-devices with this call and the 25th is un-written, but this should be fine. This certainly is not testing the case where "in_device could not be further partitioned".

Perhaps the properties should be different to test this case? For example, I believe trying to partition equally into a device with 24 compute units will generate the expected error, since in this case the input device is not being further partitioned:

>>>> clCreateSubDevices: in_device = 12th Gen Intel(R) Core(TM) i9-12900K (CL_DEVICE_TYPE_CPU) (0x5f50d1d32778), properties = [ CL_DEVICE_PARTITION_EQUALLY = 24 ], num_devices = 1
ERROR! clCreateSubDevices returned CL_DEVICE_PARTITION_FAILED (-18)
<<<< clCreateSubDevices -> CL_DEVICE_PARTITION_FAILED

Comment on lines +448 to +449
test_failure_error_ret(
err, CL_INVALID_DEVICE_PARTITION_COUNT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: our CPU implementation is returning CL_DEVICE_PARTITION_FAILED in this case (and the case below), not CL_INVALID_DEVICE_PARTITION_COUNT. I suppose CL_INVALID_DEVICE_PARTITION_COUNT is more descriptive, but CL_DEVICE_PARTITION_FAILED seems like it is appropriate too. Should we accept both error codes, or do we definitely want this case to return CL_INVALID_DEVICE_PARTITION_COUNT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants