Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added cl_half support for test_bruteforce #1681

Closed
wants to merge 28 commits into from

Conversation

shajder
Copy link
Contributor

@shajder shajder commented Mar 23, 2023

According to issue description (bruteforce section):

#142

Additional remark: This PR is based on @gwawiork work from PR #1433. Due to long approval process it was necessary to rebase and adapt to #1634 modernization. Due to reassignment to mobica-backlog it was more convenient to introduce new PR (which was discussed with Grzegorz).

@lakshmih
Copy link
Contributor

Testing / reviewing.

@lakshmih
Copy link
Contributor

We would like to double check results at our end, can we get some more time.

Copy link
Contributor

@lakshmih lakshmih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes.

Copy link
Member

@svenvh svenvh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now; thanks!

Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick run with these changes on our implementation with:

test_bruteforce -f -r -w -1

Aside from the known failure with fract I'm also seeing a failure with divide_cr:

 83:       divide_cr fp16     
ERROR: divide_cr: -0.500000 ulp error at {-0x1.394p-12 (0x8ce5), -0x1.cp+5 (0xd300)}
Expected: 0x1.68p-18  (half 0x005a) 
Actual: 0x1.64p-18 (half 0x0059) at index: 22220
ERROR: ThreadPool_Do: TestHalf failed
! (CL_DEVICE_NOT_FOUND from /home/bashbaug/git/OpenCL-CTS/test_conformance/math_brute_force/binary_operator_half.cpp:667)
divide_cr FAILED

I haven't investigated further but I wanted to raise this ASAP.

@bashbaug
Copy link
Contributor

bashbaug commented Dec 5, 2023

fma is failing also, if I take off -w:

 25:             fma fp16     ....
ERROR: fma: -0.500000 ulp error at {0x1.eacp+7, 0x1.3f4p+4, 0x1.c04p+14} ({0x5bab, 0x4cfd, 0x7701}): *0x1.068p+15 vs. 0x1.064p+15
fma FAILED

@bashbaug
Copy link
Contributor

bashbaug commented Dec 5, 2023

I'm also seeing a failure with divide_cr:

It looks like the tests are checking for a correctly rounded result in this case, but it's not clear this is required. The build option specifically says (emphasis mine):

"-cl-fp32-correctly-rounded-divide-sqrt
The -cl-fp32-correctly-rounded-divide-sqrt build option to clBuildProgram or clCompileProgram allows an application to specify that single precision floating-point divide (x/y and 1/x) and sqrt used in the program source are correctly rounded. If this build option is not specified, the minimum numerical accuracy of single precision floating-point divide and sqrt are as defined in the OpenCL C or OpenCL SPIR-V Environment specifications."

I think the safest thing to do would be to disable the divide_cr test for fp16 while we figure out what to do with it.

I'll look at fma next, but probably not before the teleconference.

@svenvh
Copy link
Member

svenvh commented Dec 5, 2023

It looks like the tests are checking for a correctly rounded result in this case, but it's not clear this is required. The build option specifically says (emphasis mine):

"-cl-fp32-correctly-rounded-divide-sqrt The -cl-fp32-correctly-rounded-divide-sqrt build option to clBuildProgram or clCompileProgram allows an application to specify that single precision floating-point divide (x/y and 1/x) and sqrt used in the program source are correctly rounded. If this build option is not specified, the minimum numerical accuracy of single precision floating-point divide and sqrt are as defined in the OpenCL C or OpenCL SPIR-V Environment specifications."

I think the safest thing to do would be to disable the divide_cr test for fp16 while we figure out what to do with it.

Correctly rounded results for fp16-divide and fp16-sqrt are not dictated by CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT indeed. Instead, "Table 11. ULP Values for Half Precision Floating-Point Arithmetic Operations" of the extension spec states that fp16-divide and fp16-sqrt need to be correctly rounded for Full Profile, and <= 1 ULP for Embedded Profile.

That made me realize that the additions to function_list.cpp only add the Full Profile ULP values. For quite a few functions (e.g. cos, exp, ...), the Embedded Profile ULP value is different, but this is not taken into account in this patch.

@lakshmih
Copy link
Contributor

lakshmih commented Dec 12, 2023

We are seeing failures in multiple tests with this patch and haven't fully resolved them to bugs in the test or in our implementation.

List:

Add -> Fails when one or more inputs are subnormal.
Fma
Fract
Multiply
Remquo
Subtract
Tanpi

float err2, err3;

correct2 = HTF(func.f_ff(0.0, s2[j]));
correct3 = HTF(func.f_ff(-0.0, s2[j]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The half_to_float is unnecessary here since func.f_ff() returns a double. Similar changes are needed elsewhere in the code as well.

correct2 = func.f_ff(0.0, s2[j]);
correct3 = func.f_ff(-0.0, s2[j]);

@bashbaug
Copy link
Contributor

We're still debugging a few unexpected failures on our side, also.

Can we merge these changes but with half testing disabled by default, able to be enabled via command line arguments? Then, when debug is complete and any issues are fixed, we can re-enable half testing by default?

@svenvh
Copy link
Member

svenvh commented Dec 12, 2023

Can we merge these changes but with half testing disabled by default, able to be enabled via command line arguments? Then, when debug is complete and any issues are fixed, we can re-enable half testing by default?

That sounds good to me; as this is a very large patch, it will be hard to get everything right the first time without the patch going stale.

@bashbaug
Copy link
Contributor

I just created an fp16-staging branch, as we discussed. If this PR is retargeted at this branch I will merge it, also as we discussed.

@shajder
Copy link
Contributor Author

shajder commented Dec 18, 2023

I just created an fp16-staging branch, as we discussed. If this PR is retargeted at this branch I will merge it, also as we discussed.

@bashbaug I've created new PR to be merged with fp16-staging branch

@Nuullll
Copy link
Contributor

Nuullll commented Dec 20, 2023

Got 2 fails with test_bruteforce -f -r -w -1 on our device. Investigating:

fract...
 57:           fract fp64     ................Wimp pass	{    0.00,     0.00} @ {0x0p+0, 0x0p+0}
 58:           fract fp16     
ERROR: fract: {0.125000, 0.000000} ulp error at -0x1p-14: *{0x1.ffcp-1, -0x1p+0} vs. {0x1p+0, -0x1p+0}
fract FAILED

nextafter...
117:       nextafter fp64     ................Wimp pass	    0.00 @ {0x0p+0, 0x0p+0}
118:       nextafter fp16     
ERROR: nextafter: -nan ulp error at {0x0p+0 (0x0000), 0x1.3ccp-2 (0x34f3)}
Expected: 0x1p-24  (half 0x0001) 
Actual: -nan (half 0xffff) at index: 54

ERROR: nextafter: -nan ulp error at {0x0p+0 (0x0000), 0x1.578p-6 (0x255e)}
Expected: 0x1p-24  (half 0x0001) 
Actual: -nan (half 0xffff) at index: 4670

@hvdijk
Copy link
Contributor

hvdijk commented Jan 17, 2024

I'm seeing

ERROR: fma: -0.500000 ulp error at {0x1.eacp+7, 0x1.3f4p+4, 0x1.c04p+14} ({0x5bab, 0x4cfd, 0x7701}): *0x1.068p+15 vs. 0x1.064p+15
fma FAILED

on our implementation. 0x1.eacp+7 * 0x1.3f4p+4 + 0x1.c04p+14 is exactly 0x1.065fffp+15. Rounding to half-precision directly should give 0x1.064p+15. What I suspect one version is doing is rounding to single-precision first to give 0x1.066p+15, and then rounding that to half-precision to give 0x1.068p+15, which is the wrong result. But it never actually says which of the two values is the value the CTS considers correct, which would tell us whether it is the CTS that is doing the wrong calculation, or our implementation. Is the * meant to indicate the correct version? "expected: <...>, actual: <...>" would be clearer, in my opinion. Edit: Have confirmed that the CTS is the one giving the incorrect result.

@bashbaug
Copy link
Contributor

Removing "focused review", since we are reviewing and merging changes into the staging branch now.

@hvdijk
Copy link
Contributor

hvdijk commented Jan 24, 2024

I'm seeing

ERROR: fma: -0.500000 ulp error at {0x1.eacp+7, 0x1.3f4p+4, 0x1.c04p+14} ({0x5bab, 0x4cfd, 0x7701}): *0x1.068p+15 vs. 0x1.064p+15
fma FAILED

on our implementation. 0x1.eacp+7 * 0x1.3f4p+4 + 0x1.c04p+14 is exactly 0x1.065fffp+15. Rounding to half-precision directly should give 0x1.064p+15. What I suspect one version is doing is rounding to single-precision first to give 0x1.066p+15, and then rounding that to half-precision to give 0x1.068p+15, which is the wrong result. But it never actually says which of the two values is the value the CTS considers correct, which would tell us whether it is the CTS that is doing the wrong calculation, or our implementation. Is the * meant to indicate the correct version? "expected: <...>, actual: <...>" would be clearer, in my opinion. Edit: Have confirmed that the CTS is the one giving the incorrect result.

I was asked to open a PR for this against fp16-staging for this; done as #1882.

@Nuullll
Copy link
Contributor

Nuullll commented Feb 4, 2024

Got 2 fails with test_bruteforce -f -r -w -1 on our device. Investigating:

fract...
 57:           fract fp64     ................Wimp pass	{    0.00,     0.00} @ {0x0p+0, 0x0p+0}
 58:           fract fp16     
ERROR: fract: {0.125000, 0.000000} ulp error at -0x1p-14: *{0x1.ffcp-1, -0x1p+0} vs. {0x1p+0, -0x1p+0}
fract FAILED

nextafter...
117:       nextafter fp64     ................Wimp pass	    0.00 @ {0x0p+0, 0x0p+0}
118:       nextafter fp16     
ERROR: nextafter: -nan ulp error at {0x0p+0 (0x0000), 0x1.3ccp-2 (0x34f3)}
Expected: 0x1p-24  (half 0x0001) 
Actual: -nan (half 0xffff) at index: 54

ERROR: nextafter: -nan ulp error at {0x0p+0 (0x0000), 0x1.578p-6 (0x255e)}
Expected: 0x1p-24  (half 0x0001) 
Actual: -nan (half 0xffff) at index: 4670

The fract and nextafter errors were caused by our implementation bugs. After fixing our bugs, test_bruteforce -f -r -w is passing on our device.

@cycheng
Copy link
Contributor

cycheng commented May 16, 2024

When is this PR expected to be merged?

@shajder shajder closed this Jun 19, 2024
@shajder shajder reopened this Jun 19, 2024
@shajder
Copy link
Contributor Author

shajder commented Jun 19, 2024

This PR was merged to main branch along with this

@shajder shajder closed this Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants