Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

number_format no longer round properly big float #14332

Open
kylekatarnls opened this issue May 26, 2024 · 22 comments
Open

number_format no longer round properly big float #14332

kylekatarnls opened this issue May 26, 2024 · 22 comments

Comments

@kylekatarnls
Copy link
Contributor

kylekatarnls commented May 26, 2024

Description

The following code:

<?php

echo number_format(1599828571.23561241, 6, '.', '');

Resulted in this output:

1599828571.235613

But I expected this output instead:

1599828571.235612

This is the result with PHP < 8.4 and it sounds good to me either we consider round() or floor() as the best operation to use.

Context: precision matters in my case, as I aim to support float timestamp (possibly more precise than microseconds) then to round them as close as possible and in a consistent manner.

PHP Version

8.4 (master 2024-05-26)

Operating System

Both Ubuntu and Windows

@devnexen
Copy link
Member

Seems the affecting commit is this one. cc @SakiTakamachi

@SakiTakamachi
Copy link
Member

SakiTakamachi commented May 26, 2024

This is not a bug. This change in behavior occurs because the precision that can be rounded has increased by one digit.

It may seem odd to round 124 to 13, but 1599828571.23561241 and 1599828571.23561251 are the same value, so they are interpreted as 125.
https://3v4l.org/DJV35

Floating point numbers are not very accurate.

(edit)

In Carbon this may not be useful, but if you can handle the value as a string from the start, you can use bcround from BCMath as of 8.4.

@kylekatarnls
Copy link
Contributor Author

kylekatarnls commented May 27, 2024

Thank you. I see, we can deal with that.

For the record, the new method DateTime::createFromTimestamp accepts float but not string which leads to loosing the precision:

var_dump(DateTime::createFromTimestamp('1599828571.235612')); // => ...235611
var_dump(DateTime::createFromTimestamp(1599828571.235612)); // => ...235611

Or to an error when using declare(strict_types=1);

In Carbon we already have createFromTimestamp and it supports string without casting it to float, on the opposite, we use number_format to cast float to string then we split second and microseconds from the string so to keep maximum precision. Result then is equivalent to createFromFormat('U.u', ...)

If float cannot fully handle precision of U.u format, then maybe createFromTimestamp should also support string.

At the moment we use round() in our createFromTimestamp implementation so that so if we take a 6-digits decimal part:

echo number_format(1599828571.235612, 8, '.', '');

Then we get: 1599828571.23561192 and 23561192 got rounded to 235612 so we actually get more chance to get closer to the initial value. (If rounding decimal part hit 1'000'000, then we just increment the second.)

For instance if we exchange this number via JSON

JSON.stringify({created_at: 1599828571.235612})

JSON can properly represent this number with 6-digit, but we get in PHP a double(1599828571.23561192) and so choosing to use floor() to read it as a timestamp will always (except 1 lucky time over 1000) lead to be off by 1 microsecond.

I'm now wondering if I should align createFromTimestamp with PHP 8.4 behavior or if the exact behavior of this new method can still be challenged.

@SakiTakamachi
Copy link
Member

Due to the mechanism of floating point numbers, once a certain precision is exceeded, it becomes impossible to maintain the precision. In PHP, the maximum precision that can be maintained is defined by the constant PHP_FLOAT_DIG, which is 15 in general environments.

In this issue, we need up to 16 digits of precision, so float is unstable.

Luckily, this is an unreleased feature and not from an RFC, so there may be room to add a string to the argument.

@mvorisek
Copy link
Contributor

IMO this is a bug - repro https://3v4l.org/V47rI

The var_dump always prints the "full IEEE 754 64b precision" and it prints xxx.xxx6124 thus the rounded result should be really xxx.xxx612.

@SakiTakamachi
Copy link
Member

No, this is not a bug. These behaviors are clearly defined in the RFC.

https://wiki.php.net/rfc/change_the_edge_case_of_round

@kylekatarnls
Copy link
Contributor Author

I'm not sure to fully understand neither, extending on the @mvorisek example and comparing it to number_format if the precision is available I don't get how does it comes that it takes the upper rounding on https://3v4l.org/seUZ6/rfc#vgit.master

I just want to be sure my understanding is complete so I adapt accordingly the code in my library.

Similar question applied to createFromTimestamp() https://3v4l.org/9Fs3L/rfc#vgit.master here it sounds like we have the number precision up to the last digit after comma (2) when we var_dump but createFromTimestamp() "looses" it and floor down to 1.

I would also challenge that it's the expected behavior.

@kylekatarnls
Copy link
Contributor Author

kylekatarnls commented May 28, 2024

BTW thank you for the link to the RFC, I can see this:

As per the voting RFC a yes/no vote with a 2/3 majority is needed for this proposal to be accepted. Voting started on 2023-11-24 and will end on 2023-12-08 00:00 GMT.

And:

Status: Declined

So I'm also wondering how does it work and if it means it's definitely certain it will happen on PHP 8.4 or not.

@ranvis
Copy link
Contributor

ranvis commented May 29, 2024

@SakiTakamachi

It may seem odd to round 124 to 13, but 1599828571.23561241 and 1599828571.23561251 are the same value, so they are interpreted as 125.

1599828571.235612_41 and 1599828571.235612_51, and also 1599828571.235612_28, are represented as the same double value, 0x1.7d6dc96cf1446p+30, which is exactly 1599828571.235612_39_2425537109375.

Regardless of the literal representations in PHP code, that double value is the value that actually exists.
In other words, they result in the same value, but the resulting value is less than ...12_5.

The commit being discussed references PR #12222:

In the existing implementation, for example, the following code does not perform rounding:

<?php
var_dump(round(0.12345678901234565, 16, PHP_ROUND_HALF_UP));
// float(0.12345678901234565)

// I want it to be like this
// float(0.1234567890123457)

There seems to be misinterpretation here.
0.1234567890123456_5 is 0x1.f9add3746f65dp-4 =
0.1234567890123456_4_961431061647090245969593524932861328125.

This is also less than ...5.

@SakiTakamachi
Copy link
Member

SakiTakamachi commented May 30, 2024

@ranvis
Of course I understand that. That's correct for FP. However, in current PHP, the RFC states that it should behave as if it were a decimal number.

Please see:
https://wiki.php.net/rfc/change_the_edge_case_of_round
(Please note that this RFC has been rejected.)

@SakiTakamachi
Copy link
Member

@kylekatarnls
The behavior of createFromTimestamp has been fixed. createFromTimestamp does proper rounding.
#14375

@ranvis
Copy link
Contributor

ranvis commented May 31, 2024

@SakiTakamachi
Could you provide more details, please?
I'm not entirely clear on the reasoning behind this change.
The terminating decimal value 1599828571.235612_39... is the midpoint of the range that converts to the value.
Could you explain why the value ...235612_51 was chosen instead,
and which statements of the RFC you are referring to regarding this issue?

@SakiTakamachi
Copy link
Member

@ranvis

There is a link at the beginning of my RFC, the RFC that defines the current behavior is:
https://wiki.php.net/rfc/rounding

@ranvis
Copy link
Contributor

ranvis commented May 31, 2024

Um... That RFC describes how it works before the change.
Still I don't get why that makes the value regarded as ..._51.

I'm now under the impression that 1e15 (0x1.c6bf52634p+49) is already around the limit for the mentioned decimal (3.32bit/digit) rounding to work reliably considering double has 52+1 bit mantissa.

@SakiTakamachi
Copy link
Member

@TimWolla

I would appreciate your opinion on this issue :)

@cmb69
Copy link
Member

cmb69 commented Jul 11, 2024

Hmm, what is the actual status of this ticket? It's labeled with "Status: Needs Triage" and "Status: Verified".

@SakiTakamachi
Copy link
Member

I don't know what the appropriate status would be in this case...

I have explained everything that can be explained.

But it seems I'm not communicating that very well.

@damianwadley
Copy link
Member

Sounds like "needing triage" is literally correct...

@cmb69
Copy link
Member

cmb69 commented Jul 12, 2024

I've written to the internals mailing list in the hope to get more feedback.

@cmb69
Copy link
Member

cmb69 commented Jul 13, 2024

For me, this reply clarified things: compare https://3v4l.org/YbUMA and https://3v4l.org/YbUMA/rfc. This is clearly an improvement (aka. fixes very unexpected behavior), so besides clarifying UPGRADING, I think we can stick with the new behavior (and mark this ticket as WONTFIX).

@kylekatarnls
Copy link
Contributor Author

kylekatarnls commented Jul 13, 2024

From where I stand it's hard to understand why fixing round broke the number_format case above and why we need to choose one or the other. Shouldn't it just be correct in both case?

@cmb69
Copy link
Member

cmb69 commented Jul 13, 2024

From where I stand it's hard to understand why fixing round broke the number_format case above and why we need to choose one or the other. Shouldn't it just be correct in both case?

To clarify: what has been changed is the internal function _php_math_round(), and this is used from PHP's round() and number_format() (and some others).

Now inside _php_math_round() was a guard expression:

php-src/ext/standard/math.c

Lines 168 to 171 in 10a94f8

/* This value is beyond our precision, so rounding it is pointless */
if (fabs(tmp_value) >= 1e15) {
return value;
}

The comment is apparently not correct, otherwise https://3v4l.org/YbUMA wouldn't happen. So the 1e15 has been replaced by 1e16. Is this the best (or proper) solution? Frankly, I don't know (I'm not an expert on IEEE 754), but having round($x, 0) returning a number with decimals is obviously wrong, and in my opinion worse than the reported behavior. I'm still not happy about the BC break in a minor version, but apparently that cannot be avoided, and after all, IEEE 754 double arithmetic cannot be really exact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants