Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integral comparisons and common_type are problematic #658

Open
almania opened this issue Dec 14, 2024 · 14 comments
Open

Integral comparisons and common_type are problematic #658

almania opened this issue Dec 14, 2024 · 14 comments
Labels
design Design-related discussion enhancement New feature or request

Comments

@almania
Copy link

almania commented Dec 14, 2024

The current specification of <=> reads:

Effects: Equivalent to: using ct = std::common_type_t<quantity, quantity<R2, Rep2>>;
const ct ct_lhs(lhs);
const ct ct_rhs(rhs);
return ct_lhs.numerical_value_ref_in(ct::unit) @ ct_rhs.numerical_value_ref_in(ct::unit);

Those casts to std::common_type_t mean that for integral use, the user needs to be very aware of potential overflow on any comparison between units of different magnitudes.

This is problematic, as as the hw_example points out, the library is otherwise well-suited for providing an interface to values representing all kinds of ranges. For instance this fairly simple "16 bit signed ADC, 121V max" definition:

  inline constexpr struct ADC_lsb final
      : named_unit<"ADC_lsb", mag_power<2, -15> * mag<121> * V> {
  } ADC_lsb;
  quantity<ADC_lsb, int16_t> read_adc();

Introduces an implicit 121x multiply of the int16_t on the line below:

  bool is_above_5V() { return read_adc() > 5 * V; }
  using comparison_introduced_cast_type =
      std::common_type_t<decltype(1 * ADC_lsb), decltype(5 * V)>;
  static_assert(
      (1 * ADC_lsb).numerical_value_in(comparison_introduced_cast_type::unit) ==
      121); // scary!

Which means that a value as low as 1V on the ADC (which can read up to 121V) will trigger a signed integer overflow in that innocuous line, that checks if the ADC is above 5V.

And the problem only gets worse the more accurately you define the units, eg by dialling in the exact voltage divider you may end up with much larger magnitude adjustments required to get to the common_type with the base unit, depending on how 'irrational' they are:

  constexpr auto ADC_max_volts = mag_ratio<33, 10>; // 3.3V
  constexpr auto ADC_max_value = mag<0x8000>;       // 16 bit signed

  constexpr auto R1 = 360; // 360 kohm
  constexpr auto R2 = 10;  // 10 kohm
  constexpr auto V_max = ADC_max_volts * mag_ratio<R1 + R2, R2>;

  inline constexpr struct ADC_lsb final
      : named_unit<"ADC_lsb", V_max / ADC_max_value * V> {
  } ADC_lsb;
  using comparison_introduced_cast_type =
      std::common_type_t<decltype(1 * ADC_lsb), decltype(5 * V)>;
  static_assert(
      (1 * ADC_lsb).numerical_value_in(comparison_introduced_cast_type::unit) ==
      1221); // very scary!

As now, with more accurate scaling factors, our voltage comparison will overflow when the ADC is reading a numerical value of just +/-27, out of +/-32767, making 0.1V unsafe to compare against 5V (or 0V, for that matter). I believe this will be both dangerous, and rather surprising for users in its current state.


My thoughts:

  1. Implicit conversions of integral quantities involving scaling values likely should not be allowed, or if they are, at least not for integers below a certain "big enough" and well documented bit width (maybe long).
  2. std::common_type should likely similarly promote integer representations to the next bit-width if scaling is involved, at least up to that "big enough" bit width. If not this, it likely either shouldn't be defined, or its use of automatically generated scaling factors be very carefully considered, imo. This differs from usual C++, but for the much better. Yes, it does mean that adding two quantities of different units may produce a different rep vs two quantities of the same unit, but it's better to make the user aware of the risks of what's going on behind the scenes than silently failing imo.
  3. Even if a solution is adopted such that std::common_type and/or implicit conversions are prohibited altogether (rather than increasing rep widths), comparisons could still be made completely accurate and safe via checking if either quantity is outside the representable range of the other quantity, and returning the appropriate ordering from that. Imo, this solution should be adopted even for "big enough" types, as it's strictly more accurate, at a slight runtime cost. Where values are known at compile-time, such as the 5 * V above, the compiler ought have an easy time producing optimal code at least, and the utility of it a dream.

A question also: should conversions where std::numeric_limits<rep>::max() produces the same value as ::min() be prohibited altogether? eg anything mixing small integrals of mV and MV could be detected at compile time, even more usefully for user derived types. Could make for a nice sanity check that you're dealing with the units you think you are, maybe.

@mpusz
Copy link
Owner

mpusz commented Dec 14, 2024

Hi @almania, thanks for great feedback 👍🏻

I am not sure if the framework should change the representation types by itself. I would prefer a compile-time error forcing a user to change the representation type before comparison or addition.

Your ideas seem to be related to #303. @chiphogg implemented such things in Au already. Could you please check if this would be enough for your case, or do we need to think about a more complex solution?

@mpusz mpusz added enhancement New feature or request design Design-related discussion labels Dec 14, 2024
@almania
Copy link
Author

almania commented Dec 14, 2024

Thanks @mpusz, love the project obv. Somewhat choosing between au and mp-units currently, there's a lot here that appeals to me - found myself writing a lot of concepts and extensions to get au to my liking - I've been spoilt by concepts and compile-time strings/formatting for a bit too long.

I'm happy with compile-time errors, but preference would be to still allow comparisons - they can be made completely safe/accurate by checking limits first, and are such a nicety to have, albeit at some library code complexity.

But that perhaps plays in to another thought I just had: if ambiguous/risky operations were disabled by requires clauses, would they still be overridable by importing a namespace providing alternative implementations? I feel that may simply be the case (if not, may require moving the member methods in to ADL space), but would allow nice extensibility in:

using namespace mp_unit_extensions::fancy_integral_comparisons;
using namespace mp_unit_extensions::integral_promotion_on_arithmetic;

Would be the best of all worlds, really - users in these small embedded worlds can just import whatever operation modes they like, even those not supported directly by the library.

I do agree with the complexity of changing reps, particularly when people are combining fixed point or accumulator types with the library - would likely require a lot of customization points to be practical. Which would be impractical.

@mpusz
Copy link
Owner

mpusz commented Dec 14, 2024

I'm happy with compile-time errors, but preference would be to still allow comparisons

What I mean is that a user could type:

read_adc().in<long>() == 5 * V;

to make it compile.

if ambiguous/risky operations were disabled by requires clauses, would they still be overridable by importing a namespace providing alternative implementations? I feel that may simply be the case (if not, may require moving the member methods in to ADL space)

I am not sure how you would like to make it work. Unfortunately, there is no global state you could mutate to change the behavior of the engine.

Things like fancy_integral_comparisons and integral_promotion_on_arithmetic could be implemented by your own representation types or by the safer types we will hopefully get in the std at some point in the future. For example:

using SafeInt<int16_t> my_rep;
quantity<ADC_lsb, my_rep> q = my_rep{5} * V;

I do agree with the complexity of changing reps, particularly when people are combining fixed point or accumulator types with the library

I am not afraid of the complexity here. It is not that big of a problem. The problem is that a user's type would be silently promoted to a bigger type without the user's knowledge. If such a result of adding two quantities would then be assigned back to the user's variable (e.g., quantity<ADC_lsb, int16_t>), the value would be silently truncated again. This is why I would prefer to force the user to fix the code by raising compile-time errors. With this, all the type conversions would be explicit in the code.

Does it make sense?

@mpusz
Copy link
Owner

mpusz commented Dec 14, 2024

Somewhat choosing between au and mp-units currently

@chiphogg, the primary author of Au, contributes to this project and is a co-author of the ISO proposal. We work with other experts to provide the best possible library for C++ standardization. If you see any Au features that are important for your domain, are not part of this project, or are not proposed in the ISO C++ paper, please let us know so we can learn about your use cases and improve the proposal.

@chiphogg
Copy link
Collaborator

Thanks for the callouts @mpusz!

The overflow problem in C++ units libraries is near and dear to my heart. I think this Overflow doc is the most accessible survey of the problem I've seen. Note that the comparison problem that you raised, @almania, is the central example in the article --- well spotted!

Here's a summary of my current thoughts on the matter.

  1. Supporting mixed-unit/rep comparisons is a huge boost to library usability, so I think we should keep supporting this if we can.
  2. The safety of these operations varies very widely based on the conversion factor and types involved. It's easy to generate conversions that are virtually guaranteed to overflow, without even being aware that a conversion is taking place at all. Therefore, providing this feature without some kind of mitigation strategy means that, IMO, integer types are not acceptably safe to use.
  3. The complexity of conversion factors and numeric types appears to be faithfully reducible to a single number, the "smallest overflowing value" for a particular conversion. We can choose a threshold value for this number. This is the "overflow safety surface", and it's been shown to work very well in practice for almost 4 years in production use.
  4. The overflow safety surface is still a heuristic, and will have both cases of allowing lossy conversions, and cases of preventing acceptable conversions. The best ideal to strive for is to check every value at runtime (assuming you have a story for "what to do if it's not OK", i.e., an error handling strategy).
  5. The most generic solution for "check every conversion" would be a runtime conversion checker. Let the units library do the heavy lifting of figuring out which inputs are lossy, and let the user use their preferred error handling mechanism. Au has this as of this past week's 0.4.0 release; I haven't seen it anywhere else. (That said, std::expected might be a better generic solution for libraries that are on C++23: the error state could contain a bitmask that could indicate whether the input value would overflow, truncate, or both.)
  6. Even if we have runtime conversion checkers, we still need a policy like the safety surface, because of the cases where the library, not the user, generates the conversion (such as comparison or addition).

So, basically: the ideal is to give the library an adaptive policy for which conversions are OK by default, and provide runtime-checked conversions for the cases where users are prepared to handle an error result.

@almania
Copy link
Author

almania commented Dec 16, 2024

This is why I would prefer to force the user to fix the code by raising compile-time errors. With this, all the type conversions would be explicit in the code.

I'm certain to have missed some of the complexity of it, but comparisons could be made 100% accurate without changing the rep could they not? At least for the same origin case:

operator== would return false if au::is_conversion_lossy<RHS>(lhs) would return true (no usage of common_type).
operator<=> would need a variant of au::will_conversion_overflow that returns the direction of the overflow, but I feel each case could be handled without casting to a common_type as well.

The difficulty in getting a generic implementation right does make it a good candidate for being a library feature I think, although whether it ought be the default behaviour or not is certainly a fair question.

I am not sure how you would like to make it work. Unfortunately, there is no global state you could mutate to change the behavior of the engine.

I believe what I meant may to an extent 'just work':

// define a type that does not natively allow integral equality comparisons:
namespace test_ops {
template <class T> struct test_type {};
template <class T, class U>
  requires(!std::integral<T> && !std::integral<U>)
bool operator==(test_type<T>, test_type<U>);
} // namespace test_ops

// and a namespace that extends it with support for integral equality
// comparisons:
namespace test_ops_allow_int_compare {
using namespace test_ops;
template <class T, class U>
  requires(std::integral<T> || std::integral<U>)
bool operator==(test_type<T>, test_type<U>);
} // namespace test_ops_allow_int_compare

namespace user_code {
using namespace test_ops_allow_int_compare;

// floating point test_types can be freely compared:
static_assert(std::equality_comparable<test_type<double>>);

// integral test_types are not equality_comparable, due the extension method not
// being visible by ADL:
static_assert(!std::equality_comparable<test_type<int>>);

// however equality operator will work anywhere that the extension namespace is
// in scope:
static_assert(requires(test_type<int> a) { a == a; });
} // namespace user_code

Although the utility/sense of it I'm unsure, but it's maybe food for thought as to what solutions/workarounds are available.

au has this as of this past week's 0.4.0 release;

Thank you for making me aware of this, I will have to try it out.

@JohelEGP
Copy link
Collaborator

This is why I would prefer to force the user to fix the code by raising compile-time errors. With this, all the type conversions would be explicit in the code.

I'm certain to have missed some of the complexity of it, but comparisons could be made 100% accurate without changing the rep could they not?

Security is a hot topic in programming industry right now, and the C++ committee is catching up.
Should this be a bigger topic in the proposal, with polls for LEWG or the security WG included?

For equality, the first thing you'd do is check whether the finer-grained value is a multiple of the coarser-grained one with a modulo.
Things start to get muddy once yo have program-defined types.
Do they define a numeric_limits?
A common_type with the other operand?
Does it actually represent all values of both operands
(not the case for integer types of equal width but different signedness)?

@chiphogg
Copy link
Collaborator

I'm certain to have missed some of the complexity of it, but comparisons could be made 100% accurate without changing the rep could they not? At least for the same origin case:

operator== would return false if au::is_conversion_lossy<RHS>(lhs) would return true (no usage of common_type). operator<=> would need a variant of au::will_conversion_overflow that returns the direction of the overflow, but I feel each case could be handled without casting to a common_type as well.

I think we would still want to use common units, regardless. We really want to avoid making op(a, b) different from op(b, a) for symmetric op, which certainly includes operator==.

I never thought of incorporating the runtime conversion checkers into the operator definitions. It's an interesting idea! That said, it seems likely to me that it would return false in cases where the actual answer is true, because computing that answer would require a lossy conversion.

@almania
Copy link
Author

almania commented Dec 16, 2024

That said, it seems likely to me that it would return false in cases where the actual answer is true, because computing that answer would require a lossy conversion.

I think currently the library does not allow lossy common_types (hence the integral scale factors), so comparison could be defined as if "performed on a common unit of infinite effective width". This is consistent with lossy conversions requiring explicit casts, so a comparison should not be introducing one. I think for equality of integral types not au::is_conversion_lossy<RHS>(lhs) && RHS(lhs).numerical_value == rhs.numerical_value produces that result, consistently symmetrically, iff au::is_conversion_lossy is bit-accurate for integral types. If a conversion is lossy at runtime, casting both to a common type of sufficient width will show the values to be unequal.

Mostly, I feel read_adc() < 5 * V should "just work" for least surprises to the user and for the utility it gains - plus the difficulty in writing these yourself. Even a naive correction of the code to read_adc() < decltype(read_adc())(5 * V) could give a less accurate answer, as that may be a potentially lossy/truncating conversion, so it's not easy for the user to workaround.

Do they define a numeric_limits?

I think anything gracefully handling overflow, be it safe conversion functions or arith/comparisons, at least that or other customization points would be strictly required to even be implementable really. Although as regularly comes up, we really need something more along the lines of is_losslessy_convertible_after_scaling<FromRep, ToRep>(integral_scalar)... unit libraries are very ambitious projects, I'll say that.

@almania
Copy link
Author

almania commented Dec 17, 2024

Thinking on it more, I think a partially specializable functor (pseudo):

// NB: shift may be positive or negative
template <typename FromRep, typename ToRep, std::uintmax_t Scale, int Shift>
struct safe_runtime_convert;

template <ToRep>
struct safe_runtime_convert_result {
  ToRep value; // modulo wrapped within ToRep bounds
  enum {
    none,
    below_min,
    above_max,
  } overflow;
  enum {
    exact,
    inexact,
  } truncation;
};

Would be about required to implement accurate type-to-type conversions and infinite-width-comparisons for user-defined-types (fixed-pt etc), along with utilities for different cast modes - but even that proto wouldn't allow a hypothetical round_to_even_cast<V>(value) (would require an additional bit in the truncation enum), nor super-exotic types.

This is maybe a whole kettle of fish/scope creep that is better avoided.

OTOH, I do believe integral type-to-type conversions need to be made accurate if integral types are to be supported at all, really - and achieving that (eg #615) would provide most if not all of the framework required for infinite-width comparisons as well. So if it's there... should it not be leveraged, at least for the built-in types?

Edit: apologies, misclicked close.

@almania almania closed this as completed Dec 17, 2024
@almania almania reopened this Dec 17, 2024
@mpusz
Copy link
Owner

mpusz commented Dec 17, 2024

Thanks for your input. I will have to think about it when I have more free time. Right now, I am working hard on quantity specification conversions and vector and complex quantities.

If someone would like to do some prototyping work on this, please feel free to jump in.

@burnpanck
Copy link
Contributor

burnpanck commented Dec 20, 2024

Uh oh, I missed this conversation. Please excuse if I repeat stuff that has been discussed before, I didn't have time yet to review all of it. As @chiphogg pointed out in one of the comments I spotted, the behaviour of conversion between units using integral representations with respect to overflow is surprising in the current implementation. I believe we should make clear guarantees about our conversions, basically along the lines that if a quantity is both within the range of the input and output representations, then the conversion should be correct and accurate roughly at least as good as the lower resolution of the two involved representations, and should not cause overflow or other undefined or implementation defined behaviour. This can be implemented using suitable fixed-point arithmetic as proposed in my PR #615, but the cost of it is that it requires double-wide internal computation (something that most hardware does provide, but the C/C++ standard doesn't expose), and it requires range and precision knowledge about the involved representations, for which there is currently no standard API for custom representations to implement.

@burnpanck
Copy link
Contributor

Maybe a little bit more about the guarantees and potential policy systems that would allow to tunes these guarantees. To start, let us analyse what type of failures we could guarantee against.

First, there are two different types of "severity":

  • "Overflow": A value falls completely outside of the acceptable range of a representation and/or operation
  • "Precision loss": A value cannot be exactly represented by a representation or in an operation, but a nearby value would be available

Then, there are also different "causes" for these failures:

  • "External": The range and/or precision in question is under the explicit and direct control of the library user, perhaps through an explicitly selected representation.
  • "Internal": The range and/or precision arises out of the implementation of a certain library functionality.

Finally, we may distinguish the "time of detection":

  • "Compile-time": Guarantees can usually only be made about some significant subset of values. The effect of a failure is a compilation failure.
  • "Runtime": Guarantees can be made for specific values. The effect of a runtime-failure requires a runtime error reporint mechanism.

Important guarantees in this set are:

  1. Compile-time guarantees for no internal overflow (only overflow for quantities outside the representable range of user-chosen representations)
  2. Compile-time guarantee for little-to-no internal precision loss, (per-operation precision loss of the order of the user-chosen representation's precision); this is what you usually get for floating point operations.
  3. Runtime guarantee against all precision loss, (computations do not alter the represented quantity in any way), usually not expected from floating-point operations.

My basic observation is that there is no sensible compile-time version of 3) that holds for mixed-unit operations; if you want to guarantee that operations are exact, you throw away bits, potentially loads of them depending on the involved units. How many bits depends in non-trivial ways on the involved units (e.g. ~25 bits for conversions between pounds and kilograms). Thus, every operation needs to increase the available representation bits to accommodate the "lossy" aspect of the conversion.

As pointed out by @almania, these details on what is needed are difficult to describe with generic numeric_limits, and because of that, #615 introduces scaling_traits<From,To>. The PR follows the stance of the current choice of this library, that builtin types never provide runtime guarantees in addition to the compile-time ones. However, 1) and 2) above are made. Furthermore, the scaling_traits<From,To>::scale(...) method allows runtime-checking to be performed by choice of the representation types, thus enabling a sort of policy selection through them.

@chiphogg
Copy link
Collaborator

I think currently the library does not allow lossy common_types (hence the integral scale factors), so comparison could be defined as if "performed on a common unit of infinite effective width". This is consistent with lossy conversions requiring explicit casts, so a comparison should not be introducing one. I think for equality of integral types not au::is_conversion_lossy<RHS>(lhs) && RHS(lhs).numerical_value == rhs.numerical_value produces that result, consistently symmetrically, iff au::is_conversion_lossy is bit-accurate for integral types. If a conversion is lossy at runtime, casting both to a common type of sufficient width will show the values to be unequal.

I just realized another reason I have a strong intuitive preference to implement operator== as a common-unit operation, converting both input units, and not as an asymmetrical conversion to the second (or first) unit.

Users who use operator== will generally want to use other comparisons, such as operator< or operator>=, in similar circumstances. With common units, this is no problem: they will work and not work in the same situations, and "not working" manifests as a hard compiler error, and is thus hard to miss. By contrast, the asymmetric strategy, even if it works for operator==, would fail for operator<: you really do need to be able to get both quantities into the same unit-and-rep to compare them. I don't think it's acceptable design for the set of valid inputs to change from operator== to operator<.

I did want to call out that the suggestion was very creative, and not something I think I could have come up with on my own. For integer types, you're probably right that it could give the right answer in every case; at least, I can't come up with a counter-example. I just don't think it's the right choice when we look at the holistic picture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design-related discussion enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants