Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect fmt::Pointer implementations #328

Closed
wants to merge 3 commits into from
Closed

Conversation

tyranron
Copy link
Collaborator

@tyranron tyranron commented Dec 23, 2023

Synopsis

Debug and Display derives allow referring fields via short syntax (_0 for unnamed fields and name for named fields):

#[derive(Display)]
#[display("{_0:o}")]
struct OctalInt(i32);

The way this works is by introducing a local binding in the macro expansion:

let _0 = &self.0;

This, however, introduces double pointer indirection. For most of the fmt traits, this is totally OK. However, the fmt::Pointer is sensitive to that:

#[derive(Display)]
#[display("--> {_0:p}")]
struct Int(&'static i32);

// expands to
impl fmt::Display for Int {
    fn fmt(&self, f: fmt::Formatter<'_>) -> fmt::Result {
        let _0 = &self.0; // has `&&i32` type, not `&i32`
        write!(f, "--> {_0:p}") // so, prints address of the `_0` local binding, 
                                // not the one of the `self.0` field as we would expect
    }
}

Solution

Instead of introducing local bindings, try to replace this shortcuts (like _0) in idents and expressions with a field accessing syntax (like self.0). Or event just substitute _0 with (*_0), because for enum variants we cannot really access the fields via self..

Checklist

  • Documentation is updated (not required)
  • Tests are added/updated
  • CHANGELOG entry is added

@tyranron tyranron added this to the 1.0.0 milestone Dec 23, 2023
@tyranron tyranron self-assigned this Dec 23, 2023
@JelteF
Copy link
Owner

JelteF commented Feb 20, 2024

@tyranron are you still planinng on addressing this?

@JelteF
Copy link
Owner

JelteF commented Mar 11, 2024

@tyranron ping

@tyranron
Copy link
Collaborator Author

@JelteF yes, sorry for not being responsive. I was a little bit overwhelmed on my job last 2 months. I'll try to finish this in next 2 weeks or so.

@JelteF
Copy link
Owner

JelteF commented Mar 11, 2024

No worries, it happens. "Next 2 weeks or so" sounds amazing.

@tyranron tyranron mentioned this pull request Apr 4, 2024
@tyranron
Copy link
Collaborator Author

tyranron commented Apr 5, 2024

Oh... crap. This is a bit of a rabbit hole.

I've figured out the solution for cases like #[display("{_0:p}")] or #[display("{:p}", _0)]. But cannot figure out the solution for complex expressions like #[display("{}", format!("{_0:p}"))], so at the moment #[display("{_0:p}")] and #[display("{}", format!("{_0:p}"))] won't produce the same result, and I cannot get how to solve it without going into deep parsing of expressions, which is non-trivial and will require full feature of syn.

We basically hit the difference between accessing a binding and a self.field Rust semantics. We want a binding to always behave like a self.field, but that's not possible in Rust without moving out. And replacing a binding with a *binding in arbitrary expressions seems non-trivial, if solvable in general case at all.

@JelteF I think that supporting #[display("{_0:p}")] and #[display("{:p}", _0)] cases is enough. For more complex situations like #[display("{}", format!("{_0:p}"))] it's better to describe field accessing semantics better in docs and warn explicitly about possible pointer arithmetic or formatting implications.

@tyranron
Copy link
Collaborator Author

tyranron commented Apr 8, 2024

@JelteF thinking about it more, I'm unsure whether we should fix #[display("{_0:p}")]/#[display("{:p}", _0)] at all, since we cannot fix it in all the cases. Because having different behavior with #[display("{}", format!("{_0:p}"))] is just too subtle.

Instead, I propose to leave desugaring as it is now, and:

  1. Describe in docs clearly that _0 and field_name has pointer semantics, not value semantics, giving a clear example of how {:p} formatting should be used correctly.
  2. For trivial detectable cases like #[display("{_0:p}")]/#[display("{:p}", _0)] produce a compilation error, guiding towards correct usage: #[display("{:p}", *_0)].

This way, throwing the compile-time error, we bring the library user's attention to this edge case, so he will much more aware about the implications for the #[display("{}", format!("{_0:p}"))] case, rather than silently introducing a different behavior (being the wrong one) without warning the library user about this in any way.

@JelteF
Copy link
Owner

JelteF commented Apr 8, 2024

tl;dr I like your previous proposal much better.

Honestly, I cannot think of a reasonable case where anyone would do something like this: #[display("{}", format!("{_0:p}"))]. So I think optimizing for that case seems a bit strange. Because that means not being able to use #[display("{_0:p}")], which seems a much more common situation.

I think it makes sense to document the weirdness around this, but I think we should support #[display("{_0:p}")] and #[display("{:p}", _0)].

@yfcai yfcai mentioned this pull request May 8, 2024
3 tasks
@tyranron
Copy link
Collaborator Author

tyranron commented May 10, 2024

@JelteF by the way, despite me being too busy to get on this lately, it appears for the better... today, when I was dreaming, I figured out the solution I do like and that should work in all the cases:

// We declare in `derive_more` additional types machinery:
#[repr(transparent)]
pub struct PtrWrapper(T);

// We do the necessecarry dereference in the `fmt::Pointer` impl for our `PtrWrapper`
impl<T: fmt::Pointer> fmt::Pointer for PtrWrapper(&T) {
    fn fmt(&self, f: fmt::Writer<'_>) -> fmt::Result {
        (*self.0).fmt(f)
    }
}

// For any other `fmt` trait we just do the transparent implementation.
impl<T: fmt::Debug> fmt::Debug for PtrWrapper(T) {
    fn fmt(&self, f: fmt::Writer<'_>) -> fmt::Result {
        self.0.fmt(f)
    }
}

// And also `Deref`+`DerefMut`.

// And in the macro expansion we do this:
let _0 = ::derive_more::fmt::PtrWrapper(&self.0);
// instead of this:
let _0 = &self.0;
// so here, wherever the `_0` value is engaged, it will be dereferenced once used as `{:p}`
write!(f, "--> {_0:p}")
// and any other cases should work just fine due to transparent impls of `fmt` trait and `Deref`/`DerefMut`.

@JelteF
Copy link
Owner

JelteF commented May 17, 2024

@tyranron that sounds like a great solution. And I think you wouldn't even need to implement fmt::Debug (and the other non fmt::Pointer traits in fmt), as long as you implement Deref for this type.

@JelteF
Copy link
Owner

JelteF commented Jul 1, 2024

@JelteF by the way, despite me being too busy to get on this lately, it appears for the better... today, when I was dreaming, I figured out the solution I do like and that should work in all the cases:

Okay I tried this out just now, and it has an unsolvable problem afaict. By wrapping the reference in PtrWrapper it loses all trait implementations that are automatically implemented on &T (of which there are many). We can implement all the fmt traits ofcourse like you proposed. But that only works for those traits. If the user uses the variable in a way that doesn't cause an auto-deref then this won't help.

So I think the only reasonable solution is to special-case #[display("{_0:p}")] and #[display("{:p}", _0)] and document the strangeness for this thing.

@tyranron
Copy link
Collaborator Author

tyranron commented Jul 1, 2024

@JelteF

If the user uses the variable in a way that doesn't cause an auto-deref then this won't help.

So I think the only reasonable solution is to special-case #[display("{_0:p}")] and #[display("{:p}", _0)] and document the strangeness for this thing.

For me, this situation seems to be less confusing than special-casing, because auto-deref should work for the majority of cases, while the special-casing breaks whenever there is any expression. Also, wrapper type doesn't allow the code to be incorrect in a subtle way, like the special-casing does.

@JelteF
Copy link
Owner

JelteF commented Jul 1, 2024

For me, this situation seems to be less confusing than special-casing, because auto-deref should work for the majority of cases

I think a fairly common case that would be much more difficult is doing some arithmetic with one of the arguments:

use ::core;
use core::fmt::{Debug, Formatter, Result, Write, Pointer};
use core::prelude::v1::*;


#[repr(transparent)]
pub struct PtrWrapper<T> (pub T);

impl<T: Pointer> Pointer for PtrWrapper<&T> {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result {
        (*self.0).fmt(f)
    }
}

impl<T> ::core::ops::Deref for PtrWrapper<&T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.0
    }
}

fn main() {
    let a : i32 = 123;
    let a = PtrWrapper(&a);

    println!("{:?}", a * 2)
}

Instead of that "just working" you now get the following error, which doesn't even give the hint that dereferencing might help:

error[E0369]: cannot multiply `PtrWrapper<&i32>` by `{integer}`
   --> src/main.rs:27:24
    |
27  |     println!("{:?}", a * 2)
    |                      - ^ - {integer}
    |                      |
    |                      PtrWrapper<&i32>
    |
note: an implementation of `Mul<{integer}>` might be missing for `PtrWrapper<&i32>`
   --> src/main.rs:7:1
    |
7   | pub struct PtrWrapper<T> (pub T);
    | ^^^^^^^^^^^^^^^^^^^^^^^^ must implement `Mul<{integer}>`
note: the trait `Mul` must be implemented
   --> /playground/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/arith.rs:317:1
    |
317 | pub trait Mul<Rhs = Self> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^

while the special-casing breaks whenever there is any expression.

It really only impacts the Pointer formatter, everything else is not impacted. I think using the Pointer formatter is rare enough that we shouldn't make the ergonomics for other formatters worse.

Also, wrapper type doesn't allow the code to be incorrect in a subtle way, like the special-casing does.

That is true, but again. This only impacts the Pointer formatter. And really I cannot think of any reason to use the Pointer formatter for anything else than one of the fields, or self. So if we special case those, then I don't think anyone will ever get subtly incorrect code. Also worst case, they print the wrong pointer, which is unlikely to be a life-or-death mistake.

tyranron pushed a commit that referenced this pull request Jul 4, 2024
## Synopsis

While looking into #328 I realized the current situation around Pointer
derives
and references was even weirder because we store a double-reference to
the
fields in the local variables for Debug, but not for Display. The reason
we
were doing this was because of #289.


## Solution

This stops storing a double-reference, and only adds the additional
reference
in the places where its needed.
@JelteF
Copy link
Owner

JelteF commented Jul 4, 2024

Closing this one in favor if #381

@tyranron
Copy link
Collaborator Author

@JelteF sorry for bringing this up again, but I have more thoughts about it.

This only impacts the Pointer formatter.

We could try leveraging this fact.

Does the following algo makes sense?

  1. We detect all the placeholders using :p, which is trivial.
  2. We wrap all the expressions going into these placeholders with PtrWrapper:
    • for positional arguments (like ("{:p}", whathever)) this is trivial by expanding to ("{:p}", PtrWrapper(whathever))
    • for named arguments outside the formatting expression (like ("{whathever:p}")) this is a bit trickier, because we need to introduce an additional binding in the scope like let _p_whathever = PtrWrapper(whatever); and substitute the name in the expansion to it ("{_p_whathever:p}").

This way, we use the PtrWrapper in a fine-grained way only for the fmt::Pointer formatting, without involving it into operations with other traits at all. Any complex expression (for any reason the caller might want it for :p formatting) goes inside the PtrWrapper() anyway, so we do not mess up its semantic in any way.

@JelteF
Copy link
Owner

JelteF commented Jul 25, 2024

Okay I played around with that idea a bit more, but I cannot get it to do the thing you say it should do for anything but the most basic expression.

The following for instance fails to compile, because there's now a PtrWrappper around the *const i32 type.

use std::fmt;

#[repr(transparent)]
pub struct PtrWrapper<T> (pub T);

impl<T: fmt::Pointer> fmt::Pointer for PtrWrapper<&T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        (*self.0).fmt(f)
    }
}

impl<T> ::core::ops::Deref for PtrWrapper<&T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.0
    }
}

fn main() {
    // a is the field
    let a: &i32 = &123;
    let a = &a; // this is our local variable
    println!("{:p}, {:p}", a, unsafe { (*a as *const i32).add(1) });
    println!("{:p}, {:p}", PtrWrapper(a), PtrWrapper(unsafe { (*a as *const i32).add(1) }));
}

the error:

error[E0277]: the trait bound `PtrWrapper<*const i32>: Pointer` is not satisfied
   --> src/main.rs:25:43
    |
25  |     println!("{:p}, {:p}", PtrWrapper(a), PtrWrapper(unsafe { (*a as *const i32).add(1) }));
    |                     ----                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Pointer` is not implemented for `PtrWrapper<*const i32>`
    |                     |
    |                     required by a bound introduced by this call
    |
    = help: the trait `Pointer` is implemented for `PtrWrapper<&T>`
note: required by a bound in `core::fmt::rt::Argument::<'a>::new_pointer`
   --> /playground/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:132:31
    |
132 |     pub fn new_pointer<'b, T: Pointer>(x: &'b T) -> Argument<'_> {
    |                               ^^^^^^^ required by this bound in `Argument::<'a>::new_pointer`
    = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

Unless you can get this working, I don't really see the point of adding this. The downsides it brings (stuff not compiling and being more complicated to explain) seem worse than the benefits (not having to put a * in front of the variable).

@JelteF
Copy link
Owner

JelteF commented Jul 25, 2024

for named arguments outside the formatting expression (like ("{whathever:p}")) this is a bit trickier, because we need to introduce an additional binding in the scope like let _p_whathever = PtrWrapper(whatever); and substitute the name in the expansion to it ("{_p_whathever:p}").

For named arguments there would be no need to use PtrWrapper afaict, we can simply use the approach I used in #381

@JelteF
Copy link
Owner

JelteF commented Jul 26, 2024

Finally, I even if you can get PtrWrapper to actually work. I think it's easier to explain to people that in the expressions the field names are references to the actual fields and that thus to get to the underlying pointer you have to dereference. If they only have to that for complex expressions (like my unsafe + cast to pointer), but not for simple expressions then I think it's just going to be confusing.

@tyranron
Copy link
Collaborator Author

@JelteF

If they only have to that for complex expressions (like my unsafe + cast to pointer), but not for simple expressions then I think it's just going to be confusing.

Yeah... still imperfect. In complex expressions, better people realize they're working with a reference rather than a direct value. Trying to hide that fact will lead to very subtle bugs if we cannot strip out that additional indirection in all the cases. And we cannot, because we cannot overload "reading the value" in Rust in any way.

tyranron added a commit that referenced this pull request Jul 31, 2024
Resolves #328
Requires #377
Requires #380 

## Synopsis

`Debug` and `Display` derives allow referring fields via short syntax
(`_0` for unnamed fields and `name` for named fields):
```rust
#[derive(Display)]
#[display("{_0:o}")]
struct OctalInt(i32);
```
The way this works is by introducing a local binding in the macro
expansion:
```rust
let _0 = &self.0;
```

This, however, introduces double pointer indirection. For most of the
`fmt` traits, this is totally OK. However, the `fmt::Pointer` is
sensitive to that:
```rust
#[derive(Display)]
#[display("--> {_0:p}")]
struct Int(&'static i32);

// expands to
impl fmt::Display for Int {
    fn fmt(&self, f: fmt::Formatter<'_>) -> fmt::Result {
        let _0 = &self.0; // has `&&i32` type, not `&i32`
        write!(f, "--> {_0:p}") // so, prints address of the `_0` local binding, 
                                // not the one of the `self.0` field as we would expect
    }
}
```


## Solution

Pass all local bindings also as named parameters and dereference them
there.
This allows `"{_0:p}"` to work as expected. Positional arguments and
expressions
still have the previous behaviour. This seems okay IMHO, as we can
explain
that in expressions these local bindings are references and that you
need
to dereference when needed, such as for `Pointer`.

A downside of the current implementation is that users cannot use the
names of our named parameters as names for their own named parameters,
because we already use them. With some additional code this is fixable,
but it doesn't seem important enough to fix. People can simply use a
different name when creating their own named parameters, which is a good
idea anyway because it will be less confusing to any reader of the code.
If it turns out to be important to support this after all, we can still
start to support it in a backwards compatible way (because now it causes
a compilation failure).

Co-authored-by: Kai Ren <tyranron@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants