I discovered more ways that casting can cause problems in seemingly innocuous code. In particular, if you ask Tapir to differentiate at `p = 2.0`, the answer is currently zero. I was amazed that this works at all until I realised that Julia will convert `p` to an `Int` when it tries to write it to `buf_view` if it's the case that `p` happens to be integer valued. If this happens, gradients get dropped and the wrong answer is given.

As part of this PR, I went back over the intrinsics involved in casting to check if there are any more, and I don't believe there are any more which risk causing problems -- hopefully that will prove to be the case.
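
The silent conversion described above can be reproduced in isolation. The sketch below is not the code from this PR: `roundtrip_through_int_buffer` is a hypothetical helper, and it assumes `buf_view` is a view into an `Int` array. It only illustrates that writing a `Float64` into such a view succeeds when the value is integer valued and throws otherwise.

```julia
# Hypothetical helper for illustration only (not taken from this PR).
# Assumption: `buf_view` is a view into an array with element type Int.
function roundtrip_through_int_buffer(p::Float64)
    buf = zeros(Int, 1)
    buf_view = view(buf, 1:1)
    buf_view[1] = p        # setindex! calls convert(Int, p) before storing
    return buf_view[1]     # the stored value is an Int, not a Float64
end

# Integer-valued input: the conversion succeeds silently.
y = roundtrip_through_int_buffer(2.0)
println(typeof(y))

# Non-integer input: convert(Int, 2.5) throws an InexactError.
inexact_for_fractional = try
    roundtrip_through_int_buffer(2.5)
    false
catch err
    err isa InexactError
end
println(inexact_for_fractional)
```

Because the value read back is an `Int`, nothing downstream of the write carries a floating point dependence on `p`, which is consistent with the zero gradient reported above.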
edit: I've converted this to a WIP because it's going to take a little bit of time to figure out what's going on in all of the cases where `fptosi` and `fptoui` are used in actually innocuous code that doesn't result in dropped gradient info. For the most part, it's just going to be declaring things non-differentiable, but there might be a couple of tricky cases.

This may also motivate a more general approach to this, in which we have a macro / trait which can be applied to methods of functions and which asserts that it's fine to "drop" gradients for all code inside the method, as we're confident that it's not doing it in a way which risks giving the wrong answer, but that they should otherwise be differentiated as usual. Writing the macro / trait / whatever winds up being a convenient approach to this ought really to be straightforward.