Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove show for Type{<:Pullback} #1356

Merged
merged 1 commit into from
Feb 26, 2023
Merged

Remove show for Type{<:Pullback} #1356

merged 1 commit into from
Feb 26, 2023

Conversation

devmotion
Copy link
Collaborator

Alternative to #1354 by removing a show method for Type{<:Pullback}. I see an even more drastic improvement than reported in #1354:

master branch

julia> using SnoopCompile

julia> invalidations = @snoopr using Zygote;

julia> length(uinvalidated(invalidations))
1542

julia> trees = invalidation_trees(invalidations);

julia> trees[end].backedges[end]
MethodInstance for show(::IOBuffer, ::Type) at depth 0 with 1246 children
julia> @time using Zygote
  1.327357 seconds (4.23 M allocations: 299.507 MiB, 4.29% gc time, 19.41% compilation time: 85% of which was recompilation)

this PR

julia> using SnoopCompile

julia> invalidations = @snoopr using Zygote;

julia> length(uinvalidated(invalidations))
446

julia> trees = invalidation_trees(invalidations);

julia> trees[end].backedges[end]
MethodInstance for promote_rule(::Type{Int64}, ::Type) at depth 0 with 222 children
julia> @time using Zygote
  1.250796 seconds (4.11 M allocations: 293.619 MiB, 1.36% gc time, 18.60% compilation time: 84% of which was recompilation)

@devmotion
Copy link
Collaborator Author

@devmotion
Copy link
Collaborator Author

BTW this addresses one part of #877.

@simsurace
Copy link
Contributor

This would be nice if it were merged!

@ToucheSir
Copy link
Member

ToucheSir commented Jan 26, 2023

After seeing enough outputs like https://github.com/FluxML/Metalhead.jl/actions/runs/3800957452/jobs/6466762142#step:7:127, I'm afraid I have to agree with @willtebbutt's first point in #1354 (comment). Note that Zygote would have it even worse because Pullback chains mean long types would be replicated across multiple stack frames. It's fair to say JuliaLang/julia#45687 (comment) or something else which we can use to not show the T in Pullback{S,T} is a prereq for getting these latency-reducing PRs merged.

@devmotion
Copy link
Collaborator Author

I'm still not sure what's the main problem with verbose stacktraces - is it that it is incomprehensible for users? I assume for "regular" users actually it might not matter since I guess neither compact nor verbose stacktraces involving Zygote are easy to decipher. We could even point users to https://github.com/BioTurboNick/AbbreviatedStackTraces.jl. Or just load it in the CI if the main problem are long logs? However, for debugging purposes and bug reporting it could even be an advantage if the output is more verbose.

@ToucheSir
Copy link
Member

We could even point users to https://github.com/BioTurboNick/AbbreviatedStackTraces.jl. Or just load it in the CI if the main problem are long logs? However, for debugging purposes and bug reporting it could even be an advantage if the output is more verbose.

They're large enough that they go beyond useful for debugging and around the other end—namely that the signal-to-noise ratio is so low that they often hinder debugging. That and they're more liable to blow past any line limits set by terminals, CI providers, etc, so there's also the risk of losing important output before the stacktrace.

RE configurability, that's what I was getting at with using Preferences.jl. AbbreviatedStacktraces doesn't really help IIUC because it eliminates entire stack frames instead of abbreviating types?

@devmotion
Copy link
Collaborator Author

I agree that probably in many cases the full types are more verbose than needed for debugging - but it definitely happened to me multiple times in the past when debugging AD issues in DistributionsAD that - due to the show for Pullback types that is removed here - the output was not verbose enough for the logs to be helpful. Something in-between both extremes would be nice IMO - but as I mentioned above I don't think Preferences.jl is a solution to the problem: If you would disable the show methods by default, latency, type piracy, invaldiations etc. would be fixed but stacktraces would be very verbose which you seem to dislike; on the other hand, if the show methods are enabled by default, stacktraces would be more compact but the latency, type piracy, invalidations etc. would not be fixed. Since I assume that most users don't tweak these settings, I guess going with the second alternative (show enabled by default) would mean that most users would not get any latency etc. fixes.

From the README of AbbreviatedStacktraces:

I think clearly these sources can be assumed to not be your code, and more likely stable:

Julia Base
Julia Stdlibs
Dependencies acquired using Pkg add

Whereas these sources are "yours":

Activated local package
Code defined in the REPL
Dependencies acquired using Pkg dev
Modules or files matching ENV["JULIA_DEBUG"] (file basename and/or module names; comma-separated, ! to exclude)

All frames originating from "your" code are shown by default, as well as the immediate next frame to show what function your code called. This information should be sufficient in most cases to understand that you made a mistake, and where that mistake was located.

But in the rarer case where the issue was not in your code, the full trace can be retrieved from the err global variable.

So to me it seems the problem would be fixed for end users since, unless they work on Zygote directly, they would not see the part of the stacktraces that's caused by Zygote internals and hence the Pullback types should not show up in their stacktraces (as far as I understand). There are also additional options (https://github.com/BioTurboNick/AbbreviatedStackTraces.jl#options) that could be useful for CI.

@ToucheSir
Copy link
Member

ToucheSir commented Jan 27, 2023

Something in-between both extremes would be nice IMO

Were load latency not a problem, that would be my suggestion. Basically prototype that Julia issue by hiding the uninformative second type parameter, which happens to cause the vast majority of the output bloat.

I don't think Preferences.jl is a solution to the problem: If you would disable the show methods by default, latency, type piracy, invaldiations etc. would be fixed but stacktraces would be very verbose which you seem to dislike; on the other hand, if the show methods are enabled by default, stacktraces would be more compact but the latency, type piracy, invalidations etc. would not be fixed.

I would really recommend taking some nested Turing model or similar and seeing how the stacktraces look with these show methods removed. It's pretty clear from the Flux side that such sizes are untenable. Also, we ought not to treat this as the only way to reduce load latency for Zygote. Surely there are other avenues which do not come with such a large tradeoff.

So to me it seems the problem would be fixed for end users since, unless they work on Zygote directly, they would not see the part of the stacktraces that's caused by Zygote internals and hence the Pullback types should not show up in their stacktraces (as far as I understand). There are also additional options (https://github.com/BioTurboNick/AbbreviatedStackTraces.jl#options) that could be useful for CI.

But this falls apart the moment they want to ask for help or report an issue. Likewise we want to see those stack frames on CI, just not the whole type along with them. I couldn't find any mention of an option in AbbreviatedStackTraces to just summarize types without removing frames, so it doesn't seem all that useful here.

@devmotion
Copy link
Collaborator Author

Also, we ought not to treat this as the only way to reduce load latency for Zygote. Surely there are other avenues which do not come with such a large tradeoff.

Regarding invalidations at least, this show method seems to be the main (only?) problem in Zygote: #877 I couldn't reproduce the second source mentioned in the issue but I'm not an expert with SnoopCompile, so maybe it still exists.

@devmotion
Copy link
Collaborator Author

I guess if the stacktraces are a blocker for the invalidation fixes, we have to wait for JuliaLang/julia#48444.

Copy link
Member

@ToucheSir ToucheSir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving and merging this now because it turns out the custom type show method is causing the last failure on nightly CI (i.e. actual bugs). However, this just reinforces the idea that Base should either provide a more robust, extensible API for type printing or mechanisms to cut down on type parameter explosion in show/display.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants