Commit
Use a fine-grained write-around cache if no dep tracking
Summary:
**Idea**

We discovered in D58897396 that storing FunctionDefinition.t values separately from ModuleComponents.t is helpful for performance; it turns a small *loss* on this stack in Pyre into a small *win* and it cuts the Pysa slowdown by a big factor.

But the Pysa perf is still not good; the RAM use is much higher than before the stack, and the Callgraph computation is taking much longer (presumably because it pushes into swap).

This diff attempts to mitigate the problem with a work-around: because we know that Pysa only ever operates in a non-dependency-tracked no-overlay setting, we can use global and non-dependency-tracked shared memory tables to power Pysa (and in fact to power pyre check as well!).

This lets us use a simple write-around setup that is very similar to the old UGE setup except that:
- it is much simpler because no incremental updates are needed
- there is much less code because we're only dealing with function definitions and none of the other module components

As evidence that it's simpler, I'll note that this diff adds about 120 lines of extra complexity, whereas D58842854 removed about 1000 lines, so this little hack is probably ~10x easier to maintain.

**Results**

*Abstractly, what's expected*

There's no change at all for incremental Pyre, although we don't have good CI perf tests for that anyway.

But for non-incremental commands, which include `pyre check`, `arc pyre check`, and `pyre analyze`, we do have a change: the extra RAM use caused specifically by function definitions, which was introduced in D58842854, should now be 100% eliminated. The extra RAM use from loading entire module signatures (which in the case of enormous classes, such as 100K-entry enums or generated classes with hundreds of methods, can be substantial) remains.

*Actual results on Pyre*

There's no discernible change in servicelab.
This suggests that just breaking out FunctionDefinitionEnvironment in D58897396 was enough to eliminate memory pressure and serialization of function bodies as a major factor in Pyre; it's likely that the map reduce for type check already had pretty good module-level locality (as in, functions from the same module tended to wind up on the same worker).

*Actual results for Pysa*

This diff does seem to substantially improve on D58897396 (which already improved on D58842854). In particular, callgraph construction, where the hit was coming from, drops from 55 minutes to 37 minutes; this is still much longer than the 13 minutes on trunk, but it is a big help and the overall slowdown is only 22 minutes now.

From the memory explorer results...
- on base: https://fburl.com/unidash/huvaaaqz
- on diff: https://fburl.com/unidash/x6928gs7

...it still seems clear that there is memory pressure: we still spike to about 57G instead of 47G, which suggests we're probably still pushing into swap and driving the callgraph slowdown, but I think we're swapping *less* now.

**What else could we do for Pysa?**

*Shrink the worker pool for callgraph*

On trunk, callgraph construction is only around 13 minutes; on this diff it is 37. If we were to halve the number of workers, we might eliminate the memory pressure entirely and come out with something like 25-30 minutes.

This is probably the simplest fix, assuming it works: it trades RAM use for wall time in a predictable way that is also easy to adjust going forward if codebases continue to grow.

*Find and fix the cause of UGE "bigness"*

The results of D58842854 + D58897396 + D58905530 strongly suggest that for some reason callgraph RAM use is actually being driven by UGE memory (without function definitions). This is actually surprising, since in most cases post-D58897396 the UGE data would be relatively small; it's just unannotated globals + class summaries.
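
One rough way to sanity-check that expectation outside of Pyre is to measure the source itself. The sketch below is not Pyre code and says nothing about how UGE actually stores its data: `summarize_module` is a hypothetical helper that uses Python's standard `ast` module to count AST nodes in unannotated module-level literals and statements in class bodies, as a crude proxy for how big the corresponding entries might be.

```python
import ast


def summarize_module(source: str) -> dict:
    """Report rough sizes of unannotated global literals and class bodies.

    Hypothetical helper: AST node counts are only a proxy for memory use.
    """
    tree = ast.parse(source)
    literal_nodes = {}  # global name -> number of AST nodes in its literal value
    class_members = {}  # class name -> number of statements in the class body
    for statement in tree.body:
        # Plain `x = <container literal>` assignments; annotated ones are ast.AnnAssign.
        if isinstance(statement, ast.Assign) and isinstance(
            statement.value, (ast.Dict, ast.List, ast.Set, ast.Tuple)
        ):
            size = sum(1 for _ in ast.walk(statement.value))
            for target in statement.targets:
                if isinstance(target, ast.Name):
                    literal_nodes[target.id] = size
        elif isinstance(statement, ast.ClassDef):
            class_members[statement.name] = len(statement.body)
    return {"literal_nodes": literal_nodes, "class_members": class_members}


example = '''
CONFIG = {"a": 1, "b": [1, 2, 3]}

class Color:
    RED = 1
    GREEN = 2

    def describe(self): ...
'''
report = summarize_module(example)
print(report["class_members"])
print(sorted(report["literal_nodes"]))
```

Running something like this over a codebase and sorting by size would point at the modules most worth a closer look, though the real numbers would have to come from profiling the shared memory tables themselves.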

The underlying problem is likely some combination of:
- huge literal values in globals; we know that giant dicts and lists are not actually that unusual, some codebases put configurations into module globals
- huge class summaries, of which the most likely culprits are:
  - fat enums (we know the biggest ones are at least close to 100K members)
  - generated classes with hundreds of methods; if there are enough of these in generated codebases it could add up.

If we identify the problem, we could consider approaches to eliminate it, for example setting size limits on unannotated global literals (and treating bigger values as if they were a literal `...`).

This is likely something we should eventually do regardless, but I'm not sure whether we want to dive into this kind of optimization on the Pyre team right now given that the pyre perf is fine at the moment and we really want to focus on 3.12, qualification, and conformance.

*Mostly rejected: do the same atomization for UGE*

One option for this diff is to try to perform the same change for UGE. I attempted this in D58908824, but there are some problems:
- the table structure is significantly more complex
- probably as a result, it looks like my code is buggy; there are type checking errors
- even if I fix it, it's a bad change as anything more than a band-aid because the point of D58842854 is to allow us to start refactoring UGE so that the module components can depend on one another, and having to atomize them for Pysa means we can't actually do that.
- as a result, I'm not sold on even doing the work to debug D58908824; why bother if we know it won't help long-term?

Note that I don't consider *this* diff to be a band-aid because it's actually ok to atomize the function bodies if we analyze them one at a time. Atomizing the signatures is a much bigger problem for pyre architecture.

Reviewed By: migeed-z

Differential Revision: D58905530

fbshipit-source-id: b65e55f1f782c2c74c7b03129677e097054f5b1c