-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve preprocessing performance #10534
base: master
Are you sure you want to change the base?
Conversation
@@ -614,8 +614,11 @@ generateCode | |||
-> Verbosity | |||
-> IO (SymbolicPath Pkg (Dir Source), [ModuleName.ModuleName]) | |||
generateCode codeGens nm pdesc bi lbi clbi verbosity = do | |||
debug verbosity $ "generateCode: " <> prettyShow (package pdesc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of extra debug
statements here to help me identify where we were losing time. I can remove those if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you remove these, I think you can see in the cost centre profile there is a hot-spot in findCwdWithExtension
, which is also what you identified.
@@ -292,10 +300,11 @@ preprocessFile | |||
-- ^ fail on missing file | |||
-> IO () | |||
preprocessFile mbWorkDir searchLoc buildLoc forSDist baseFile verbosity builtinSuffixes handlers failOnMissing = do | |||
debug verbosity $ "preprocessFile: " <> prettyShow baseFile | |||
bsrcFiles <- findFileCwdWithExtension mbWorkDir builtinSuffixes (searchLoc ++ [buildAsSrcLoc]) baseFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, we try and find a regular Haskell file.
We also add buildAsSrcLoc]
to the end of the list, since the locations are tried in-order. Most likely, we're going to be in the first directory we're searching in, so this should eliminate a lot of file lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is a change in behaviour? Before, if you had a .hs
and a .y
file in the directory, would it prefer to regenerate the .hs
from the .y
file?
It seems that it would then prefer the generated .y
file rather than the .hs
file in the directory, because it searched the build directory first?
Could we have some tests to check this behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question - what is the expected behavior of having multiple files with different extensions in directory? Rather, what should happen if we have a directory project like this:
src/Foo.hs
src/Foo.y
The old code would ignore src/Foo.hs
in favor of src/Foo.y
. The new code ignores src/Foo.y
in favor of src/Foo.hs
.
This is definitely a change in behavior, but it would require a project to be poorly structured - somehow having Foo.hs
and Foo.y
together, with the expectation that Foo.hs
is never used.
b6ae84f
to
e7f4251
Compare
This PR changes the behavior of the
preprocessFile
logic. Currently, it attempts to find a file matching a module name for every known preprocessor, before attempting to find a file with a plain Haskell suffix. The overwhelmingly common case should be a plain Haskell file. This results in a ton of extraneous lookups.On Mercury's package with 11k modules, the existing
preprocessFile
takes around 1.41s according to the--verbosity="debug +timestamps"
output. With this patch, we're at 0.59s.Curiously, the impact is greater in situ - once
configure
is out of the way, plaincabal
takes about 5s to do a no-op ghci eval, while this patch takes about 3.1s, almost 2s faster. Given that about 2 of the remaining seconds are actually running GHCi, that may be pretty close to optimal.Please read Github PR Conventions and then fill in one of these two templates.
Template Α: This PR modifies behaviour or interface
Include the following checklist in your PR:
Template B: This PR does not modify behaviour or interface
E.g. the PR only touches documentation or tests, does refactorings, etc.
Include the following checklist in your PR: