Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extra-source-files glob behavior is very slow #10495

Open
parsonsmatt opened this issue Oct 30, 2024 · 1 comment
Open

extra-source-files glob behavior is very slow #10495

parsonsmatt opened this issue Oct 30, 2024 · 1 comment

Comments

@parsonsmatt
Copy link
Collaborator

parsonsmatt commented Oct 30, 2024

While diagnosing the very slow boot times in cabal repl for our codebase, I identified that a very large amount of time is being spent checking globs for extra-source-files. The log line that is emitted is here.

I am actually having a really difficult time tracing the flow of this code, so I apologize if I've gone down some false paths or thrown out some red herrings.

When cabal is checking the extra-source-files, it is doing so in a single threaded manner and is also interpreting each line as a glob.

Locally, I am observing about a 7-10ms time per entry - if I run cabal repl --verbose="debug +timestamp", then I get this partial output:

1730308340.870 Expanding glob
               'config/modelsFiles/abstract_category_mapping_rule.persistentmodels' in
               directory '.'.
1730308340.875 Expanding glob
               'config/modelsFiles/abstract_merchant_mapping_rule.persistentmodels' in
               directory '.'.

...

1730308373.563 Expanding glob '.prettierrc' in directory './.'.

Per the timestamps, we're spending ~5-10ms in each glob, with a total elapsed time of 33 seconds (!!). There are ~1800 extra-source-files in our project, and ~10ms per would still only be 18s. Looking at the logs, they're each mentioned four times:

1730308346.904 Expanding glob 'config/modelsFiles/organization.persistentmodels' in directory
1730308354.357 Expanding glob 'config/modelsFiles/organization.persistentmodels' in directory
1730308363.024 Expanding glob 'config/modelsFiles/organization.persistentmodels' in directory
1730308370.972 Expanding glob 'config/modelsFiles/organization.persistentmodels' in directory

All Expanding glob are called in a block. The line immediately preceding the glob expansions is:

1730308340.639 Finalized package description:

And the message immediately after is:

1730308374.597 Dependency EventLog ==0.2.0.0: using EventLog-0.2.0.0
               Dependency HUnit ==1.6.2.0: using HUnit-1.6.2.0
               Dependency HaXml ==1.25.13: using HaXml-1.25.13

I need to put this down for right now, but I'll continue writing investigation notes here.

EDIT:

OK, I've identified that we're actually expanding each glob line in extra-source-files four times. Between only expanding the globs once and a fast-track for non-glob lines, I think this should be fixed up nicely.

@parsonsmatt
Copy link
Collaborator Author

There's one more PR that could be done here: even with my #10516 PR, we're still redundantly checking every extra-source-files line twice. The checkMissingDocs function accepts four [Glob] and then checks each glob. However, at this point in check, we have already checked these globs: so factoring the result from [Glob] into a type that already reflects the work done would fix this redundant lookup.

This would bring our runtime from 34s to 37.5ms. The other two patches already got us to 75ms, but I already identified the fruit to be picked, so I may go ahead and do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant