Make subprocess command logging atomic #1917

Open
rmartin16 opened this issue Jul 15, 2024 · 3 comments
Labels
enhancement New features, or improvements to existing features.

Comments

@rmartin16
Member

What is the problem or limitation you are having?

Each time a subprocess command is executed, Subprocess logs its command line, environment, return code, stdout/stderr, etc. However, each of those pieces is written to the log as a separate, sequential call; therefore, when subprocess calls are executed concurrently, the logging from different commands can become intermingled and undecipherable.

This is currently a problem for macOS code signing as all of the files are concurrently signed by a threadpool.

Example log file: https://github.com/user-attachments/files/16044976/briefcase.2024_06_30-10_13_00.package.log

Describe the solution you'd like

Convert all of the subprocess command logging to be atomic such that each block of logging for a specific command is guaranteed to be on consecutive lines in the log.

Describe alternatives you've considered

Omit concurrent subprocess execution.

Additional context

No response

@rmartin16 added the enhancement label on Jul 15, 2024
@freakboy3742
Member

Agreed this is an issue; code signing logs are very difficult to parse. Not sure where to start looking at a solution - we essentially need to be able to push a logging context when we enter a thread, and only display the output of that context once the thread has been joined.
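
Very roughly, the shape of what I mean (none of these names are existing Briefcase API; it's just a sketch of a per-thread logging buffer):

```python
import threading
from contextlib import contextmanager

# Each thread gets its own slot, so buffers never mix between threads.
_context = threading.local()


@contextmanager
def logging_context():
    """Collect this thread's log lines into a private buffer."""
    _context.buffer = []
    try:
        yield _context.buffer
    finally:
        del _context.buffer


def emit(line):
    """Send a log line to the active buffer, or straight to the console."""
    buffer = getattr(_context, "buffer", None)
    if buffer is not None:
        buffer.append(line)
    else:
        print(line)
```

Whoever joins the thread would then print the captured buffer as a single, uninterrupted block.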

@rmartin16
Member Author

I was thinking about something a little less invasive given our use case... although this solution also isn't generalized.

So, right now, we print some initial information about the command, run the command while potentially printing its output to the console, and finish with some information about the outcome of the command.

In situations where the commands are being run concurrently, though, we probably aren't going to be depending on any output from the commands being printed in real time (since that'd likely be a real mess); therefore, we could simply cache this logging until the command finishes and then flush it at the end for each command.

Such an implementation would probably leverage a flag for calls to Subprocess to activate this style of logging; the current style of logging would still be used for non-concurrent commands. (I imagine the logging flush may also require some level of locking to avoid intermingled flushing.)
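
As a rough sketch (the `buffered` flag, `_flush_lock`, and the logging format below are all invented for illustration - this isn't Briefcase's actual Subprocess implementation):

```python
import subprocess
import threading


class Subprocess:
    # Shared lock so only one buffered flush writes to the console at a time.
    _flush_lock = threading.Lock()

    def run(self, args, buffered=False, **kwargs):
        lines = []
        # In buffered mode, collect log lines instead of printing them live.
        log = lines.append if buffered else print

        log(f">>> Running Command: {' '.join(args)}")
        result = subprocess.run(args, capture_output=True, text=True, **kwargs)
        if result.stdout:
            log(result.stdout.rstrip())
        log(f">>> Return code: {result.returncode}")

        if buffered:
            # Flush the whole block in one go so concurrent commands
            # can't interleave their log lines.
            with self._flush_lock:
                print("\n".join(lines))
        return result
```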

Thoughts?

@freakboy3742
Member

> In situations where the commands are being run concurrently, though, we probably aren't going to be depending on any output from the commands being printed in real time (since that'd likely be a real mess); therefore, we could simply cache this logging until the command finishes and then flush it at the end for each command.

I'd go so far as to say we can't rely on output from concurrent commands being printed in real time, and we should actively protect against that possibility.

> Such an implementation would probably leverage a flag for calls to Subprocess to activate this style of logging; the current style of logging would still be used for non-concurrent commands. (I imagine the logging flush may also require some level of locking to avoid intermingled flushing.)

> Thoughts?

Makes sense - although I do wonder where the line is between "run with threadpool-compatible logging" and "check_output". Even check_output would need some changes to be useful in a threadpool situation - we effectively need "check_output, but return all the useful details rather than logging them", so that the thread joiner can dump everything to the log.

Looking at the implementation, I wonder if the easiest approach might be to make the "log wrapper object" an argument to check_output (and maybe other methods?). At present, the log wrapper is the subprocess tool class (i.e., self, so we can call self._log_output(), etc.); but it wouldn't be too hard to write a proxy object with the same interface that stores all the calls made to _log_*(), create one of these proxies for each thread, pass it in when invoking check_output in a thread, and then replay what the proxy saw back onto Subprocess when the thread is joined.
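
Something like this, purely as a sketch (`LogRecorder` and `replay()` are invented names, and the real `_log_*()` signatures are more involved than shown here):

```python
class LogRecorder:
    """Stand-in for the subprocess tool's logging interface: records every
    _log_*() call so it can be replayed once the thread is joined."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any _log_*() lookup resolves to a closure that just records the call.
        if name.startswith("_log_"):
            def record(*args, **kwargs):
                self.calls.append((name, args, kwargs))
            return record
        raise AttributeError(name)

    def replay(self, target):
        """Replay every recorded _log_*() call onto the real log wrapper."""
        for name, args, kwargs in self.calls:
            getattr(target, name)(*args, **kwargs)
```

The thread pool would create one recorder per threaded check_output() call, and the code that joins the thread would call replay() against Subprocess (probably under a lock) so each command's log comes out as one contiguous block.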
