Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML2PDF: System chromedriver for PDF export #1932

Merged

Conversation

haxtibal
Copy link
Contributor

@haxtibal haxtibal commented Aug 4, 2024

PDF export requires chrome/chromedriver. Currently StrictDoc always uses webdriver_manager to download a suitable chromedriver and install in a strictdoc cache subdirectory.

There may be reasons to prefer a system installation over an adhoc download (e.g. security policy). Notably Debian provides packages that work out-of-the-box for StrictDoc. GitHub Ubuntu CI images have the upstream version pre installed.

This adds an CLI option --chromedriver to select an explicit chromedriver. If not given, strictdoc uses webdriver_manager as usual.

To use the Debian package, one would call

apt install chromium chromium-driver
strictdoc export --fromats=html2pdf --chromedriver=/usr/bin/chromedriver .

To use chromedriver from GitHub Ubuntu image, one would call

 strictdoc export --fromats=html2pdf --chromedriver=$CHROMEWEBDRIVER .

@haxtibal haxtibal force-pushed the tdmg/system_chromedriver branch 3 times, most recently from 9d1e763 to db27dcd Compare August 4, 2024 16:29
@haxtibal haxtibal changed the title RFC: HTML2PDF: System chromedriver for PDF export WIP: HTML2PDF: System chromedriver for PDF export Aug 4, 2024
@haxtibal haxtibal force-pushed the tdmg/system_chromedriver branch 2 times, most recently from 3bd83ff to 5a72cee Compare August 4, 2024 18:25
@haxtibal
Copy link
Contributor Author

haxtibal commented Aug 4, 2024

@stanislaw Sorry if these CI debug trials cause E-Mail spam at your side. I try to figure out why $CHROMEWEBDRIVER (defined by GitHub CI image) is not passed into lit in the tox py37 env, while it works in py312 env.

EDIT: I think it's solved now, thanks to a former stackoverflow answer of yours. I thought this snippet from LLVM guide

the syntax of the RUN lines is similar to a shell’s syntax for pipelines including I/O redirection and variable substitution

meant one could use $MY_VAR in RUN lines, but seemingly not. Still want to investigate why py312 reported OK. Most likely --chromedriver= is interpreted as "" in py37 and as None in py312.

@haxtibal haxtibal force-pushed the tdmg/system_chromedriver branch 3 times, most recently from 1ed0ac8 to 0716f2c Compare August 5, 2024 07:26
@haxtibal
Copy link
Contributor Author

haxtibal commented Aug 5, 2024

e2e test edit_grammar_add_field_move_up_and_down_save fails on macOS, but works on Linux and Windows. It should be unrelated to this PR, but can't prove it.

@stanislaw
Copy link
Collaborator

e2e test edit_grammar_add_field_move_up_and_down_save fails on macOS, but works on Linux and Windows. It should be unrelated to this PR, but can't prove it.

Yeah, that seems to be a rather rare flaky test. I recorded it in the code climate ticket. For now, just restart.

@stanislaw Sorry if these CI debug trials cause E-Mail spam at your side. I try to figure out why $CHROMEWEBDRIVER (defined by GitHub CI image) is not passed into lit in the tox py37 env, while it works in py312 env.

No worries at all! I am happy to see all the updates, including CI. I am glad that you figured it out for LIT.

Overall, your approach looks good but I will see if I have any specific comments.

@stanislaw
Copy link
Collaborator

@haxtibal the tests like this one edit_grammar_move_field_down are known issues but they used to fail very rarely. I did one round of checking why the could fail and didn't find an obvious reason back then.

I often get an impression that SeleniumBase/Selenium receive updates to their logic and this makes some of these flaky tests to re-surface and start failing more often. Right now it looks like these errors have nothing to do with your branch. I recorded all of these in the code climate and should investigate them, otherwise we will get blocked all the time in the future.

@stanislaw
Copy link
Collaborator

stanislaw commented Aug 6, 2024

One reason that used to plague a lot of the end2end tests used to be a generic HTTP 500 Web Driver Exception. A solution to this error in that case was to implement a retry mechanism, catching this 500 error. After that, you can very often see in the build logs that this machine actually works all the time, catching these 500 errors.

I think this may be the only strategy that actually works: catch and retry, two times.

@stanislaw
Copy link
Collaborator

This is the one I was talking about. It is right there in the failing job:

selenium_connection_error_try_handler: Exception <class 'selenium.common.exceptions.WebDriverException'> thrown when attempting to run <function E2ECase.open at 0x10f3210d0>, attempt 1 of 3.
selenium_connection_error_try_handler: This is a ERR_CONNECTION_REFUSED exception. Trying again...

@haxtibal
Copy link
Contributor Author

haxtibal commented Aug 6, 2024

I think this may be the only strategy that actually works: catch and retry, two times.

SeleniumBase has retry logic built in and we see it in action. The failing exception happens in this code path, which is already the retry of a retry, IIUC.

Any ideas how to reproduce this for debugging, or get good debug information? Ideally we could attach into a running gh runner session, but not sure if that's possible at all. Or maybe making a browser screenshot right before the failing test statement, or dumping the DOM state...

@stanislaw
Copy link
Collaborator

I think this may be the only strategy that actually works: catch and retry, two times.

SeleniumBase has retry logic built in and we see it in action. The failing exception happens in this code path, which is already the retry of a retry, IIUC.

Any ideas how to reproduce this for debugging, or get good debug information? Ideally we could attach into a running gh runner session, but not sure if that's possible at all. Or maybe making a browser screenshot right before the failing test statement, or dumping the DOM state...

I cannot check this right now. What I normally do is to explore the context around the failing line very carefully and try to add intermediate checks or add the retry logic. No silver bullets in my pocket.

Looking at the number of the tests randomly failing, I assume SeleniumBase/Selenium received an update that changed logic/timing and this revealed the existing weaknesses in the tests that are currently failing.

@haxtibal
Copy link
Contributor Author

haxtibal commented Aug 6, 2024

I cannot check this right now. What I normally do is to explore the context around the failing line very carefully and try to add intermediate checks or add the retry logic. No silver bullets in my pocket.

I'll revert the last debug commit to set the PR in a reviewable/functional state.

And then will do experiments regarding the failing test in a separate branch or personal repository. Debug your GitHub Actions by using ssh sounds interesting.

@haxtibal haxtibal changed the title WIP: HTML2PDF: System chromedriver for PDF export HTML2PDF: System chromedriver for PDF export Aug 6, 2024
PDF export requires chrome/chromedriver. Currently StrictDoc always uses
webdriver_manager to download a suitable chromedriver and install in a
strictdoc cache subdirectory.

There may be reasons to prefer a system installation over an adhoc
download (e.g. security policy). Notably Debian provides packages that
work out-of-the-box for StrictDoc. GitHub Ubuntu CI images have the
upstream version pre installed.

This adds an CLI option --chromedriver to select an explicit
chromedriver. If not given, strictdoc uses webdriver_manager as usual.

To use the Debian package, one would call

 apt install chromium chromium-driver
 strictdoc export --fromats=html2pdf --chromedriver=/usr/bin/chromedriver .

To use chromedriver from GitHub Ubuntu image, one would call

 strictdoc export --fromats=html2pdf --chromedriver=$CHROMEWEBDRIVER .
@stanislaw stanislaw merged commit 1b8f3cf into strictdoc-project:main Aug 7, 2024
23 checks passed
@stanislaw stanislaw added this to the 2024-Q3 milestone Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants