Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.9] Fix rare crash in getHtml #516

Merged
merged 6 commits into from
May 4, 2023
Merged

Conversation

divinity76
Copy link
Contributor

for a split second, documentElement might be missing, causing getHtml() to crash. I had a program that was doing page stuff and calling getHtml() like every 10 milliseconds (100 times per second), and got an unexpected crash. Was able to create a small reproducible sample:

<?php

declare(strict_types=1);
require_once('vendor/autoload.php');
$chromeBinary = "/snap/bin/chromium";
$browser_factory = new \HeadlessChromium\BrowserFactory($chromeBinary);
$browser_factory->setOptions([
    "headless" => true,
    "noSandbox" => true,
    'windowSize'   => [1000, 1000]
]);
$browser = $browser_factory->createBrowser();
$page = $browser->createPage();
for ($i = 0; $i < 100; ++$i) {
    $page->navigate("http://example.com");
    $html = $page->getHtml();
    $page->navigate("http://example.org");
    $html = $page->getHtml();
}

consistently crash with:

PHP Fatal error:  Uncaught HeadlessChromium\Exception\JavascriptException: Error during javascript evaluation: TypeError: Cannot read properties of null (reading 'outerHTML')
    at <anonymous>:1:26 in /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php:89
Stack trace:
#0 /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php(108): HeadlessChromium\PageUtils\PageEvaluation->waitForResponse()
#1 /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/Page.php(894): HeadlessChromium\PageUtils\PageEvaluation->getReturnValue()
#2 /home/hans/projects/ibkr/test_crash.php(16): HeadlessChromium\Page->getHtml()
#3 {main}
  thrown in /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php on line 89

divinity76 added 4 commits May 3, 2023 14:26
for a split second, documentElement might be missing, causing getHtml() to crash.
I had a program that was doing page stuff and calling getHtml() like every 10 milliseconds (100 times per second), and got an unexpected crash. Was able to create a small reproducible sample:
```php
<?php

declare(strict_types=1);
require_once('vendor/autoload.php');
$chromeBinary = "/snap/bin/chromium";
$browser_factory = new \HeadlessChromium\BrowserFactory($chromeBinary);
$browser_factory->setOptions([
    "headless" => true,
    "noSandbox" => true,
    'windowSize'   => [1000, 1000]
]);
$browser = $browser_factory->createBrowser();
$page = $browser->createPage();
for ($i = 0; $i < 100; ++$i) {
    $page->navigate("http://example.com");
    $html = $page->getHtml();
    $page->navigate("http://example.org");
    $html = $page->getHtml();
}
```
consistently crash with:
```
PHP Fatal error:  Uncaught HeadlessChromium\Exception\JavascriptException: Error during javascript evaluation: TypeError: Cannot read properties of null (reading 'outerHTML')
    at <anonymous>:1:26 in /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php:89
Stack trace:
#0 /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php(108): HeadlessChromium\PageUtils\PageEvaluation->waitForResponse()
chrome-php#1 /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/Page.php(894): HeadlessChromium\PageUtils\PageEvaluation->getReturnValue()
chrome-php#2 /home/hans/projects/ibkr/test_crash.php(16): HeadlessChromium\Page->getHtml()
chrome-php#3 {main}
  thrown in /home/hans/projects/ibkr/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php on line 89
```
@enricodias
Copy link
Member

Nice catch, but infinite loops makes me uneasy. If outerHtml remains null, it will indeed loop forever. We could use Utils::tryWithTimeout() to try with a timeout that doesn't get reset on each iteration.

We should also add JavascriptException (and OperationTimedOut if we use the Utils) to the @thrown section in the docblock.

The comments inside the method are unnecessary since the code itself explains what is being done and the PR linked to that commit explains why.

@enricodias
Copy link
Member

Actually I think we don't even need to throw JavascriptException up, we can just use the timeout and communication exceptions.

@divinity76
Copy link
Contributor Author

divinity76 commented May 3, 2023

Nice catch

thanks!

infinite loops makes me uneasy

it only tries 2 times, which is all it needs :) (there's no infinite loop in the PR)

If outerHtml remains null

i've only observed it being null for less than 1 millisecond, and it's not outerHTML that is null, it is document.documentElement that is null ^^ (for less than 1 millisecond)

We should also add JavascriptException (and OperationTimedOut if we use the Utils) to the @Thrown section in the docblock.

if this PR gets merged, i don't think it will throw JavascriptException, it's not supposed to anyway, that last

    throw $e;

is supposed to be unreachable (but i want to keep it anyway just in case I'm wrong about it being unreachable)

@enricodias
Copy link
Member

it only tries 2 times, which is all it needs :) (there's no infinite loop in the PR)

My mistake, I was reviewing it quickly on my phone and thought the method was calling itself in the catch, but it was just trying to evaluate twice.

The throw $e is unreachable, but technically it's possible that the second evaluate return another JavascriptException exception since it's doing it before the usleep, we just can't reproduce it currently. That's actually why I assumed the method was calling itself until it gets a value.

@divinity76
Copy link
Contributor Author

ok, so want me to add * @throws JavascriptException just in case?

@enricodias
Copy link
Member

Yeap, just in case.

possibly throws JavascriptException
@enricodias enricodias changed the title fix rare crash in getHtml [1.9] Fix rare crash in getHtml May 4, 2023
@enricodias enricodias merged commit ab591af into chrome-php:1.9 May 4, 2023
@divinity76
Copy link
Contributor Author

divinity76 commented May 4, 2023

it's a chore, but it should probably be cherry-picked to every maintained branch, most notably the 1.10 branch
(idk which branches are still maintained)

  • this is possible to do by creating a pull request to every maintained branch, but i think it's easier and cleaner for a maintainer with commit access to do it. i think it would go something like
git checkout 9.8;
git cherry-pick ab591afbfd973e1950a07c166d2f60f7e13033d7;
git checkout 9.10;
git cherry-pick ab591afbfd973e1950a07c166d2f60f7e13033d7;
git push;

but I'm not a git expert.

@enricodias
Copy link
Member

enricodias commented May 4, 2023

We only maintain the current version. Everyone using v1.8 can update to 1.9 without breaking anything.

@GrahamCampbell
Copy link
Member

but it should probably be cherry-picked to every maintained branch

That's not how we manage branches. 1.9 and 1.10 are not totally separate codebases. We merge upwards, periodically.

@GrahamCampbell
Copy link
Member

I'm looking into rolling back this change, and replacing it with a proper fix: #664.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants