Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.13] Wait for events to be processed instead of sleeping in getHtml #664

Open
wants to merge 2 commits into
base: 1.13
Choose a base branch
from

Conversation

GrahamCampbell
Copy link
Member

No description provided.

@divinity76
Copy link
Contributor

if it works, that's great 👍 did you test it with the reproduce-code from #516 ?

@GrahamCampbell
Copy link
Member Author

No, I have not yet tested it.

@GrahamCampbell
Copy link
Member Author

It would actually be super helpful if you could test this code against your example please.

@divinity76
Copy link
Contributor

divinity76 commented Nov 6, 2024

@GrahamCampbell
first check that the problem is not present in current 1.12 dev:

$ composer show
chrome-php/chrome         1.12.x-dev 3bbeb45 Instrument headless chrome/chr...
chrome-php/wrench         1.7.0              A simple PHP WebSocket impleme...
evenement/evenement       3.0.2              Événement is a very simple e...
monolog/monolog           3.7.0              Sends your logs to files, sock...
psr/log                   3.0.2              Common interface for logging l...
symfony/filesystem        7.1.6              Provides basic utilities for t...
symfony/polyfill-ctype    1.31.0             Symfony polyfill for ctype fun...
symfony/polyfill-mbstring 1.31.0             Symfony polyfill for the Mbstr...
symfony/polyfill-php80    1.31.0             Symfony polyfill backporting s...
symfony/process           7.1.7              Executes commands in sub-proce...


$ cat repro.php 
<?php

declare(strict_types=1);
require_once('vendor/autoload.php');
$chromeBinary = "/snap/bin/chromium";
$browser_factory = new \HeadlessChromium\BrowserFactory($chromeBinary);
$browser_factory->setOptions([
    "headless" => true,
    "noSandbox" => true,
    'windowSize'   => [1000, 1000]
]);
$browser = $browser_factory->createBrowser();
$page = $browser->createPage();
for ($i = 0; $i < 100; ++$i) {
    $page->navigate("http://example.com");
    $html = $page->getHtml();
    $page->navigate("http://example.org");
    $html = $page->getHtml();
}
$ time php repro.php 

real	0m28.622s
user	0m22.046s
sys	0m10.606s
$ git diff
diff --git a/vendor/chrome-php/chrome/src/Page.php b/vendor/chrome-php/chrome/src/Page.php
index 1192258..6a8e290 100644
--- a/vendor/chrome-php/chrome/src/Page.php
+++ b/vendor/chrome-php/chrome/src/Page.php
@@ -940,16 +940,7 @@ class Page
      */
     public function getHtml(?int $timeout = null): string
     {
-        try {
-            return $this->evaluate('document.documentElement.outerHTML')->getReturnValue($timeout);
-        } catch (JavascriptException $e) {
-            if (0 === \strpos($e->getMessage(), 'Error during javascript evaluation: TypeError: Cannot read properties of null (reading \'outerHTML\')')) {
-                \usleep(1000);
-
-                return $this->evaluate('document.documentElement.outerHTML')->getReturnValue($timeout);
-            }
-            throw $e;
-        }
+        return $this->evaluate('document.documentElement.outerHTML')->getReturnValue($timeout);
     }
$ time php repro.php 
PHP Fatal error:  Uncaught HeadlessChromium\Exception\JavascriptException: Error during javascript evaluation: TypeError: Cannot read properties of null (reading 'outerHTML')
    at <anonymous>:1:26 in /home/hans/projects/test/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php:89
Stack trace:
#0 /home/hans/projects/test/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php(108): HeadlessChromium\PageUtils\PageEvaluation->waitForResponse()
#1 /home/hans/projects/test/vendor/chrome-php/chrome/src/Page.php(943): HeadlessChromium\PageUtils\PageEvaluation->getReturnValue()78
#2 /home/hans/projects/test/repro.php(16): HeadlessChromium\Page->getHtml()
#3 {main}
  thrown in /home/hans/projects/test/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php on line 89

real	0m7.078s
user	0m1.101s
sys	0m1.325s

Yes it does. Notably, it used 7 seconds to reproduce, meaning it took multiple attempts before it reproduced (we can guess based on 7.078/(28.622/100) = 24.7 that it took 24-25 attempts before it reproduced, but the exact number is not important, it's just imortant that it reproduced long before attempt 100)

Then check if the code from #664 fixes it:

git diff --color=always
diff --git a/vendor/chrome-php/chrome/src/Page.php b/vendor/chrome-php/chrome/src/Page.php
index 1192258..5d0409b 100644
--- a/vendor/chrome-php/chrome/src/Page.php
+++ b/vendor/chrome-php/chrome/src/Page.php
@@ -940,16 +940,8 @@ class Page
      */
     public function getHtml(?int $timeout = null): string
     {
-        try {
-            return $this->evaluate('document.documentElement.outerHTML')->getReturnValue($timeout);
-        } catch (JavascriptException $e) {
-            if (0 === \strpos($e->getMessage(), 'Error during javascript evaluation: TypeError: Cannot read properties of null (reading \'outerHTML\')')) {
-                \usleep(1000);
-
-                return $this->evaluate('document.documentElement.outerHTML')->getReturnValue($timeout);
-            }
-            throw $e;
-        }
+        $this->getSession()->getConnection()->processAllEvents();
+        return $this->evaluate('document.documentElement.outerHTML')->getReturnValue($timeout);
     }
 
     /**
$ time php repro.php 
PHP Fatal error:  Uncaught HeadlessChromium\Exception\JavascriptException: Error during javascript evaluation: TypeError: Cannot read properties of null (reading 'outerHTML')
    at <anonymous>:1:26 in /home/hans/projects/test/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php:89
Stack trace:
#0 /home/hans/projects/test/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php(108): HeadlessChromium\PageUtils\PageEvaluation->waitForResponse()
#1 /home/hans/projects/test/vendor/chrome-php/chrome/src/Page.php(944): HeadlessChromium\PageUtils\PageEvaluation->getReturnValue()
#2 /home/hans/projects/test/repro.php(16): HeadlessChromium\Page->getHtml()
#3 {main}
  thrown in /home/hans/projects/test/vendor/chrome-php/chrome/src/PageUtils/PageEvaluation.php on line 89

real	0m5.139s
user	0m1.533s
sys	0m0.919s

(and we can guess based on 5.139/(28.622/100)=17.95 that it reproduced on attempt 17-18 this time. )

sorry 😢 this PR currently re-introduce the issue fixed in #516 .

@GrahamCampbell
Copy link
Member Author

Thanks for checking this. I wonder why this doesn't work. 😿

@GrahamCampbell GrahamCampbell changed the base branch from 1.12 to 1.13 December 9, 2024 11:54
@GrahamCampbell GrahamCampbell changed the title [1.12] Wait for events to be processed instead of sleeping in getHtml [1.13] Wait for events to be processed instead of sleeping in getHtml Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants