Performance improvement ideas #1042
Replies: 3 comments 16 replies
-
Improvements are always good, but isn't 90,000 RPS far beyond anything anyone is realistically going to need? At that point the bottleneck very much doesn't seem to be the hand-off from Caddy to PHP (which is what a 'Hello World' is effectively measuring, no?). So I wonder whether the focus should instead be on setting up a standardized benchmarking suite of real-world PHP workloads: Laravel (normal and Octane), some sort of FrankenPHP worker-mode web app, WordPress without workers, etc., and comparing it to php-fpm behind Caddy, nginx, OpenLiteSpeed and so on. Whatever tests like that reveal, be it that FrankenPHP is faster or that it has some genuine bottlenecks, would be very useful to know.
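For illustration, here's a minimal sketch of what one runner in such a suite could look like in Go: it hits a list of endpoints with a fixed number of concurrent connections for a fixed duration and reports RPS. The endpoint URLs, concurrency, and duration are placeholders I made up, not anything FrankenPHP ships.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

// One entry per stack/workload to compare (URLs are placeholders).
var targets = map[string]string{
	"frankenphp-worker-hello": "http://localhost:8080/hello.php",
	"frankenphp-wordpress":    "http://localhost:8080/",
	"fpm-nginx-wordpress":     "http://localhost:8081/",
}

// run sends requests from `conns` goroutines until the deadline and
// returns the total number of completed requests.
func run(url string, conns int, dur time.Duration) uint64 {
	var done uint64
	deadline := time.Now().Add(dur)
	var wg sync.WaitGroup
	for i := 0; i < conns; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client := &http.Client{Timeout: 5 * time.Second}
			for time.Now().Before(deadline) {
				resp, err := client.Get(url)
				if err != nil {
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
				atomic.AddUint64(&done, 1)
			}
		}()
	}
	wg.Wait()
	return done
}

func main() {
	const conns, dur = 64, 10 * time.Second
	for name, url := range targets {
		total := run(url, conns, dur)
		fmt.Printf("%-28s %8.0f req/s\n", name, float64(total)/dur.Seconds())
	}
}
```

In practice an established load generator like wrk or k6 would be the better tool; the point of a standardized suite is mainly that every stack gets the exact same endpoints and the exact same load pattern.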
-
The document root is also cached in worker mode (#1002 added cache for both modes).
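Roughly, caching here means resolving the configured document root once and reusing the result instead of re-resolving it on every request. A minimal Go sketch of that general pattern (not the actual code from #1002) could look like this:

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// rootCache memoizes the absolute, symlink-resolved document root per
// configured path, so the filesystem is only consulted on first use.
var rootCache sync.Map // map[string]string

func resolveDocumentRoot(configured string) (string, error) {
	if cached, ok := rootCache.Load(configured); ok {
		return cached.(string), nil
	}
	abs, err := filepath.Abs(configured)
	if err != nil {
		return "", err
	}
	resolved, err := filepath.EvalSymlinks(abs)
	if err != nil {
		return "", err
	}
	// LoadOrStore keeps the first resolved value if two requests race.
	actual, _ := rootCache.LoadOrStore(configured, resolved)
	return actual.(string), nil
}

func main() {
	root, err := resolveDocumentRoot("./public")
	fmt.Println(root, err)
}
```

A `sync.Map` keyed by the configured path keeps the hot path lock-free, and `LoadOrStore` makes concurrent first requests agree on a single resolved value.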
-
I think I realized why 10 workers is the magic number where the 'Hello World' is most efficient: it's basically down to CPU context switching and is to some degree unavoidable. The only way to make this more efficient would probably be to autoscale the number of PHP threads based on load instead of using a fixed number. While this would be a cool feature, it's probably not trivial to implement. PHP-FPM has a similar mode (pm = dynamic).
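As a very rough sketch of what load-based autoscaling could look like (purely illustrative; the pool, limits, and interval below are made up and are not FrankenPHP's actual thread handling): a controller watches the request queue and grows the worker pool when requests start piling up.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Illustrative limits, not FrankenPHP settings.
const (
	minWorkers = 4
	maxWorkers = 32
)

type pool struct {
	jobs    chan func()
	workers int64 // current number of worker goroutines
	busy    int64 // workers currently executing a job
}

func newPool() *pool {
	p := &pool{jobs: make(chan func(), 1024)}
	for i := 0; i < minWorkers; i++ {
		p.spawn()
	}
	return p
}

func (p *pool) spawn() {
	atomic.AddInt64(&p.workers, 1)
	go func() {
		for job := range p.jobs {
			atomic.AddInt64(&p.busy, 1)
			job()
			atomic.AddInt64(&p.busy, -1)
		}
	}()
}

// autoscale periodically grows the pool while the queue is backed up.
// (Shrinking idle workers again is left out to keep the sketch short.)
func (p *pool) autoscale() {
	for range time.Tick(100 * time.Millisecond) {
		queued := len(p.jobs)
		workers := atomic.LoadInt64(&p.workers)
		if queued > 0 && workers < maxWorkers {
			p.spawn()
		}
		fmt.Printf("workers=%d busy=%d queued=%d\n", workers, atomic.LoadInt64(&p.busy), queued)
	}
}

func main() {
	p := newPool()
	go p.autoscale()
	for i := 0; i < 200; i++ {
		p.jobs <- func() { time.Sleep(50 * time.Millisecond) } // simulate a PHP request
	}
	time.Sleep(2 * time.Second)
}
```

PHP-FPM's `pm = dynamic` does something comparable at the process level, adjusting the number of children between `pm.min_spare_servers` and `pm.max_spare_servers`.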
-
This thread is for gathering some recent and potential performance improvements in FrankenPHP so I won't forget them:

- calling `ts_resource(0)` only once per thread (perf: cgi-mode 1700% improvement #933)
- registering `$_SERVER` variables only when `$_SERVER` is actually accessed (currently only true for cgi mode; see the lazy-population sketch at the end of this comment)

Here's a flamegraph I labeled in MS Paint to showcase the current work done in a worker 'Hello World' request without `file_server`:

[flamegraph: labeled breakdown of a worker 'Hello World' request]

Here's a graph showcasing how one can finagle the number of workers for maximum RPS for a 'Hello World' (kudos to ChatGPT). This was done with 20 CPU cores on a custom branch with `php_import_environment_variables` disabled and no handles. I cannot yet explain why contention increases significantly at higher worker counts.

[graph: RPS vs. number of workers]

Note that a plain Caddy `respond "Hello World"` will currently manage around 180,000 RPS in the same setup.
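To make the second item in the list above a bit more concrete, here is a small Go-flavored analogy of lazy `$_SERVER` population (only an illustration of the general pattern, not how php-src or FrankenPHP actually implement superglobals): the request keeps the raw header data around, and the CGI-style map is only materialized the first time something reads it.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// request holds the raw header data; the expensive $_SERVER-like map
// is only built if a script actually reads it.
type request struct {
	rawHeaders map[string]string

	buildOnce sync.Once
	server    map[string]string
}

// Server lazily materializes the CGI-style variables on first access.
func (r *request) Server() map[string]string {
	r.buildOnce.Do(func() {
		r.server = make(map[string]string, len(r.rawHeaders))
		for name, value := range r.rawHeaders {
			key := "HTTP_" + strings.ReplaceAll(strings.ToUpper(name), "-", "_")
			r.server[key] = value
		}
	})
	return r.server
}

func main() {
	req := &request{rawHeaders: map[string]string{"User-Agent": "curl/8.0", "Accept": "*/*"}}
	// Nothing is built until the first Server() call.
	fmt.Println(req.Server()["HTTP_USER_AGENT"])
}
```

For scripts that never touch `$_SERVER`, the per-request cost of building that map disappears entirely, which is the effect the cgi-mode change aims for.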