Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qcommon: let the framerate flirt with the limits #1150

Merged
merged 1 commit into from
May 27, 2024

Conversation

illwieckz
Copy link
Member

@illwieckz illwieckz commented May 15, 2024

Let the framerate flirt with the limits.

Basically the idea is to not sleep the remaining time to fill the expected frametime, but to let enough time after sleeping to process events in a loop until the expected frametime is reached.

It makes the framerate less jaggy, and actually provides to the user what they expect.

@illwieckz
Copy link
Member Author

Some before/after screenshots…

1. common.framerate.max 100, 100 fps is expected.

Before, notice the 99 fps (it never reached 100 fps) and the jaggy framerate line:

unvanquished_2024-05-15_203847_000

After, notice the 100 fps and the smooth framerate line:

unvanquished_2024-05-15_204113_000

2. common.framerate.max 0, 333 fps is expected.

Before, notice the 325 fps (it never reached 333 fps) and the jaggy famerate line:

unvanquished_2024-05-15_203913_000

After, notice the 333 fps and the smooth framerate line:

unvanquished_2024-05-15_204153_000

2. common.framerate.max -1, 1000 fps is expected.

Before, notice the 932 fps (it never reached 1000 fps) and the jaggy framerate line:

unvanquished_2024-05-15_204002_000

After, notice the 1000 fps and the smooth framerate line:

unvanquished_2024-05-15_204308_000

@illwieckz
Copy link
Member Author

This was tested on high performance CPU and GPU profile with lowes preset and low resolution (expecting to get very high framerate), with framerate flirting with limits like 1000 fps.

This was also tested on low performance CPU and GPU profile with ultra preset and 4K resolution (expecting to get very low framerate and the game struggling to render), with framerae flirting with limits like 50 fps or 30 fps.

In all case the game gave me the framerate I asked.

In the rare cas I got more, I got a bit more, which is expected since when frametime = 1000 / framerate, then frametime * 1000 < framerate (precision loss in integer division), and which is wanted since if you ask the game a framerate you want it to match with your screen, you don't want to get your screen renders two times the same frame. You better want to waste a rendered frame than to render twice the same one.

Copy link
Contributor

@DolceTriade DolceTriade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the variable names and comments could be clarified to indicate intent and usage.

As it is written, I honestly have a lot of difficulty following the code.


/* Fill as much as input frames as possible within this timeframe to not
let the game process events in a way it would last longer than that. */
int jamTime = std::max( afterProcessing, 50 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's jam time? I'm not sure the comment is clear to me.

Copy link
Member Author

@illwieckz illwieckz May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that if you need 50ms to reach the frametime, you should not sleep 50ms, because 1ms is not precise enough and if you stop sleeping 0.1ms after the frametime, you lost 1ms. And losing 1ms per frame makes the 1000fps cap becoming a 500fps one. So basically if you need 50ms to reach the frametime, you better sleep for 45ms then loop on input until the frametime is reached, that non-sleeping final looping is what I call the “jam time”.

It's like you sleep for a whole night but wake up 10 min before the alarm clock is expected to ring, keeping the alarm clock in hand and checking the time running second per second, and you don't miss any second, so when the alarm clock rings, no one can beat you at getting up faster than you, and you would wake up that fast even if you were deaf. What I called the jam part is that part when you wait for the clock to ring, while already being awake.

I will rewrite this part any way, actually the 50ms as a jam time means 20fps, so we currently never sleep. It's like, instead of waking up 10min before the clock ring, you keep yourself awake for the whole night. We only need to wake up 10 minutes before the alarm clock rings.

Copy link
Member Author

@illwieckz illwieckz May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 50ms came from the original code and that confused me as it looked like this would be a valid value, even if it was only an upper limit for a sleep (which is now a lower limit for a jam). This is totally stupid in both cases, unless you're in 1999 and you expect most players to have less than 20 fps.

let the game process events in a way it would last longer than that. */
int jamTime = std::max( afterProcessing, 50 );

int sleep = minMsec - msec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called sleep, but we don't actually sleep for this time. Maybe we should call it something else

@slipher
Copy link
Member

slipher commented May 16, 2024

Basically the idea is to not sleep the remaining time to fill the expected frametime, but to let enough time after sleeping to process events in a loop until the expected frametime is reached.

Are you saying you are going to put it in a busy loop?

In the rare cas I got more, I got a bit more, which is expected since when frametime = 1000 / framerate, then frametime * 1000 < framerate (precision loss in integer division), and which is wanted since if you ask the game a framerate you want it to match with your screen, you don't want to get your screen renders two times the same frame. You better want to waste a rendered frame than to render twice the same one.

Isn't that what vsync is for?

@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

Are you saying you are going to put it in a busy loop?

In some way, yes. We have a problem here, this code was designed for a time when people were lucky to reach 30fps.

A sleep function based on milliseconds here is not precise enough.

With our 333 high cap, it means 3.003 ms of sleep between each frame if any compute was not consuming any time.

There are 240MHz screens out there, it means 4.166ms between each frames if any compute was not consuming any time.

Isn't that what vsync is for?

Yes? I'm not saying this should be used instead of vsync, I'm just using this example to show how random the current implementation is and can't give what the user asked for. It's always randomly sleeping and produces a unpredictable framerate.

I'm even wondering if the original code was making players with higher framerate advantaged against those with lower framerate. As with higher framerate the game processes the input more frequently, as with lower framerate the game just sleeps while the player inputs. With two people clicking +fire at the same time, the high fps player may probably win? One should have to dive deeper in the code to check that, but the surface really, really stinks.

@illwieckz
Copy link
Member Author

Well, about the busy loop, the “busy loop” calls Com_EventLoop() and IN_Frame() on every step, which is probably not really a “busy loop”.

@slipher
Copy link
Member

slipher commented May 16, 2024

Are you saying you are going to put it in a busy loop?

In some way, yes.

That's bad. For example if you are using a laptop, it can drain the battery quickly. Please don't make a busy loop!

I'm even wondering if the original code was making players with higher framerate advantaged against those with lower framerate. As with higher framerate the game processes the input more frequently, as with lower framerate the game just sleeps while the player inputs. With two people clicking +fire at the same time, the high fps player may probably win? One should have to dive deeper in the code to check that, but the surface really, really stinks.

Yes, the player with a higher framerate would be expected to be able to respond faster in general. Stuff will show up faster on the screen, packets will be sent out sooner, etc. Mostly we can't do anything about that.

There is one thing that checking the input more frequently could improve, which is the responsiveness of movement. The forwardmove/rightmove/upmove part of usercmd_t is supposedly pro-rated to the proportion of the frame that the key was held, with the assumption that it was pressed down at the same wall clock time that the input loop detected it. Still, there is no sense in checking more frequently than once per millisecond! I believe other controls won't be affected by more frequent input checking, as they are just booleans (or commands), and the usercmd_t packet will only be sent once per frame.

Using a more precise sleep function is not a bad idea, as long as it is implemented by actually sleeping rather than spinning the CPU.

@DolceTriade
Copy link
Contributor

That's bad. For example if you are using a laptop, it can drain the battery quickly. Please don't make a busy loop!

To be fair, this is a game which is going to have excessive power constraints anyways.

Using a more precise sleep function is not a bad idea, as long as it is implemented by actually sleeping rather than spinning the CPU.

Using a more precise sleep function is not a bad idea, as long as it is implemented by actually sleeping rather than spinning the CPU.

https://stackoverflow.com/questions/13397571/precise-thread-sleep-needed-max-1ms-error

Seems like this is actually quite difficult.

It would be interesting to note how long each "busy loop" takes on the average case.

@DolceTriade
Copy link
Contributor

Also I'm not sure if this is related to this change, but when testing this change, if I change the limit from 125 to 144, the game actually limits me to 167.

@DolceTriade
Copy link
Contributor

Also in my tests, we never actually sleep...

@illwieckz
Copy link
Member Author

https://stackoverflow.com/questions/13397571/precise-thread-sleep-needed-max-1ms-error

Very interesting thread! And a bit scary:

According to this article, scheduler latency could be anywhere between 10-30ms on Linux.

And there are already specific code in our code base for Windows because of the default sleep being bad.

For 60fps, we need a sleep function that can wait 16ms with 0.6ms error. For 144 fps, we need a sleep function that can wait 6ms with .9ms error, and for 240fps we need a sleep that can wait for 4ms with 0.1ms error. Ouch.

It probably doesn't make sense to start sleeping when requesting framerate above 60fps, and someone playing the game on laptop on battery doesn't want to have more than 60 fps. If the player requests more than 60fps, it will not be the useless CPU spinning that will draw the batter, but the GPU crunching insane amount of numbers.

@illwieckz
Copy link
Member Author

Also I'm not sure if this is related to this change, but when testing this change, if I change the limit from 125 to 144, the game actually limits me to 167.

The various name discrepancies you pointed out (like the sleep variable not used for sleeping) comes from the fact I tried many attempts of achiving this… And in 90% of my attempts (not with this one on my end), I was experiencing curious fps jumps.

Like, I could have 125 fps or 200 fps but nothing in between whatever the value I would ask… And I've set all the options I know to disable screen sync, and I got it with the same environment I use while not getting it with this branch.

@illwieckz
Copy link
Member Author

Also, testing this branch with a CPU governor being ondemand is like not testing this branch, the randomness of the CPU governor will be as wrong as the sleep randomness this branch attemps to avoid.

@illwieckz
Copy link
Member Author

Well, not as wrong… on master branch with ondemand CPU governor I get a close-to-flat line around 60fps, with 120 fps it stats being bad, after that it's just garbage.

With this branch with ondemand it's still smooth at 300fps, and really smooth. But I when approching the 1000fps cap, I can't get more than 900fps, the CPU governor is just breaking everything. It's not because the hardware can't do it, if I remove the 1000 fps cap it goes well higher.

@illwieckz
Copy link
Member Author

It probably depends on the environment (other software running around), I remember that yesterday I could witch from powersave to ondemand and just get flat line in both case, with an immediate switch to the different levels.

@DolceTriade
Copy link
Contributor

Can you verify if we sleep at all? In my tests we don't, regardless of the FPS limit.

@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

I pushed a simpler implementation. It is purposed to only sleep if there is more than 2ms to sleep (under 500fps) and never sleep the last 1ms (if the sleep function is precise enough to wake up with less than 1ms error…).

@illwieckz
Copy link
Member Author

On my end it is still smooth at 333fps with ondemand governor and a crowded desktop.

@illwieckz
Copy link
Member Author

It is also smooth at 60fps on powersave CPU profile with low GPU profile.

@DolceTriade
Copy link
Contributor

Ok, the weirdness around maxFPS = 144 is around integer division.

we do:

minMsec = 1000 / maxFPS

1000 / 144 = 6.9 but we truncate it to 6. 1000 / 6 =~ 167.

The new code is way simpler and easier to follow. Let me verify it actually sleeps.

@DolceTriade
Copy link
Contributor

We actually sleep. Nice.

@illwieckz
Copy link
Member Author

1000 / 144 = 6.9 but we truncate it to 6. 1000 / 6 =~ 167.

Yes, the same way starting with 251 you jump from 250 to 333 directly because 1000/251 is 3.9.

We actually sleep. Nice.

Good! Then it works.

Copy link
Contributor

@DolceTriade DolceTriade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, at 125 fps, we have a budget of 8ms. If assuming Com_EventLoop() and IN_Frame() are fast (~1ms), we sleep for 5ms and busy loop for 2ms.

I think if we are worried about power saving, lots of new games have a "power saving mode" where we can modify the max 50ms sleep and make it larger.

@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

Yes and the laptop user on battery with 60fps would have a budget of 16ms, sleeping for 14ms and being busy for 2ms, actually sleeping 840ms per second (if the rendering is instantaneous, of course)

@DolceTriade
Copy link
Contributor

Last thing to mention is that, this PR effectively doubles CPU usage when locked at 125fps for me, but does not affect CPU when maxed out.

On master, at 125fps, game uses ~15% of a core. With this PR, we use ~30% of a core.
On master at 333fps, game uses ~60% of a core. With this PR, we use ~60% of a core.

@illwieckz
Copy link
Member Author

I think if we are worried about power saving, lots of new games have a "power saving mode" where we can modify the max 50ms sleep and make it larger.

50ms sleep means 20fps (if the rest of the compute is instantaneous), I doubt we need to increase this limit.

@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

On master, at 125fps, game uses ~15% of a core. With this PR, we use ~30% of a core.
On master at 333fps, game uses ~60% of a core. With this PR, we use ~60% of a core.

Interesting, what do you get at 60fps?

@DolceTriade
Copy link
Contributor

What I meant was instead of waking up every 50ms, we can sleep more at the cost of more variable FPS.

@illwieckz
Copy link
Member Author

Ah yes, purposed variable rate… Not updating the screen if nothing changes, for example.

@DolceTriade
Copy link
Contributor

Interesting, what do you get at 60fps?

On master at 60fps, game uses ~6% of a core.
With this PR at 60fps, game uses ~15% of a core.

@DolceTriade
Copy link
Contributor

DolceTriade commented May 16, 2024

The trade off here being CPU cycles for increased responsiveness and smoother FPS?

@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

Can you look at the CPU usage with 60fps and 120fps with this line?

int sleep = std::max( std::min( minMsec - msec, 50 ) - 1, 0 );

@illwieckz illwieckz force-pushed the illwieckz/framerate branch 2 times, most recently from f797339 to dd700d1 Compare May 16, 2024 21:35
@illwieckz
Copy link
Member Author

illwieckz commented May 16, 2024

I added a commit with a small accommodation: for framerates up to 250fps sleep until 1ms is left, but for people looking for higher framerates, use 2ms of margin, meaning there would be no margin after 333 fps. I still get a smooth framerate that looks good enough around 60 and 120 fps, and would save on CPU usage more.

framerate (fps) margin (ms) sleep time (ms)
10 1 99 (50+49)
20 1 49
30 1 32
60 1 15
144 1 5
250 1 3
333 2 1
500¹ 2 0
1000² 0 0

¹ There is actually no option to request 500fps anyway.
² Requires common.framerate.max -1 as magic value to unlock above the 333fps cap up to the next 1000fps cap.

@slipher
Copy link
Member

slipher commented May 16, 2024

Without implementing a more precise sleep, you could get the average framerate to more precisely match the requested max fps (overcoming both sleep imprecision and 1000/x rounding) by using a more clever algorithm that considers more frame times than just the last one. Let's say a float frameDebt static variable which steadily increases at a rate of (0.001 * max fps) per millisecond. And we subtract 1 from it every time a frame is completed. The next frame can be scheduled for whenever frameDebt will be at least 0. We'd want to clamp this to a small range around 0 so that a period where the engine can't keep up with the max FPS for a while doesn't mess things up for a long time.

This won't get the squiggly lines on your debugging tool to be perfectly straight while running at 500 FPS, but that doesn't have much to do with user experience :)

@illwieckz
Copy link
Member Author

Right now this looks good enough, it is expected to not waste more than 1ms of busy loop per frame below 333fps, so basically every player's use case, from low-end 30fps to 144fps and things in between.

@illwieckz illwieckz merged commit e8779b8 into master May 27, 2024
9 checks passed
@illwieckz illwieckz deleted the illwieckz/framerate branch May 27, 2024 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants