Adjust max_render_time when direct scanout is active. #2377

Open
vanfanel opened this issue Jun 12, 2024 · 8 comments
@vanfanel

vanfanel commented Jun 12, 2024

Describe the bug
In RetroArch (latest stable and also the Git version), using the Vulkan backend in fullscreen with Settings->Video->Synchronization->Max Swapchain Images set to 2, the framerate is halved, so on a 60 Hz display everything runs at 30 FPS.
This can be observed without loading any cores, in Settings->Video->Output->Estimated Screen Refresh Rate.

Setting Settings->Video->Synchronization->Max Swapchain Images to 3 works around the problem, but introduces a frame of lag.

The problem only happens in fullscreen mode. In windowed mode (even if the window is scaled to fullscreen dimensions using the mouse), it does not happen.
That could indicate a problem with direct scanout, but exporting WLR_SCENE_DISABLE_DIRECT_SCANOUT=1 before running Wayfire does not make any difference.

All other sensible video/audio settings have been tried in RetroArch to try and isolate the cause.

This seems to be a compositor-specific bug and not a WLRoots bug, because it doesn't happen in Labwc (https://github.com/labwc/labwc). It does, however, also happen on Sway, which is a bit puzzling.

To Reproduce
-Run RetroArch
-Go to Settings->Video->Synchronization->Max Swapchain Images and set it to 2
-Go to Settings->Video->Output->Estimated Screen Refresh Rate to observe the halved vsync rate numerically (it can also be observed empirically by simply navigating the menus).

Expected behavior
Setting Settings->Video->Synchronization->Max Swapchain Images to 2 should not halve the vsync rate.

Wayfire version
Latest stable (0.8.1) and the latest Git code both show this bug.
Using latest stable WLRoots (0.17.3) and latest stable Mesa (24.1.1), on an Intel N100 GPU and on an i5-1235U with Xe graphics.

@vanfanel vanfanel added the bug label Jun 12, 2024
@ammen99
Member

ammen99 commented Jun 12, 2024

Wayfire does not use wlr_scene, which is why the env var you mentioned has no effect in Wayfire. Try the Wayfire-specific environment variable instead, see https://github.com/WayfireWM/wayfire/blob/master/src/output/render-manager.cpp#L997

Also, direct scanout is a bit special, so unfortunately I am not surprised you have problems with it. It is also possible that labwc (at least the version you use) doesn't use direct scanout for some reason, which would still make it a wlroots/general issue :(

@vanfanel
Author

@ammen99 You are right. I exported WAYFIRE_DISABLE_DIRECT_SCANOUT=1 before running Wayfire, and the vsync rate halving in RetroArch with Vulkan and max_swapchain=2 went away.

So this is definitely a WLRoots problem, since it apparently happens in every compositor (except labwc, where direct scanout does indeed seem to be disabled...)

@ammen99
Member

ammen99 commented Jun 15, 2024

I think this is actually kind of expected. Consider the following: say the app uses 2 buffers, A and B. Here is what happens:

  • Compositor commits A as the back buffer.
  • On vblank, A becomes front buffer. Compositor may now submit B as the back buffer.
  • Client cannot draw anything until next vblank ...
  • On next vblank, compositor does not have a new back buffer (the app did not have free buffers to render to, both were busy in the backend). B becomes the front buffer, A is unlocked and client can draw again.
  • Next vblank, we did not have a new back buffer, so B remains as the front buffer. Finally A arrives and can be scheduled for the next vblank.

Of course I might be wrong, but that's how I imagine the sequence of events happening. It would correspond exactly to a halving of the refresh rate.
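
The sequence above can be reproduced with a small toy model. This is only a sketch under assumptions that are not taken from Wayfire or wlroots code: the compositor picks up client buffers exactly once per vblank, right after the flip, and the client draws into every free buffer instantly. The function name and the buffer bookkeeping are hypothetical.

```python
from collections import deque


def simulate_default_scheduling(num_buffers, num_vblanks):
    """Count new frames shown when the compositor repaints right at vblank.

    Toy model only: buffer states are free -> queued -> pending -> front.
    """
    free = deque(range(num_buffers))  # buffers the client may draw into
    queued = deque()                  # committed by the client, not yet picked up
    pending = None                    # submitted to the display, flips on next vblank
    front = None                      # buffer currently being scanned out
    shown = 0

    for _ in range(num_vblanks):
        # 1. Vblank: the pending buffer becomes front, the old front is released.
        if pending is not None:
            if front is not None:
                free.append(front)
            front, pending = pending, None
            shown += 1
        # 2. Compositor repaints right after vblank with what is already committed.
        if queued:
            pending = queued.popleft()
        # 3. During the rest of the interval the client draws into all free buffers.
        while free:
            queued.append(free.popleft())
    return shown


if __name__ == "__main__":
    for n in (2, 3):
        frames = simulate_default_scheduling(n, 120)
        print(f"{n} buffers: {frames} new frames in 120 vblanks")
```

In this model, two buffers settle into one new frame every other vblank, while three buffers deliver one new frame per vblank, which matches the observed halving and the Max Swapchain Images = 3 workaround.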

I think that the only way to work around this is by setting core.max_render_time in Wayfire to a relatively low value like 3 or 4. Let me know if that helps at all.

@vanfanel
Author

vanfanel commented Jun 15, 2024

@ammen99 Setting a max_render_time value of 3-4 does indeed work around the issue, and the vsync rate is correctly detected in RetroArch, so your theory about the lack of buffers is confirmed, right?

But then, why isn't the same problem happening when RetroArch runs on KMS/DRM with the same 2 buffers scheme?
It only happens when it runs on Wayland compositors...

My understanding is that in KMS/DRM there's no "composition" and that makes the difference, but then again, on Wayland with direct scanout there shouldn't be any composition happening either...

@ammen99
Member

ammen99 commented Jun 15, 2024

@vanfanel That is not quite true: the compositor still has some influence over frame scheduling. For example, in this particular scenario:

  • A becomes back buffer
  • Next vblank => B back buffer, A front buffer
  • Next vblank => no new buffers, A becomes free, B is front buffer
    • Subcase no compositor: In the next 16ms, client renders to A and submits it as a back buffer. On next vblank, A is shown as the front buffer
    • Subcase compositor default settings: We skip this vblank altogether, there have been no changes from the client at all.
    • Subcase compositor with max_render_time=4: we signal to the clients that it is time to redraw. In the next 12ms, client hopefully submits a new buffer, and we show it on the next vblank. If client needs more than 12ms to redraw, we don't submit anything.
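
The three subcases come down to how much time the client has between the vblank that frees its buffer and the moment the next frame is locked in. A rough sketch of that arithmetic follows. The function name is hypothetical, and representing the compositor's default setting as `None` is an assumption made for this example, not how Wayfire stores the option.

```python
def shown_on_next_vblank(refresh_hz, client_render_ms,
                         max_render_time_ms=None, compositor=True):
    """Return True if a buffer freed at a vblank can be shown on the very next one.

    Toy model of the three subcases:
    - no compositor: the client has the whole refresh period;
    - compositor with default settings (None here): the compositor has already
      repainted when the client commits, so the next vblank is skipped;
    - compositor with max_render_time: the repaint starts that many ms before
      the vblank, so the client has (period - max_render_time) ms.
    """
    period_ms = 1000.0 / refresh_hz
    if not compositor:
        return client_render_ms <= period_ms
    if max_render_time_ms is None:
        return False
    return client_render_ms <= period_ms - max_render_time_ms


if __name__ == "__main__":
    print(shown_on_next_vblank(60, 10, compositor=False))       # no compositor
    print(shown_on_next_vblank(60, 10))                          # default settings
    print(shown_on_next_vblank(60, 10, max_render_time_ms=4))    # max_render_time=4
```

At 60 Hz the period is about 16.7 ms, so max_render_time=4 leaves the client roughly 12 ms, in line with the figure quoted above.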

If a compositor wants this to work smoothly, it can also submit the new frame directly when direct scanout is in use. But this is basically the same as setting max_render_time very low (here, render time is roughly the time Wayfire needs to render; with direct scanout it is basically 0 ms). So we need a dynamic repaint delay algorithm that adjusts this time. We have something like this, but it doesn't work very well. In the meantime, you can use a static value.
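
To illustrate what a dynamic repaint delay could look like, here is a minimal sketch. It is not Wayfire's existing algorithm: the class name, the safety margin, and the "largest recent render time plus margin" rule are all assumptions chosen for this example.

```python
from collections import deque


class RepaintDelayEstimator:
    """Hypothetical estimator: reserve just enough time for the compositor."""

    def __init__(self, refresh_hz, margin_ms=1.0, history=30):
        self.period_ms = 1000.0 / refresh_hz
        self.margin_ms = margin_ms             # assumed safety margin
        self.samples = deque(maxlen=history)   # recent compositor render times

    def record(self, render_ms):
        """Store how long the compositor needed for its last frame."""
        self.samples.append(render_ms)

    def max_render_time_ms(self):
        """Time to reserve before vblank for the compositor's own work."""
        if not self.samples:
            # No data yet: reserve the whole period, i.e. repaint right at vblank.
            return self.period_ms
        return min(self.period_ms, max(self.samples) + self.margin_ms)

    def client_budget_ms(self):
        """Time left for clients after a vblank before the repaint starts."""
        return self.period_ms - self.max_render_time_ms()


if __name__ == "__main__":
    est = RepaintDelayEstimator(refresh_hz=60)
    for ms in (0.2, 0.5, 0.3):   # near-zero cost, as with direct scanout
        est.record(ms)
    print(round(est.max_render_time_ms(), 2), round(est.client_budget_ms(), 2))
```

With near-zero render times, as in the direct scanout case, the reserved time shrinks toward the margin and the client gets almost the whole period. A single slow frame widens the reservation again until it drops out of the history window.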

@ammen99 ammen99 changed the title from "RetroArch vsync rate is halved when using the Vulkan backend and Max Swapchain Images is set to 2" to "Adjust max_render_time when direct scanout is active." Jun 15, 2024
@ammen99 ammen99 added enhancement and removed bug labels Jun 15, 2024
@ammen99 ammen99 added this to the pre-1.0 milestone Jun 15, 2024
@vanfanel
Author

@ammen99 I see. I'm totally OK with the static value. In fact, I prefer it to internal adjustment, so please keep the static value as an option even if you eventually plug in the dynamic adjustment. I think it's important to have things well under control, for example when running input lag tests.

Also, following all this buffer-juggling reasoning, in order to achieve low latency in RetroArch with Vulkan, what do you think is better?

A. A 3-buffer swapchain, with max_render_time adjusted to the lowest possible value that doesn't cause framerate hiccups, to compensate

B. A 2-buffer swapchain, with max_render_time adjusted to the lowest possible value that doesn't cause framerate hiccups (possibly not as low as with a 3-buffer swapchain), to compensate

@ammen99
Member

ammen99 commented Jun 15, 2024

For low latency, you want 2 buffers + low max_render_time, that is certain ;) Of course, your hardware and general system performance also need to be sufficient, because there are cases where a max_render_time that is too low causes a lower refresh rate too (for example, if you have expensive effects in Wayfire like blur+expo, you should not set max_render_time too low).

@vanfanel
Author

Also, regarding automatic max_render_time adjustment when direct scanout is active: according to the comment linked below it's not such a good idea, and I have to agree, so maybe it should be disabled by default:

labwc/labwc#1913 (comment)
