Combine draw commands to improve rendering performance #2421
base: dev
…ing distance sorting through the detection of primary intersectors when geometry is intersecting and then sorting them in a fixed order
…iately instead of keeping them to avoid memory usage buffer caching would be a better solution but that's complicated and doesn't currently work correctly
…g sorting or building performance
also removed the warning message about unpartitionable geometry as it seems to not be a relevant problem
… not recalculated when the normal is quantized. also fixed aligned quads not receiving the more accurate center based on the average of the unique vertexes.
Testing on Discord has shown that these changes can improve performance by around 35%, though the effect varies widely depending on specific combinations of system and scene-related factors. There don't seem to have been any statistically significant regressions. An even more radical optimization, which attempts to organize sections so that draw commands can be combined across sections, did not yield useful results, but I suspect the implementation has a bug. It can be found here (link), but isn't included in this PR.
I'm marking it as ready for review/merging.
Because the changes from #2352 have been squashed into
# Conflicts:
#	src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/compile/ChunkBuildBuffers.java
I think it's good to go now.
…ing because it seems broken
…ely picking the size of the required shared index buffer
The graphical corruption should be fixed now.
…ex buffer and instead share this type of data within regions
That bug is fixed now.
I can't think of anything else to add here. I would appreciate review/merging as appropriate. Testing happened on Discord, and in general there has been a significant performance improvement on some systems and at least no regressions on the rest. The latest few commits have also resulted in significant VRAM savings in specific scenarios and moderate savings in normal scenes.
What about the 5% regression on Nvidia?
On your particular system (i5-8300H, GTX 1050) there seems to be a slight regression from 500 FPS at RD 8, but this effect wasn't observed on other systems with Nvidia graphics cards.
# Conflicts:
#	common/src/main/java/net/caffeinemc/mods/sodium/client/gl/arena/GlBufferArena.java
#	common/src/main/java/net/caffeinemc/mods/sodium/client/gl/arena/GlBufferSegment.java
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/DefaultChunkRenderer.java
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/data/SectionRenderDataStorage.java
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/data/SectionRenderDataUnsafe.java
… ugly hacks, rename a bunch of methods to be consistent and clearer
This comment was marked as off-topic.
I'm locking this thread since it's been continually driven off-topic by Pojav Launcher users, despite the fact that we continue to tell them that their broken graphics drivers are not supported.
# Conflicts:
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/compile/tasks/ChunkBuilderMeshingTask.java
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/region/RenderRegionManager.java
This PR combines draw commands that read from adjacent vertex data. This reduces the number of draw commands by around 30% and improves FPS on my system by up to 57%, depending on the scene and circumstances. I'm on macOS with a 6900 XT. As jellysquid stated on Discord, this performance improvement likely comes from reduced CPU overhead in the driver and better GPU occupancy.
Please test whether this results in a similar improvement or some other effect, as it probably depends on the graphics card, memory bandwidth, and platform (OS, driver, vendor, etc.).
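To illustrate the general idea of combining draws over adjacent data, here is a minimal sketch in Java. It is not the code from this PR: `DrawRangeMerger`, `DrawRange`, and the assumption that each draw is described by a start offset and an element count in a shared buffer are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Sodium's classes or data layout.
final class DrawRangeMerger {
    /** A draw over `count` elements starting at element offset `first`. */
    static final class DrawRange {
        final int first;
        final int count;

        DrawRange(int first, int count) {
            this.first = first;
            this.count = count;
        }
    }

    /**
     * Merges each range into its immediate predecessor when the predecessor
     * ends exactly where the range begins. Input order is preserved, so the
     * order in which geometry is drawn does not change.
     */
    static List<DrawRange> merge(List<DrawRange> ranges) {
        List<DrawRange> merged = new ArrayList<>();
        for (DrawRange range : ranges) {
            if (range.count == 0) {
                continue; // an empty range draws nothing
            }
            int last = merged.size() - 1;
            if (last >= 0) {
                DrawRange prev = merged.get(last);
                if (prev.first + prev.count == range.first) {
                    // Adjacent data: extend the previous draw instead of adding a new one.
                    merged.set(last, new DrawRange(prev.first, prev.count + range.count));
                    continue;
                }
            }
            merged.add(range);
        }
        return merged;
    }
}
```

With the ranges (0, 6), (6, 12), (30, 6), (36, 6) as input, this sketch produces two draws, (0, 18) and (30, 12), instead of four. Only neighbouring entries are merged, which is a deliberate simplification so that draw order stays untouched.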
Here's a recording of the number of draw commands per pass:
ts on, before:
Draw total for pass Solid: 15531
Draw total for pass Cutout: 13277
Draw total for pass Translucent: 2298
ts on, after:
Draw total for pass Solid: 9571
Draw total for pass Cutout: 8306
Draw total for pass Translucent: 2298
ts off, before:
Draw total for pass Solid: 15531
Draw total for pass Cutout: 13277
Draw total for pass Translucent: 3812
ts off, after:
Draw total for pass Solid: 9571
Draw total for pass Cutout: 8306
Draw total for pass Translucent: 3645
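For reference, the relative reduction per pass can be derived from the counts above. The following is a small helper sketch in Java; the class and method names are hypothetical and only the numbers come from the recording.

```java
// Hypothetical helper for reading the recording above; not part of the PR.
final class DrawCountReduction {
    /** Percentage by which `after` is lower than `before`. */
    static double reductionPercent(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    /** Reduction over the summed draw counts of several passes. */
    static double totalReductionPercent(long[] before, long[] after) {
        long sumBefore = 0;
        long sumAfter = 0;
        for (long b : before) sumBefore += b;
        for (long a : after) sumAfter += a;
        return reductionPercent(sumBefore, sumAfter);
    }
}
```

Applied to the numbers above, this gives roughly 38% fewer draws for Solid (15531 to 9571), roughly 37% for Cutout (13277 to 8306), no change for Translucent with ts on, and about 4% for Translucent with ts off (3812 to 3645). Summed over all three passes, the reduction is about 35% with ts on and about 34% with ts off.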
Here are some screenshots without and with this patch. The FPS numbers here are outdated, since this branch has been updated in the meantime. See the newest comments at the bottom of this thread instead.