Objective
Improve CPU-side rendering performance.
Solution
Create `BufferPool` variants of `DynamicBuffer`, `StorageBuffer`, `BatchedUniformBuffer`, and `GpuArrayBuffer`. They do not have a system RAM side buffer of any kind, but rely on `Queue::write_buffer_with` to directly write values into a staging buffer. As `Queue::write_buffer_with` operates with a `&Queue`, it's possible to parallelize down to a view level when batching.

The downside is that these buffers are not resizable after being mapped, so these types must reserve fixed-size slices from the buffer ahead of time. The data flow runs as follows:
1. `clear_batch_buffer` clears the buffer pool.
2. `reserve_batch_buffer` mutably grabs the buffer pool and reserves a range for every `RenderPhase<T>`, then saves the reserved range in the `RenderPhase`. This is a very fast O(1) operation that does not require allocation or IO of any kind. These systems must run sequentially because each one needs mutable access to the pool.
3. `allocate_batch_buffer` allocates the actual GPU-side buffer.
4. `batch_and_prepare_render_phase` then runs on each of them in parallel and parallelizes individual views with `Query::par_iter_mut`.

NOTE: I'm likely getting something wrong with the indices, which causes mesh draw calls to be mismatched and rendered randomly. This can potentially be seizure-inducing depending on the scene. Please be aware of this while testing this change.
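For reviewers who want to reason about the clear / reserve / allocate / write flow without a GPU, here is a minimal CPU-only sketch. It is not the code in this PR: the `BufferPool` struct below, its methods, and `run_frame` are hypothetical stand-ins, a `Vec<u32>` plays the role of the mapped staging buffer, and `std::thread::scope` stands in for the parallel batching systems. It only aims to show why reservations are sequential and O(1) while the writes can run in parallel on disjoint ranges.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical stand-in for a pooled buffer (not the PR's actual type).
struct BufferPool {
    /// Bump pointer: number of elements reserved so far this frame.
    reserved: usize,
    /// Plays the role of the mapped staging memory.
    staging: Vec<u32>,
}

impl BufferPool {
    fn new() -> Self {
        Self { reserved: 0, staging: Vec::new() }
    }

    /// Step 1: drop all reservations from the previous frame.
    fn clear(&mut self) {
        self.reserved = 0;
        self.staging.clear();
    }

    /// Step 2: O(1) bump reservation. Requires `&mut self`, so callers
    /// have to take turns; no allocation or IO happens here.
    fn reserve(&mut self, len: usize) -> Range<usize> {
        let start = self.reserved;
        self.reserved += len;
        start..self.reserved
    }

    /// Step 3: size the backing storage once the total is known.
    fn allocate(&mut self) {
        self.staging.resize(self.reserved, 0);
    }

    /// Step 4 helper: split the storage into disjoint mutable slices.
    /// Assumes `ranges` are the reservations, in the order they were made.
    fn slices<'a>(&'a mut self, ranges: &[Range<usize>]) -> Vec<&'a mut [u32]> {
        let mut rest: &'a mut [u32] = self.staging.as_mut_slice();
        let mut out = Vec::with_capacity(ranges.len());
        for r in ranges {
            let (head, tail) = std::mem::take(&mut rest).split_at_mut(r.len());
            out.push(head);
            rest = tail;
        }
        out
    }
}

/// Runs one frame: each "phase" fills its own reserved range on its own thread.
fn run_frame(phase_sizes: &[usize]) -> Vec<u32> {
    let mut pool = BufferPool::new();
    pool.clear();
    // Sequential: each reservation needs exclusive access to the pool.
    let ranges: Vec<Range<usize>> =
        phase_sizes.iter().map(|&n| pool.reserve(n)).collect();
    pool.allocate();

    // Parallel: the slices are disjoint, so no locking is needed.
    let slices = pool.slices(&ranges);
    thread::scope(|s| {
        for (phase, slice) in slices.into_iter().enumerate() {
            s.spawn(move || {
                for (i, value) in slice.iter_mut().enumerate() {
                    *value = (phase * 100 + i) as u32;
                }
            });
        }
    });
    pool.staging
}
```

In this sketch, `run_frame(&[2, 3])` returns `[0, 1, 100, 101, 102]`: the first phase owns elements `0..2` and the second owns `2..5`. A write that lands in the wrong range here would show up as a wrong value at a known position, which is the CPU-side analogue of the mismatched draw indices mentioned in the note above.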
Performance
Tested against `many_foxes`, this nets roughly a 4% improvement in render schedule timings.

TODO: Test this against heavier scenes.
Changelog
TODO
Migration Guide
TODO