Generate MeshUniforms on the GPU via compute shader where available. (#12773)

Currently, `MeshUniform`s are rather large: 160 bytes. They're also
somewhat expensive to compute, because they involve taking the inverse
of a 3x4 matrix. Finally, if a mesh is present in multiple views, that
mesh will have a separate `MeshUniform` for each and every view, which
is wasteful.

This commit fixes these issues by introducing the concept of a *mesh
input uniform* and adding a *mesh uniform building* compute shader pass.
The `MeshInputUniform` is simply the minimum amount of data needed for
the GPU to compute the full `MeshUniform`. Most of this data is just the
transform and is therefore only 64 bytes. `MeshInputUniform`s are
computed during the *extraction* phase, much like skins are today, in
order to avoid needlessly copying transforms around on CPU. (In fact,
the render app has been changed to only store the translation of each
mesh; it no longer cares about any other part of the transform, which is
stored only on the GPU and the main world.) Before rendering, the
`build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the
full `MeshUniform`.
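
As a rough illustration of the size claim, here is a sketch of what a 64-byte input record could look like. The struct name and the split of the last four words (`lightmap_uv_rect`, `flags`, `previous_input_index`) are assumptions made for this example; only the 64-byte total and the fact that most of it is the transform come from the description above.

```rust
use std::mem::size_of;

/// Hypothetical layout of a compact per-mesh input record.
/// `repr(C)` keeps the fields in declaration order with no reordering.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct MeshInputUniformSketch {
    /// 3x4 affine model transform: 12 floats, 48 bytes.
    pub transform: [[f32; 4]; 3],
    /// Two packed words for a lightmap UV rectangle (assumed field).
    pub lightmap_uv_rect: [u32; 2],
    /// Bit flags (assumed field).
    pub flags: u32,
    /// Index of this mesh's record in the previous frame's buffer.
    pub previous_input_index: u32,
}

// Compile-time check: 12 + 2 + 1 + 1 = 16 four-byte words = 64 bytes.
const _: () = assert!(size_of::<MeshInputUniformSketch>() == 64);
```

Every field is a plain 32-bit value, so a record like this has no padding and can be copied into a GPU buffer byte for byte.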

The mesh uniform building pass does the following, all on GPU:

1. Copy the appropriate fields of the `MeshInputUniform` to the
`MeshUniform` slot. If a single mesh is present in multiple views, this
effectively duplicates it into each view.

2. Compute the inverse transpose of the model transform, used for
transforming normals.

3. If applicable, copy the mesh's transform from the previous frame for
TAA. To support this, we double-buffer the `MeshInputUniform`s over two
frames and swap the buffers each frame. The `MeshInputUniform`s for the
current frame contain the index of that mesh's `MeshInputUniform` for
the previous frame.
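
The three steps above can be sketched on the CPU as follows. This is a reference model only, not the actual compute shader: the type names, the `work_items` list, and the fallbacks (reusing the current transform when there is no previous one, and a zero matrix for a singular transform) are assumptions of the sketch.

```rust
/// Illustrative stand-in for the compact per-mesh input.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MeshInput {
    /// Linear (upper-left 3x3) part of the model transform, row-major.
    pub linear: [[f32; 3]; 3],
    /// Translation part of the model transform.
    pub translation: [f32; 3],
    /// Index of this mesh's input in the previous frame's buffer, if any.
    pub previous_input_index: Option<usize>,
}

/// Illustrative stand-in for the expanded per-instance output.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MeshOutput {
    pub linear: [[f32; 3]; 3],
    pub translation: [f32; 3],
    pub previous_linear: [[f32; 3]; 3],
    pub previous_translation: [f32; 3],
    pub inverse_transpose: [[f32; 3]; 3],
}

/// Inverse transpose of a 3x3 matrix: the cofactor matrix divided by the
/// determinant. Returns `None` when the matrix is singular.
pub fn inverse_transpose(m: [[f32; 3]; 3]) -> Option<[[f32; 3]; 3]> {
    let cof = [
        [
            m[1][1] * m[2][2] - m[1][2] * m[2][1],
            m[1][2] * m[2][0] - m[1][0] * m[2][2],
            m[1][0] * m[2][1] - m[1][1] * m[2][0],
        ],
        [
            m[0][2] * m[2][1] - m[0][1] * m[2][2],
            m[0][0] * m[2][2] - m[0][2] * m[2][0],
            m[0][1] * m[2][0] - m[0][0] * m[2][1],
        ],
        [
            m[0][1] * m[1][2] - m[0][2] * m[1][1],
            m[0][2] * m[1][0] - m[0][0] * m[1][2],
            m[0][0] * m[1][1] - m[0][1] * m[1][0],
        ],
    ];
    let det = m[0][0] * cof[0][0] + m[0][1] * cof[0][1] + m[0][2] * cof[0][2];
    if det == 0.0 {
        return None;
    }
    Some(cof.map(|row| row.map(|c| c / det)))
}

/// Expands inputs into outputs. `work_items` holds one input index per output
/// slot, so a mesh visible in two views simply appears twice.
pub fn build_mesh_uniforms(
    current: &[MeshInput],
    previous: &[MeshInput],
    work_items: &[usize],
) -> Vec<MeshOutput> {
    work_items
        .iter()
        .map(|&input_index| {
            let input = current[input_index];
            // Step 3: look up last frame's transform (assumed fallback: current).
            let prev = input
                .previous_input_index
                .and_then(|i| previous.get(i))
                .copied()
                .unwrap_or(input);
            MeshOutput {
                // Step 1: copy the input fields into the output slot.
                linear: input.linear,
                translation: input.translation,
                previous_linear: prev.linear,
                previous_translation: prev.translation,
                // Step 2: normal matrix (assumed fallback: zero matrix).
                inverse_transpose: inverse_transpose(input.linear).unwrap_or([[0.0; 3]; 3]),
            }
        })
        .collect()
}

/// Double-buffered inputs: swapping makes this frame's data next frame's "previous".
#[derive(Default)]
pub struct InputBuffers {
    pub current: Vec<MeshInput>,
    pub previous: Vec<MeshInput>,
}

impl InputBuffers {
    pub fn begin_frame(&mut self) {
        std::mem::swap(&mut self.current, &mut self.previous);
        self.current.clear();
    }
}
```

Calling `build_mesh_uniforms(&current, &previous, &[0, 0])` yields two identical output slots for mesh 0, which is the per-view duplication described in step 1.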

This commit produces wins in virtually every CPU part of the pipeline:
`extract_meshes`, `queue_material_meshes`,
`batch_and_prepare_render_phase`, and especially
`write_batched_instance_buffer` are all faster. Shrinking the amount of
CPU data that has to be shuffled around speeds up the entire rendering
process.

| Benchmark              | This branch | `main`  | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc`      |      17.259 |  24.529 |  42.12% |
| `many_cubes -nfc -vpi` |     302.116 | 312.123 |   3.31% |
| `many_foxes`           |       3.227 |   3.515 |   8.92% |
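
The Speedup column is consistent with treating both measurements as times (lower is better) and computing `main / branch - 1`. That reading is an inference from the numbers, and the helper below is a hypothetical one written just to show the arithmetic.

```rust
/// Percentage speedup of `branch` over `main`, assuming both values are
/// times where lower is better.
pub fn speedup_percent(main: f64, branch: f64) -> f64 {
    (main / branch - 1.0) * 100.0
}
```

For the first row, `speedup_percent(24.529, 17.259)` comes out at roughly 42.12, matching the table.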

Because mesh uniform building requires compute shaders, and WebGL 2 has
no compute shader support, the existing CPU mesh uniform building code has
been left as-is. Many types now have both CPU mesh uniform building and GPU
mesh uniform building modes. Developers can opt into the old CPU mesh
uniform building by setting the `use_gpu_instance_buffer_builder` option on
`PbrPlugin` to `false`.
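
A minimal sketch of how the opt-out and the platform fallback interact. `PbrPluginSettings` is a stand-in that mirrors the three `PbrPlugin` fields and defaults shown in this commit's diff (where the option is named `use_gpu_instance_buffer_builder`); `gpu_building_active` and its `platform_has_compute` parameter are hypothetical and only model the documented behavior that the GPU path is forcibly disabled without compute shader support.

```rust
/// Stand-in for the `PbrPlugin` fields visible in this commit's diff.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PbrPluginSettings {
    pub prepass_enabled: bool,
    pub add_default_deferred_lighting_plugin: bool,
    pub use_gpu_instance_buffer_builder: bool,
}

impl Default for PbrPluginSettings {
    fn default() -> Self {
        // Mirrors the defaults in the diff: everything enabled.
        Self {
            prepass_enabled: true,
            add_default_deferred_lighting_plugin: true,
            use_gpu_instance_buffer_builder: true,
        }
    }
}

/// Hypothetical helper: the GPU path runs only if it was requested *and*
/// the platform supports compute shaders.
pub fn gpu_building_active(settings: &PbrPluginSettings, platform_has_compute: bool) -> bool {
    settings.use_gpu_instance_buffer_builder && platform_has_compute
}

/// Opting back into CPU building with struct update syntax.
pub fn cpu_only_settings() -> PbrPluginSettings {
    PbrPluginSettings {
        use_gpu_instance_buffer_builder: false,
        ..Default::default()
    }
}
```

In a real app the same struct update pattern would presumably be applied to `PbrPlugin` itself when configuring the default plugin group.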

Below are graphs of the CPU portions of `many_cubes
--no-frustum-culling`. Yellow is this branch, red is `main`.

`extract_meshes`:
![Screenshot 2024-04-02
124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10)
It's notable that we get a small win even though we're now writing to a
GPU buffer.

`queue_material_meshes`:
![Screenshot 2024-04-02
124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94)
There's a bit of a regression here; not sure what's causing it. In any
case it's far outweighed by the other gains.

`batch_and_prepare_render_phase`:
![Screenshot 2024-04-02
125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435)
There's a huge win here, enough to make batching basically drop off the
profile.

`write_batched_instance_buffer`:
![Screenshot 2024-04-02
125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2)
There's a massive improvement here, as expected. Note that a lot of it
simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't
a maintainability problem in my view because `MeshInputUniform` is so
simple: just 16 tightly-packed words.)

## Changelog

### Added

* Per-mesh instance data is now generated on GPU with a compute shader
instead of CPU, resulting in rendering performance improvements on
platforms where compute shaders are supported.

## Migration guide

* Custom render phases now need multiple systems beyond just
`batch_and_prepare_render_phase`. Code that was previously creating
custom render phases should now add a `BinnedRenderPhasePlugin` or
`SortedRenderPhasePlugin` as appropriate instead of directly adding
`batch_and_prepare_render_phase`.
pcwalton authored Apr 10, 2024
1 parent a9943e8 commit 11817f4
Showing 17 changed files with 1,899 additions and 295 deletions.
24 changes: 14 additions & 10 deletions crates/bevy_pbr/src/lib.rs
@@ -79,6 +79,8 @@ pub mod graph {
/// Label for the screen space ambient occlusion render node.
ScreenSpaceAmbientOcclusion,
DeferredLightingPass,
+/// Label for the compute shader instance data building pass.
+GpuPreprocess,
}
}

@@ -133,13 +135,19 @@ pub struct PbrPlugin {
pub prepass_enabled: bool,
/// Controls if [`DeferredPbrLightingPlugin`] is added.
pub add_default_deferred_lighting_plugin: bool,
+/// Controls if GPU [`MeshUniform`] building is enabled.
+///
+/// This requires compute shader support and so will be forcibly disabled if
+/// the platform doesn't support those.
+pub use_gpu_instance_buffer_builder: bool,
}

impl Default for PbrPlugin {
fn default() -> Self {
Self {
prepass_enabled: true,
add_default_deferred_lighting_plugin: true,
+use_gpu_instance_buffer_builder: true,
}
}
}
@@ -280,7 +288,9 @@ impl Plugin for PbrPlugin {
.register_type::<DefaultOpaqueRendererMethod>()
.init_resource::<DefaultOpaqueRendererMethod>()
.add_plugins((
-MeshRenderPlugin,
+MeshRenderPlugin {
+use_gpu_instance_buffer_builder: self.use_gpu_instance_buffer_builder,
+},
MaterialPlugin::<StandardMaterial> {
prepass_enabled: self.prepass_enabled,
..Default::default()
@@ -292,6 +302,9 @@ impl Plugin for PbrPlugin {
ExtractComponentPlugin::<ShadowFilteringMethod>::default(),
LightmapPlugin,
LightProbePlugin,
+GpuMeshPreprocessPlugin {
+use_gpu_instance_buffer_builder: self.use_gpu_instance_buffer_builder,
+},
))
.configure_sets(
PostUpdate,
@@ -386,15 +399,6 @@ impl Plugin for PbrPlugin {
let draw_3d_graph = graph.get_sub_graph_mut(Core3d).unwrap();
draw_3d_graph.add_node(NodePbr::ShadowPass, shadow_pass_node);
draw_3d_graph.add_node_edge(NodePbr::ShadowPass, Node3d::StartMainPass);
-
-render_app.ignore_ambiguity(
-bevy_render::Render,
-bevy_core_pipeline::core_3d::prepare_core_3d_transmission_textures,
-bevy_render::batching::batch_and_prepare_sorted_render_phase::<
-bevy_core_pipeline::core_3d::Transmissive3d,
-MeshPipeline,
->,
-);
}

fn finish(&self, app: &mut App) {
13 changes: 6 additions & 7 deletions crates/bevy_pbr/src/lightmap/mod.rs
@@ -48,7 +48,7 @@ use bevy_render::{
};
use bevy_utils::HashSet;

-use crate::RenderMeshInstances;
+use crate::{ExtractMeshesSet, RenderMeshInstances};

/// The ID of the lightmap shader.
pub const LIGHTMAP_SHADER_HANDLE: Handle<Shader> =
@@ -132,10 +132,9 @@ impl Plugin for LightmapPlugin {
return;
};

-render_app.init_resource::<RenderLightmaps>().add_systems(
-ExtractSchedule,
-extract_lightmaps.after(crate::extract_meshes),
-);
+render_app
+.init_resource::<RenderLightmaps>()
+.add_systems(ExtractSchedule, extract_lightmaps.after(ExtractMeshesSet));
}
}

@@ -159,8 +158,8 @@ fn extract_lightmaps(
if !view_visibility.get()
|| images.get(&lightmap.image).is_none()
|| !render_mesh_instances
-.get(&entity)
-.and_then(|mesh_instance| meshes.get(mesh_instance.mesh_asset_id))
+.mesh_asset_id(entity)
+.and_then(|mesh_asset_id| meshes.get(mesh_asset_id))
.is_some_and(|mesh| mesh.layout.0.contains(Mesh::ATTRIBUTE_UV_1.id))
{
continue;
14 changes: 7 additions & 7 deletions crates/bevy_pbr/src/material.rs
@@ -508,6 +508,8 @@ pub const fn screen_space_specular_transmission_pipeline_key(
}
}

+/// For each view, iterates over all the meshes visible from that view and adds
+/// them to [`BinnedRenderPhase`]s or [`SortedRenderPhase`]s as appropriate.
#[allow(clippy::too_many_arguments)]
pub fn queue_material_meshes<M: Material>(
opaque_draw_functions: Res<DrawFunctions<Opaque3d>>,
@@ -647,7 +649,8 @@ pub fn queue_material_meshes<M: Material>(
let Some(material_asset_id) = render_material_instances.get(visible_entity) else {
continue;
};
-let Some(mesh_instance) = render_mesh_instances.get(visible_entity) else {
+let Some(mesh_instance) = render_mesh_instances.render_mesh_queue_data(*visible_entity)
+else {
continue;
};
let Some(mesh) = render_meshes.get(mesh_instance.mesh_asset_id) else {
@@ -693,8 +696,7 @@ pub fn queue_material_meshes<M: Material>(
match material.properties.alpha_mode {
AlphaMode::Opaque => {
if material.properties.reads_view_transmission_texture {
-let distance = rangefinder
-.distance_translation(&mesh_instance.transforms.transform.translation)
+let distance = rangefinder.distance_translation(&mesh_instance.translation)
+ material.properties.depth_bias;
transmissive_phase.add(Transmissive3d {
entity: *visible_entity,
@@ -717,8 +719,7 @@ pub fn queue_material_meshes<M: Material>(
}
AlphaMode::Mask(_) => {
if material.properties.reads_view_transmission_texture {
-let distance = rangefinder
-.distance_translation(&mesh_instance.transforms.transform.translation)
+let distance = rangefinder.distance_translation(&mesh_instance.translation)
+ material.properties.depth_bias;
transmissive_phase.add(Transmissive3d {
entity: *visible_entity,
@@ -746,8 +747,7 @@ pub fn queue_material_meshes<M: Material>(
| AlphaMode::Premultiplied
| AlphaMode::Add
| AlphaMode::Multiply => {
-let distance = rangefinder
-.distance_translation(&mesh_instance.transforms.transform.translation)
+let distance = rangefinder.distance_translation(&mesh_instance.translation)
+ material.properties.depth_bias;
transparent_phase.add(Transparent3d {
entity: *visible_entity,
23 changes: 8 additions & 15 deletions crates/bevy_pbr/src/prepass/mod.rs
@@ -1,6 +1,5 @@
mod prepass_bindings;

-use bevy_render::batching::{batch_and_prepare_binned_render_phase, sort_binned_render_phase};
use bevy_render::mesh::{GpuMesh, MeshVertexBufferLayoutRef};
use bevy_render::render_resource::binding_types::uniform_buffer;
pub use prepass_bindings::*;
@@ -145,7 +144,11 @@ where
update_mesh_previous_global_transforms,
update_previous_view_data,
),
-);
+)
+.add_plugins((
+BinnedRenderPhasePlugin::<Opaque3dPrepass, MeshPipeline>::default(),
+BinnedRenderPhasePlugin::<AlphaMask3dPrepass, MeshPipeline>::default(),
+));
}

let Some(render_app) = app.get_sub_app_mut(RenderApp) else {
@@ -157,18 +160,7 @@ where
.add_systems(ExtractSchedule, extract_camera_previous_view_data)
.add_systems(
Render,
-(
-(
-sort_binned_render_phase::<Opaque3dPrepass>,
-sort_binned_render_phase::<AlphaMask3dPrepass>
-).in_set(RenderSet::PhaseSort),
-(
-prepare_previous_view_uniforms,
-batch_and_prepare_binned_render_phase::<Opaque3dPrepass, MeshPipeline>,
-batch_and_prepare_binned_render_phase::<AlphaMask3dPrepass,
-MeshPipeline>,
-).in_set(RenderSet::PrepareResources),
-)
+prepare_previous_view_uniforms.in_set(RenderSet::PrepareResources),
);
}

@@ -786,7 +778,8 @@ pub fn queue_prepass_material_meshes<M: Material>(
let Some(material_asset_id) = render_material_instances.get(visible_entity) else {
continue;
};
-let Some(mesh_instance) = render_mesh_instances.get(visible_entity) else {
+let Some(mesh_instance) = render_mesh_instances.render_mesh_queue_data(*visible_entity)
+else {
continue;
};
let Some(material) = render_materials.get(*material_asset_id) else {