[GS Docs] Clearing #4311
tadanokojin
started this conversation in
GS Docs
Cache Optimization
There are two simple facts we must acknowledge when talking about optimizing the GS.
Fact 1: The GS caches memory by page. It does this for both the input texture and the framebuffer. For the input texture, it simply caches a single page of the texture data. The framebuffer cache, however, must contain both depth and color information, so the GS splits this cache in half (up and down the middle in 32-bit formats). This means that when depth buffering, the cache is effectively half as big.
Fact 2: The GS draws grids of pixels. Specifically, it draws in an 8x2 or 4x2 grid depending on whether texture mapping is enabled. Additionally, the GS draws this grid across the screen left to right, top to bottom.
Imagine you have a framebuffer that is 10 pages wide and some number of pages high. If you drew a single sprite that covered the entire screen, you would incur a great many page breaks in the x direction. The GS would honor this request by first drawing the top two rows of pixels of the framebuffer from left to right, incurring 9 page breaks along the way. It would then reset to the left, move down two pixels, and begin drawing to the right again, incurring an additional page break as well as those same 9 page breaks from before.
The GS therefore heavily penalizes drawing wide primitives that cross page boundaries.
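To put rough numbers on this, here is a small Python sketch that counts how often the active page changes under the drawing order described above. It is only a toy model: it assumes a 640x448 framebuffer with 64x32 pixel pages (the 32-bit layout), it only tracks which page each 2-pixel-high band lands in, and it says nothing about what a page break actually costs on hardware. The function name and the sprite tuples are made up for the example.

```python
PAGE_W, PAGE_H = 64, 32  # assumed page size in pixels for a 32-bit format
BAND_H = 2               # the GS draws in grids that are 2 pixels high


def page_switches(sprites):
    """Count page changes when drawing sprites band by band.

    Each sprite is (x0, y0, x1, y1) in pixels with exclusive upper bounds.
    Sprites are drawn in list order, each one left to right, top to bottom.
    """
    switches = 0
    current = None
    for x0, y0, x1, y1 in sprites:
        for y in range(y0, y1, BAND_H):
            for page_x in range(x0 // PAGE_W, (x1 - 1) // PAGE_W + 1):
                page = (page_x, y // PAGE_H)
                if page != current:
                    if current is not None:
                        switches += 1
                    current = page
    return switches


width, height = 640, 448  # 10 pages wide, 14 pages high

# One sprite covering the whole framebuffer.
full_screen = [(0, 0, width, height)]

# Ten sprites, each exactly one page wide and as tall as the framebuffer.
strips = [(x, 0, x + PAGE_W, height) for x in range(0, width, PAGE_W)]

print("full-screen sprite:", page_switches(full_screen))
print("page-wide strips:  ", page_switches(strips))
```

Under this model the full-screen sprite changes page 2239 times, while the page-wide strips change page 139 times, which is just once per page in the framebuffer.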
The Clears
The official PS2 SDK implements a fast clear path. Using the information above about optimization, you can probably guess how this function is implemented. However, this is not the fastest way to clear a texture, and because we deal with the GS at such a low level, we can actually do better depending on our needs. This is not meant to be a comprehensive list. Some of these screen clearing variants have their own variants or can be combined to create new variants. I've made some footnotes to highlight some of these.
This is also the part where I start using names that were made up by myself and other people.
Fast Clear
A fast clear is the easiest to understand. It's also the version implemented by the official SDK as well as GSKit. It involves one of the following:
This method should always be favored over a full-screen sprite as it is much more cache friendly.
Double-Half Clear
I know, the name is weird but it will make sense after I explain it.
This is a further optimized version of the previous technique that only works on color or depth individually. It uses the same optimization of 32-pixel-wide sprites, but this time only half the height is cleared. The other half is cleared by pointing depth halfway down the texture [1].
Since depth and color are written in parallel, this effectively clears the texture in half the time. Since the clear is done with two halves simultaneously, one might call it a "two halves clear" or "double half clear". I know you were thinking "but isn't double half just a whole?" Yes, that's exactly the point.
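The pointer arithmetic for "halfway down the texture" can be sketched in Python as below. This is an illustration under assumptions, not code from any SDK: it assumes a 32-bit target with 64x32 pixel pages, base pointers expressed in whole pages, and a height that splits into an even number of page rows. The function name and the returned dictionary keys are hypothetical.

```python
PAGE_W, PAGE_H = 64, 32  # assumed page size in pixels for a 32-bit format


def double_half_setup(fbp, width, height):
    """Return where to point depth and how tall the clear sprites must be.

    fbp is the color base pointer in pages (an assumption of this sketch).
    """
    pages_wide = width // PAGE_W
    page_rows = height // PAGE_H
    if page_rows % 2 != 0:
        raise ValueError("height must split into two equal runs of page rows")
    half_rows = page_rows // 2
    return {
        "fbp": fbp,                           # color clears the top half
        "zbp": fbp + pages_wide * half_rows,  # depth clears the bottom half
        "draw_height": half_rows * PAGE_H,    # sprites only span half the height
    }


print(double_half_setup(fbp=0, width=640, height=448))
```

For a 640x448 target starting at page 0, this puts depth 70 pages in and halves the sprite height to 224 pixels. The depth value written has to encode the clear color, since depth writes are what fill the bottom half.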
VIS Clear
I coined this name because a variant of it is used in games made by VIS [2]. We can flip our cache problem on its head: instead of making sure our sprites are only a page wide, we can make the texture a page wide.
The GS gives us low-level access to how memory is interpreted, and reinterpreting a texture that was 10 pages wide as a texture that is a single page wide simply has the effect of stacking all our pages vertically. Therefore, we can simply draw a sprite tall enough to account for this.
While this technique is faster than a fast clear or drawing a full-screen sprite, it's not theoretically faster than a double-half clear. Additionally, many emulators (including PCSX2) have difficulty emulating this, so it should be avoided if you expect your code to run on these emulators.
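As a quick worked example of "tall enough", the Python sketch below computes the size of the stacked, one-page-wide view. It assumes 64x32 pixel pages (the 32-bit layout) and a target whose dimensions are whole multiples of the page size; the function name is made up for the example. It does not model the GS coordinate or scissor limits, so whether a single sprite can actually span the resulting height is left open here.

```python
PAGE_W, PAGE_H = 64, 32  # assumed page size in pixels for a 32-bit format


def stacked_size(width, height):
    """Size in pixels of the same memory viewed as a one-page-wide buffer."""
    total_pages = (width // PAGE_W) * (height // PAGE_H)
    return PAGE_W, total_pages * PAGE_H


# A 640x448 target is 10 x 14 = 140 pages, stacked into a single column.
print(stacked_size(640, 448))
```

For the 640x448 case this gives a column 64 pixels wide and 4480 pixels tall.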
Interwoven Clear
This is a variant of the double-half clear that I first discovered while debugging a game called Powerdrome [3], and it makes an interesting observation about the cache. Remember that I mentioned briefly that the framebuffer page cache is shared between color and depth. In order to make this work without hazards, Sony allocated one half of the cache for depth and one half for color. Since color and depth are swizzled slightly differently to account for this, no block of data is written to at the same time inside the cache. While color is writing to block 0 at the left side of the cache, depth is writing to block 0 on the right side.
We can use this design quirk to our advantage. By pointing the color and depth pointers to the same base address, blocks 0-15 are loaded on the left side of the cache and blocks 16-31 are loaded on the right side. We can then just clear the left side with depth enabled and we get the right side for free.
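The block bookkeeping can be checked with a short Python sketch. The table below is the commonly documented arrangement of the 32 blocks inside a 32-bit color page (8 columns by 4 rows of 8x8 pixel blocks), and the depth arrangement is taken to be the same table with each block index XORed with 0x18. Both come from my understanding of the swizzle rather than from the text above, so treat them as assumptions. The helper name is made up.

```python
# Assumed block layout of one page in a 32-bit color format.
CT32_BLOCKS = [
    [0, 1, 4, 5, 16, 17, 20, 21],
    [2, 3, 6, 7, 18, 19, 22, 23],
    [8, 9, 12, 13, 24, 25, 28, 29],
    [10, 11, 14, 15, 26, 27, 30, 31],
]

# Assumed 32-bit depth layout: same grid, block indices XORed with 0x18.
Z32_BLOCKS = [[block ^ 0x18 for block in row] for row in CT32_BLOCKS]


def blocks_touched(table, first_col, last_col):
    """Blocks hit by a sprite covering the given block columns of a page."""
    return {row[col] for row in table for col in range(first_col, last_col + 1)}


# A sprite over the left half of the page (block columns 0-3, 32 pixels wide),
# with color and depth both pointed at the same base address.
color = blocks_touched(CT32_BLOCKS, 0, 3)
depth = blocks_touched(Z32_BLOCKS, 0, 3)

print("color writes blocks:", sorted(color))
print("depth writes blocks:", sorted(depth))
print("whole page covered: ", color | depth == set(range(32)))
```

With these tables, the color writes land in blocks 0-15 and the depth writes land in blocks 16-31, so a half-page-wide sprite touches every block of the page exactly once.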
Footnotes
[1] Forbidden Siren uses a special variant of this. Instead of clearing both with the same bit depth (CT32 and Z32, for example), they use two compatible formats (CT24 and Z32).
[2] VIS games use a special variant of this. Instead of clearing the screen in the same format as they intend to use the texture (CT32), they clear it as CT16 while accounting for the difference in color format.
[3] Powerdrome uses a special variant of this. The developers made a mistake and used Z24 for both color and depth writes. The GS disallows this and has a special hardware quirk that will force the depth buffer format to CT24. Using a non-Z format for depth is not possible under normal circumstances.