vertexcodec: Implement support for compression levels #824
Merged
Conversation
While individual components of the encoder are going to be optimized further, to ensure that control over the encode time/size tradeoff is still present, meshopt_encodeVertexBufferLevel can now choose a compression level. For v0 all levels are equivalent; for v1, currently:

- level 0 is closest to v0 and picks a single bitgroup layout (while still supporting zero channels, as these are fast to reject)
- level 1 picks the best bitgroup layout but only uses byte deltas
- level 2 selects 1- or 2-byte deltas per channel, without XOR/rot
- level 3 selects XOR/rot per channel

These may change as performance characteristics of the encoder change.
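The level-to-feature mapping described above can be sketched as a small dispatch; the struct and function names here are illustrative stand-ins, not meshoptimizer internals:

```cpp
#include <cassert>

// Illustrative sketch only: maps a compression level to the v1 feature set
// described above. LevelFeatures and featuresForLevel are hypothetical names.
struct LevelFeatures
{
    bool best_bitgroup; // level 1+: pick the best bitgroup layout
    bool wide_deltas;   // level 2+: select 1- or 2-byte deltas per channel
    bool xor_rot;       // level 3: select XOR/rot per channel
};

static LevelFeatures featuresForLevel(int level)
{
    LevelFeatures f = {};
    f.best_bitgroup = level >= 1;
    f.wide_deltas = level >= 2;
    f.xor_rot = level >= 3;
    return f;
}
```

Each level strictly adds work on top of the previous one, which is why higher levels trade encode time for size.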
This will allow for easier performance tuning and comparison.
codectest -test -1 will select compression level 1; for now this only works for file-based testing and not for pipe mode, which always uses level 2 (the default).
Instead of bruteforcing all rotations and computing the xor-encoded size, we can estimate the size without actually encoding, and do this analysis on groups of 16 deltas. For each group, we OR the deltas together, which results in "1" bits wherever there's any inconsistency within the group. This estimate is not always as good as the exact value, but it's generally pretty close: if there is a significant bias towards selecting a specific rotation, this algorithm will still find it, and do so 10x faster.
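The estimation idea can be illustrated in isolation, assuming 8-bit deltas in groups of 16; this is a standalone sketch, not the codec's actual code:

```cpp
#include <cassert>

// Count how many bits are needed to represent v (position of highest set bit).
static int bitsNeeded(unsigned char v)
{
    int bits = 0;
    while (v)
    {
        bits++;
        v >>= 1;
    }
    return bits;
}

// For each candidate rotation, rotate every delta in the group, OR the results
// together, and pick the rotation whose OR'd value needs the fewest bits.
// The OR has a "1" bit wherever any delta in the group has that bit set, so
// its bit width estimates the encoded size without actually encoding.
static int pickRotation(const unsigned char deltas[16])
{
    int best_rot = 0, best_bits = 8;
    for (int rot = 0; rot < 8; ++rot)
    {
        unsigned char orv = 0;
        for (int i = 0; i < 16; ++i)
            orv |= (unsigned char)((deltas[i] >> rot) | (deltas[i] << (8 - rot)));
        int bits = bitsNeeded(orv);
        if (bits < best_bits)
        {
            best_bits = bits;
            best_rot = rot;
        }
    }
    return best_rot;
}
```

If the deltas consistently keep their significant bits in the high part of the byte, the OR-based estimate still steers the search to the rotation that moves them down.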
This centralizes the estimation overhead and makes it easier to profile and understand the code. Also, instead of guarding against vertex count in estimate*, move that to the call site.
We do not need to analyze groups that have all bits in equally consistent state; a branch here ends up being slightly beneficial on average.
Since we can't address this table using implicit address math in either layout based on hbits, we might as well use a shorter and more readable layout where shifts and sentinels go together.
Instead of calling a generic encodeBytesMeasure twice, we can note that it redundantly recomputes almost all of the information: in v1, bits 1/2/4 are available to both control modes, so we just need to compute 0 and 8 sizes - of which, 8 is a constant size. So we can compute the sizes for both streams in parallel. Doing this preserves the bitstream exactly, but results in ~20% faster encoding at level 1.
The sentinel branch is difficult to predict; since we have enough space to encode all bytes as sentinels due to decode limit padding, it is safe to append every byte unconditionally and move the pointer for out of range values. This accelerates all levels of encoding further, up to 30% for levels 0 and 1. Also rework encodeBytes flow to call the function just once; this mostly just makes sure the compiler can inline it without issues, as otherwise the function is too large to be inlined into two separate paths.
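The unconditional-append trick can be illustrated in isolation; this sketch (with hypothetical names) collects out-of-range bytes into an exception stream, assuming the destination buffer has enough padding to absorb the speculative stores:

```cpp
#include <cassert>
#include <cstddef>

// Branchless exception collection: store every byte speculatively, then keep
// it (by advancing the output pointer) only when the value is out of range.
// In-range values are simply overwritten by the next store, so the caller
// must pad the output buffer to make the extra store safe.
static size_t collectExceptions(const unsigned char* data, size_t count,
                                unsigned char limit, unsigned char* out)
{
    unsigned char* p = out;
    for (size_t i = 0; i < count; ++i)
    {
        *p = data[i];           // speculative store; may be overwritten
        p += (data[i] > limit); // advance only for out-of-range values
    }
    return (size_t)(p - out);
}
```

This replaces a data-dependent branch (hard to predict when exceptions are sporadic) with a data-dependent pointer increment, which is cheap regardless of the input distribution.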
Deltas requires level 2; while this is currently the default, it's better to be explicit to avoid losing coverage. BitXor requires level 3; the previously specified level was incorrect, so the code was not exercised properly. Also adjust BitXor to select xor for the last two channels instead of just one.
This is the same wrt the encoding in practice, but this is somewhat cleaner for potential future expansions of channel encoding, as it leaves the full 4 lower bits to store additional modes.
We test all 4 levels for the new version to check no level has encoding issues.
estimateBits takes unsigned char so we need to explicitly truncate to silence the warning.
After previous changes, encodeBytesMeasure is no longer used by any function other than estimateChannel. Inlining it into estimateChannel allows us to simplify the code, and improves optimization, as an explicit measure is faster than table selection in practice. This also allows us to drop one of the bit group modes to gain extra performance in the future. In addition, we fix last_vertex handling (this was incorrectly using the first vertex for all blocks instead of the last vertex of the previous block) and reduce memset overhead by limiting it to the last (partial) block.
Instead of analyzing every block we could look at a subset of blocks and assume that the statistics of data in different blocks is reasonably close. This is a little brute-force, but gets almost the same compression results on a variety of files, so for now we can do this unconditionally at every level, which significantly increases the encoding throughput of levels 2 and 3.
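The subsampling idea can be sketched as follows; the cost metric here is a stand-in (a plain byte sum) for the real per-block analysis, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Analyze only every step-th block and assume other blocks have similar
// statistics, then scale the sampled estimate back up to the full block count.
// The "cost" below is a mock metric standing in for the real block analysis.
static size_t estimateSizeSampled(const unsigned char* data, size_t block_count,
                                  size_t block_size, size_t step)
{
    size_t total = 0, sampled = 0;
    for (size_t i = 0; i < block_count; i += step)
    {
        size_t cost = 0;
        for (size_t j = 0; j < block_size; ++j)
            cost += data[i * block_size + j]; // stand-in per-block analysis
        total += cost;
        sampled++;
    }
    return sampled ? total * block_count / sampled : 0;
}
```

When block statistics are roughly uniform, the sampled estimate matches the full scan while doing a fraction of the work.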
Makes sure the level is not negative and that the usage doesn't contain mistakes such as passing vertex_size as a level instead. We accept the 0..9 range to leave room for possible future expansion of the current 0..3 range.
Instead of having three versions of zigzag per type we can use a template similarly to unzigzag. This produces ~same code with less duplication.
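A templated zigzag along these lines might look like the sketch below (illustrative, not the library's exact code); it maps values holding signed deltas to unsigned values so that small magnitudes of either sign stay small:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Templated zigzag sketch: zigzag(0)=0, zigzag(-1)=1, zigzag(1)=2,
// zigzag(-2)=3, ... Works for any unsigned integer type T holding a
// two's-complement delta. An arithmetic right shift of the sign bit is
// assumed, which holds on all mainstream compilers.
template <typename T>
inline T zigzag(T v)
{
    typedef typename std::make_signed<T>::type S;
    return (T)((v << 1) ^ (T)(S(v) >> (sizeof(T) * 8 - 1)));
}
```

A single template instantiated per type replaces the three near-identical copies, matching how unzigzag is already structured.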
This PR also significantly optimizes all levels in subsequent commits.
The initial numbers after the first commit were:
The new numbers are:
For now level 2 is the default; while it currently encodes ~20% slower than v0 did, using level 1 as the baseline would disable wider deltas, and realistically almost 500 MB/s of encoding speed is likely sufficient. Applications that need slightly more gains for complex bitpacked data could choose level 3; applications that need streaming encoding can choose levels 1 or 0.
This contribution is sponsored by Valve.