Prefer modes 6&4
Андрей Евстюхин committed Jul 2, 2020
1 parent 1e9d218 commit 462db37
Showing 21 changed files with 736 additions and 536 deletions.
44 changes: 24 additions & 20 deletions README.md
@@ -6,6 +6,20 @@ Nebc7 converts specified RGBA image to BC7 format. It focuses on image quality a

Nebc7 always preserves opaque alpha for opaque blocks.

## Details

Nebc7 computes a bounding box for each partition to choose the most suitable mode. The min and max bounds are simply swapped where needed to follow the covariance between channels. This mode is very fast and can be enabled with the "/draft" command-line switch. Its result is usually only about 3 dB below an ideal encoding.
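
A minimal sketch of the bounding-box idea, not the Nebc7 implementation: the types `Rgb` and `Endpoints` and the function `BoundingBoxEndpoints` are hypothetical, and using red as the reference channel for the covariance sign is an assumption made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for illustration; they are not taken from the Nebc7 sources.
struct Rgb { uint8_t r, g, b; };
struct Endpoints { Rgb lo, hi; };

// Returns the two ends of the bounding-box diagonal for one partition's pixels.
// A channel that varies against red gets its bounds swapped, so the diagonal
// follows the data instead of crossing it.
Endpoints BoundingBoxEndpoints(const std::vector<Rgb>& px)
{
    assert(!px.empty());

    Endpoints e{ {255, 255, 255}, {0, 0, 0} };
    int64_t sr = 0, sg = 0, sb = 0;
    for (const Rgb& p : px)
    {
        e.lo.r = std::min(e.lo.r, p.r); e.hi.r = std::max(e.hi.r, p.r);
        e.lo.g = std::min(e.lo.g, p.g); e.hi.g = std::max(e.hi.g, p.g);
        e.lo.b = std::min(e.lo.b, p.b); e.hi.b = std::max(e.hi.b, p.b);
        sr += p.r; sg += p.g; sb += p.b;
    }

    // Covariance of G and B against R, scaled by n*n so it stays in integers.
    // Only the sign matters here.
    const int64_t n = static_cast<int64_t>(px.size());
    int64_t covRG = 0, covRB = 0;
    for (const Rgb& p : px)
    {
        const int64_t dr = n * p.r - sr;
        covRG += dr * (n * p.g - sg);
        covRB += dr * (n * p.b - sb);
    }

    if (covRG < 0) std::swap(e.lo.g, e.hi.g);
    if (covRB < 0) std::swap(e.lo.b, e.hi.b);
    return e;
}
```

For two pixels (0, 255, 0) and (255, 0, 255), green runs against red, so only the green bounds are swapped and the endpoints coincide with the two pixels.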

High-quality BC7 encodings can waste space on noise contained in source images. The error calculation function shifts away the least significant bit of |delta| to allow simple denoising (of rounded pixels), while the index selection function uses the full 8-bit values to preserve gradients. An invisible random delta of magnitude 1 gives a 48.13 dB cap. Because of the shifting trick, the metrics carry unusual names: qMSE and qPSNR.
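
The arithmetic behind the shifted error and the dB cap can be sketched as follows. The function names are hypothetical, and the exact scaling Nebc7 applies to its error sums is not shown in this README, so this is only an illustration of the idea.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Squared error of one 8-bit channel value with the lowest bit of |delta|
// shifted away: a difference of 1 costs nothing, 2 and 3 cost the same.
inline int QuantizedSquaredError(uint8_t a, uint8_t b)
{
    const int d = std::abs(static_cast<int>(a) - static_cast<int>(b)) >> 1;
    return d * d;
}

// PSNR in dB for 8-bit data, given a positive mean squared error.
inline double PsnrFromMse(double mse)
{
    assert(mse > 0.0);
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

With these definitions `PsnrFromMse(1.0)` is about 48.13 dB, the value for a delta of magnitude 1 on every pixel, and `PsnrFromMse(9.0)` is about 38.59 dB, the value for a delta of magnitude 3.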

Low-entropy modes are well suited to opaque blocks. Mode 6 has an excellent block structure, while modes 4 and 5 can discard a single index. Nebc7 tries modes in a special order and uses error stepping, so it switches modes only when the gain is worthwhile. Random deltas of magnitude 3 are usually invisible in non-gradient areas and give a 38.58 dB cap. An interesting fact: many monitors can show only 6 bits per color channel. A map of the chosen modes can be rendered with the "/map ..." command-line parameter. Mode 6 is painted gray, modes 4 and 5 yellow, and the other modes use green, red and blue for partitions 0, 1 and 2 respectively.
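
The color scheme of the modes map can be written down as a small lookup. This is a sketch only: `Rgb8` and `ModeMapColor` are hypothetical names, and the concrete shade values are assumptions, since the README names the colors but not their RGB values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical color type for illustration.
struct Rgb8 { uint8_t r, g, b; };

// mode:   BC7 mode of the block, 0..7
// subset: partition (subset) the pixel belongs to, 0..2
inline Rgb8 ModeMapColor(int mode, int subset)
{
    assert(mode >= 0 && mode <= 7);
    assert(subset >= 0 && subset <= 2);

    if (mode == 6) return { 128, 128, 128 };             // gray (assumed shade)
    if (mode == 4 || mode == 5) return { 255, 255, 0 };  // yellow

    switch (subset)
    {
    case 0:  return { 0, 255, 0 };   // green for partition 0
    case 1:  return { 255, 0, 0 };   // red for partition 1
    default: return { 0, 0, 255 };   // blue for partition 2
    }
}
```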

Nebc7 sorts nearly all possible endpoint values of each individual channel and keeps the best of them. An exhaustive search then tries all combinations of the selected values and finds the best solution. Sorting gives surprisingly fast convergence for such a slow process.
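
The sort-then-search idea can be illustrated with a simplified sketch. Everything here is an assumption for the sake of the example rather than Nebc7's actual code: the names `Pair`, `Solution` and `SearchEndpoints`, the use of full 8-bit endpoints with 2-bit indices, and the choice to keep 8 candidates per channel.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

using Pixel = std::array<int, 3>;             // one RGB pixel, 0..255 per channel
struct Pair { int lo, hi, err; };             // endpoint candidate for one channel
struct Solution { std::array<Pair, 3> ch; int err; };

static const int kWeights[4] = { 0, 21, 43, 64 };   // BC7 2-bit interpolation weights

inline int Interp(int lo, int hi, int w) { return ((64 - w) * lo + w * hi + 32) >> 6; }

// Error of one channel when every pixel picks its own best index.
static int ChannelError(const std::vector<Pixel>& px, int c, int lo, int hi)
{
    int sum = 0;
    for (const Pixel& p : px)
    {
        int best = INT_MAX;
        for (int w : kWeights)
        {
            const int d = p[c] - Interp(lo, hi, w);
            best = std::min(best, d * d);
        }
        sum += best;
    }
    return sum;
}

// Error of a full RGB solution: each pixel's index is shared by all channels.
static int JointError(const std::vector<Pixel>& px, const Pair& r, const Pair& g, const Pair& b)
{
    int sum = 0;
    for (const Pixel& p : px)
    {
        int best = INT_MAX;
        for (int w : kWeights)
        {
            const int dr = p[0] - Interp(r.lo, r.hi, w);
            const int dg = p[1] - Interp(g.lo, g.hi, w);
            const int db = p[2] - Interp(b.lo, b.hi, w);
            best = std::min(best, dr * dr + dg * dg + db * db);
        }
        sum += best;
    }
    return sum;
}

Solution SearchEndpoints(const std::vector<Pixel>& px, std::size_t keep = 8)
{
    assert(!px.empty() && keep > 0);

    // Step 1: rank every endpoint pair of each channel, keep only the best few.
    std::array<std::vector<Pair>, 3> cand;
    for (int c = 0; c < 3; ++c)
    {
        cand[c].reserve(256 * 256);
        for (int lo = 0; lo < 256; ++lo)
            for (int hi = 0; hi < 256; ++hi)
                cand[c].push_back(Pair{ lo, hi, ChannelError(px, c, lo, hi) });

        const std::size_t k = std::min(keep, cand[c].size());
        std::partial_sort(cand[c].begin(), cand[c].begin() + k, cand[c].end(),
                          [](const Pair& a, const Pair& b) { return a.err < b.err; });
        cand[c].resize(k);
    }

    // Step 2: exhaustive search over all combinations of the kept candidates.
    Solution best{};
    best.err = INT_MAX;
    for (const Pair& r : cand[0])
        for (const Pair& g : cand[1])
            for (const Pair& b : cand[2])
            {
                const int e = JointError(px, r, g, b);
                if (e < best.err)
                {
                    best.ch[0] = r; best.ch[1] = g; best.ch[2] = b;
                    best.err = e;
                }
            }
    return best;
}
```

The per-channel error is a lower bound of the joint error, which is why ranking channels independently is a reasonable filter before the expensive combined search.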

Modes 7, 1 and 3 are memory-bound because of their large tables, so they are partially limited in the default working mode. The slow modes can be fully activated with the "/slow" command-line switch, which is impractical for regular use.

Premultiplied alpha requires the "/nomask" command-line option, while extruded RGBA images can benefit greatly from masking. The "/retina" switch allows later artifact-free scaling by 0.5. Masking gives smaller compressed images and better borders, because masked pixels can take any value.

## Usage

The solution was tested only with the Win64 API, on CPUs supporting SSSE3, SSE4.1, AVX, AVX2 and AVX512BW.
@@ -18,12 +32,12 @@ I would recommend using AVX2 for the best performance. See Bc7Mode.h about setti

Recompressing "BC7Ltest.png" (obtained from https://code.google.com/archive/p/nvidia-texture-tools/downloads bc7_export.zip) on i7-6700 CPU:

- Bc7Compress.exe /slow /nomask /noflip BC7Ltest.png output.ktx /debug output.png
+ Bc7Compress.exe /nomask /noflip BC7Ltest.png output.ktx /debug output.png
Loaded BC7Ltest.png
Image 152x152, Texture 152x152
- Compressed 1444 blocks, elapsed 26 ms, throughput 0.888 Mpx/s
- SubTexture A MSE = 0.0, PSNR = 73.986163, SSIM_4x4 = 0.99999923
- SubTexture RGB wMSE = 0.0, wPSNR = 62.172358, wSSIM_4x4 = 0.99999238
+ Compressed 1444 blocks, elapsed 9 ms, throughput 2.567 Mpx/s
+ SubTexture A qMSE = 0.0, qPSNR = 69.026097, SSIM_4x4 = 0.99997870
+ SubTexture RGB qMSE = 0.6, qPSNR = 50.685068, wSSIM_4x4 = 0.99964057
Saved output.ktx
Saved output.png

@@ -32,29 +46,19 @@ Compressing "frymire.png" (gained from https://github.com/castano/nvidia-texture
Bc7Compress.exe /nomask /noflip frymire.png frymire.ktx
Loaded frymire.png
Image 1118x1105, Texture 1120x1108
- Compressed 77560 blocks, elapsed 449 ms, throughput 2.763 Mpx/s
- Exactly A
- SubTexture RGB wMSE = 0.2, wPSNR = 55.181449, wSSIM_4x4 = 0.99980677
- Saved frymire.ktx
-
- Compressing "frymire.png" in development mode:
-
- Bc7Compress.exe /draft /nomask /noflip frymire.png frymire.ktx
- Loaded frymire.png
- Image 1118x1105, Texture 1120x1108
- Compressed 77560 blocks, elapsed 141 ms, throughput 8.801 Mpx/s
- Exactly A
- SubTexture RGB wMSE = 0.4, wPSNR = 52.056761, wSSIM_4x4 = 0.99952034
+ Compressed 77560 blocks, elapsed 113 ms, throughput 10.981 Mpx/s
+ Whole A
+ SubTexture RGB qMSE = 0.5, qPSNR = 50.950326, wSSIM_4x4 = 0.97143024
Saved frymire.ktx

Compressing "8192.png" (obtained from https://bitbucket.org/wolfpld/etcpak/downloads/8192.png) in development mode:

Bc7Compress.exe /draft /nomask /noflip 8192.png 8192.ktx
Loaded 8192.png
Image 8192x8192, Texture 8192x8192
- Compressed 4194304 blocks, elapsed 12377 ms, throughput 5.422 Mpx/s
- Exactly A
- SubTexture RGB wMSE = 0.4, wPSNR = 52.364416, wSSIM_4x4 = 0.99625929
+ Compressed 4194304 blocks, elapsed 1913 ms, throughput 35.080 Mpx/s
+ Whole A
+ SubTexture RGB qMSE = 0.1, qPSNR = 56.778563, wSSIM_4x4 = 0.99534311
Saved 8192.ktx

## Copyright
27 changes: 7 additions & 20 deletions src/Bc7Compress.cpp
@@ -234,7 +234,6 @@ static INLINED void VisualizePartitionsGRB(uint8_t* dst_bc7, int size)
int Bc7MainWithArgs(const IBc7Core& bc7Core, const std::vector<std::string>& args)
{
bool doDraft = true;
- bool doFast = true;
bool doNormal = true;
bool doSlow = false;

@@ -257,39 +256,27 @@ int Bc7MainWithArgs(const IBc7Core& bc7Core, const std::vector<std::string>& arg
if (strcmp(arg, "/compare") == 0)
{
doDraft = false;
- doFast = false;
doNormal = false;
doSlow = false;
continue;
}
else if (strcmp(arg, "/draft") == 0)
{
doDraft = true;
- doFast = false;
doNormal = false;
doSlow = false;
continue;
}
- else if (strcmp(arg, "/fast") == 0)
- {
- doDraft = true;
- doFast = true;
- doNormal = false;
- doSlow = false;
- continue;
- }
else if (strcmp(arg, "/normal") == 0)
{
doDraft = true;
- doFast = true;
doNormal = true;
doSlow = false;
continue;
}
else if (strcmp(arg, "/slow") == 0)
{
doDraft = true;
- doFast = true;
doNormal = true;
doSlow = true;
continue;
@@ -439,7 +426,7 @@ int Bc7MainWithArgs(const IBc7Core& bc7Core, const std::vector<std::string>& arg
head[22] = flip ? 0x00753Du : 0x00643Du;
head[23] = static_cast<uint32_t>(Size); // imageSize

- bc7Core.pInitTables(doDraft, doFast, doNormal, doSlow);
+ bc7Core.pInitTables(doDraft, doNormal, doSlow);

memcpy(dst_texture_bgra, src_texture_bgra, src_texture_h * src_texture_stride);

@@ -470,33 +457,33 @@ int Bc7MainWithArgs(const IBc7Core& bc7Core, const std::vector<std::string>& arg

if (mse_alpha > 0)
{
- PRINTF(" SubTexture A MSE = %.1f, PSNR = %f, SSIM_4x4 = %.8f",
+ PRINTF(" SubTexture A qMSE = %.1f, qPSNR = %f, SSIM_4x4 = %.8f",
(1.0 / kAlpha) * mse_alpha / pixels,
10.0 * log((255.0 * 255.0) * kAlpha * pixels / mse_alpha) / log(10.0),
ssim.Alpha * 16.0 / pixels);
}
else
{
- PRINTF(" Exactly A");
+ PRINTF(" Whole A");
}

if (mse_color > 0)
{
#if defined(OPTION_LINEAR)
- PRINTF(" SubTexture RGB MSE = %.1f, PSNR = %f, SSIM_4x4 = %.8f",
+ PRINTF(" SubTexture RGB qMSE = %.1f, qPSNR = %f, SSIM_4x4 = %.8f",
(1.0 / kColor) * mse_color / pixels,
10.0 * log((255.0 * 255.0) * kColor * pixels / mse_color) / log(10.0),
ssim.Color * 16.0 / pixels);
#else
- PRINTF(" SubTexture RGB wMSE = %.1f, wPSNR = %f, wSSIM_4x4 = %.8f",
+ PRINTF(" SubTexture RGB qMSE = %.1f, qPSNR = %f, wSSIM_4x4 = %.8f",
(1.0 / kColor) * mse_color / pixels,
10.0 * log((255.0 * 255.0) * kColor * pixels / mse_color) / log(10.0),
ssim.Color * 16.0 / pixels);
#endif
}
else
{
- PRINTF(" Exactly RGB");
+ PRINTF(" Whole RGB");
}

SaveBc7(dst_name, (const uint8_t*)head, sizeof(head), dst_bc7, Size);
@@ -549,7 +536,7 @@ int __cdecl main(int argc, char* argv[])

if (argc < 2)
{
- PRINTF("Usage: Bc7Compress [/fast | /normal | /slow | /draft] [/retina] [/nomask] [/noflip] src");
+ PRINTF("Usage: Bc7Compress [/draft | /normal | /slow] [/retina] [/nomask] [/noflip] src");
PRINTF(" [dst.ktx] [/debug result.png] [/map partitions.png] [/bad bad.png]");
return 1;
}