bZip2: Partial matches/no matches #95

Open
M-Gonzalo opened this issue Jan 22, 2019 · 3 comments

@M-Gonzalo

The file at https://web.archive.org/web/20150319192112/http://freearc.org/download/testing/FreeArc-0.67-alpha-sources.tar.bz2 decompresses to 5855141 bytes, but precomp -cn yields a file of only 2388084 bytes.

100.00% - New size: 2388084 instead of 1390169     

Done.
Time: 2 second(s), 691 millisecond(s)

Recompressed streams: 1/1
bZip2 streams: 1/1

@schnaader schnaader self-assigned this Feb 21, 2019
@schnaader
Owner

schnaader commented Feb 21, 2019

This is similar to the old zlib behaviour (e.g. #21): recompression isn't identical. Solving this completely would need a more advanced bZip2 recompression algorithm (similar to what preflate does with zlib), which won't happen soon, I guess.
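To illustrate what "recompression isn't identical" means, here is a minimal sketch that decompresses a bZip2 stream, recompresses it, and counts how many leading bytes match the original. It uses Python's standard bz2 module rather than precomp itself, and the function names (identical_prefix_len, recompression_report) are made up for this example, so treat it as an approximation of the check and not as precomp's actual code.

```python
import bz2


def identical_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


def recompression_report(original: bytes, level: int = 9) -> dict:
    """Decompress a bZip2 stream, recompress it, and compare with the original.

    Hypothetical helper: mirrors the kind of numbers shown in verbose output,
    but is not taken from precomp's source.
    """
    raw = bz2.decompress(original)
    recompressed = bz2.compress(raw, compresslevel=level)
    return {
        "compressed_size": len(original),
        "decompressed_size": len(raw),
        "identical_recompressed_bytes": identical_prefix_len(original, recompressed),
        "bit_identical": recompressed == original,
    }
```

A stream written by the same library at the same level round-trips bit for bit. A stream written by a different encoder may diverge after only a few bytes, which is the situation reported here.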

Anyway, there's another remaining issue I'd like to point out: The partial match found here is hurting the compression ratio. When using -v:

Compressed size: 1390169
Can be decompressed to 6285312 bytes
Identical recompressed bytes: 52 of 1390169
Identical decompressed bytes: 997888 of 6285312
Best match: 52 bytes, decompressed to 997888 bytes

Using -cl, this leads to 1,629,242 bytes (instead of 1,390,354 bytes using -t+), so it would be useful to use the partial match mechanism introduced in cfa602c for bZip2 streams, too.

schnaader added a commit that referenced this issue Feb 22, 2019
- Use the ratio that was used before for partial zLib matches
- Related: #95
@schnaader schnaader added this to the Nice to have milestone Feb 22, 2019
@schnaader schnaader changed the title bZip2 recompression problem. bZip2: Partial matches/no matches Feb 22, 2019
@schnaader
Owner

Insufficient partial matches are now discarded, as described above. New output of -v:

(0.00%) Possible bZip2-Stream found at position 0, compression level = 9
Compressed size: 1390169
Can be decompressed to 6285312 bytes
Identical recompressed bytes: 52 of 1390169
Identical decompressed bytes: 997888 of 6285312
Not enough identical recompressed bytes
No matches
New size: 1390354 instead of 1390169
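The decision behind "Not enough identical recompressed bytes" can be sketched as a simple ratio test. This is an assumption-laden illustration: the thread does not state the ratio precomp uses, so MIN_MATCH_RATIO and accept_partial_match below are hypothetical names with a placeholder value.

```python
# Placeholder threshold; the real ratio used by precomp is not given in this thread.
MIN_MATCH_RATIO = 0.5


def accept_partial_match(identical_recompressed: int,
                         compressed_size: int,
                         min_ratio: float = MIN_MATCH_RATIO) -> bool:
    """Keep a partial match only if enough of the compressed stream is reproduced.

    Hypothetical sketch, not precomp's implementation.
    """
    if compressed_size <= 0:
        return False
    return identical_recompressed / compressed_size >= min_ratio


# Numbers from the verbose output above: 52 identical bytes of 1390169.
print(accept_partial_match(52, 1390169))
```

With only 52 of 1390169 bytes matching, the ratio is so small that any reasonable threshold rejects the match, and the stream is stored unchanged.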

@schnaader
Owner

The issue will stay open as a known issue. I changed the title to make it clearer what the issue is about.
