Add rar file pattern as requested in #258 #324

Open
wants to merge 4 commits into
base: master

targodan
Contributor

Pattern

A pattern for the RAR file format as defined in the official documentation (https://www.rarlab.com/technote.htm). This pattern was requested in #258.

There are still some issues I'd like to resolve, but I'm not versed enough in the pattern language to do so. I need some help here.

Issue 1: In several places the format uses a variable-length integer in which certain bits carry specific meanings. In some cases I got away with a fixed-size bitfield, because the fields were effectively fixed width (at least with the Linux rar implementation). In the case of line 295, however, I did observe the field having variable width. I know of no way to cast the computed value of a vint into a bitfield. Maybe I'm missing a feature of the pattern language, or it simply isn't possible.

Issue 2: As stated in line 360 I'd prefer to read headers until I reach a specific header, rather than end of file. This would allow for successful extraction of embedded rar files. Again, I don't know how to do this. Sadly, there is no absolute archive size anywhere that I could use instead.

Checklist

  • A pattern for this format doesn't exist yet (or this PR improves the existing one)
  • The new pattern has been added to the relevant table in the Readme
  • The new pattern has a description pragma (#pragma description My pattern Description here)
  • The pattern was associated with all relevant MIME types (using #pragma MIME mime-type in the source code)
    • Make sure to never use application/octet-stream here as that means "Unidentifiable binary data"
  • A test file for this pattern has been added to /tests/patterns/test_data
    • Try to keep this file below ~ 1 MB

PS

FYI: I'm not sure why, but GitHub didn't load the PR template, so I copy-pasted it manually.

Removed some experimental (and apparently broken) code.
@targodan
Contributor Author

Not sure why unit tests of other patterns suddenly fail; my rar.hexpat passed the unit tests. I did not modify any existing files except the README.md.

@WerWolv
Owner

WerWolv commented Nov 27, 2024

Thanks a lot! The CI errors were my fault, sorry. I have restarted the CI now and it should hopefully pass.

@WerWolv
Owner

WerWolv commented Nov 28, 2024

About your questions,

Issue 1:
It's possible to write bitfield fields whose length depends on other fields or other values in the pattern. For example:

bitfield Test {
    field1 : 3;
    field2 : field1;
};

Issue 2:
Using while(!std::mem::eof()) is fine in my opinion. If you want to stop the evaluation of the array sooner, though, you can use a break statement inside one of the structs placed by that array, and it will terminate the array immediately. That way you should be able to stop once you hit that header.
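
To illustrate the control flow only, here is a minimal Python sketch (not pattern language) of reading headers until a terminating header instead of until end of data. The record layout (one type byte, one length byte, then the payload) is a hypothetical simplification and not the real RAR header layout, and the names read_headers and END_OF_ARCHIVE are made up for this example.

```python
END_OF_ARCHIVE = 5  # assumed terminator type for this sketch


def read_headers(data: bytes, offset: int = 0):
    """Read (type, payload) records until the terminator or end of data.

    Hypothetical record layout: 1 type byte, 1 length byte, payload bytes.
    Returns the parsed records and the offset just past the last one.
    """
    headers = []
    while offset + 2 <= len(data):
        header_type = data[offset]
        size = data[offset + 1]
        payload = data[offset + 2 : offset + 2 + size]
        headers.append((header_type, payload))
        offset += 2 + size
        if header_type == END_OF_ARCHIVE:
            break  # stop here; trailing bytes are left untouched
    return headers, offset
```

Because the loop stops at the terminator, any bytes that follow an embedded archive are not misread as further headers.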

@targodan
Contributor Author

targodan commented Dec 2, 2024

Thanks for the suggestions :)

Issue 1:

I don't think this approach will work. The vint implementation is a little more complex. The most significant bit of each byte is basically a continuation indicator: as long as that bit is set, we need to read another byte. Then, for the actual integer value, we have to mask out the continuation bits and do some shifting. Only then does this computed value act as a bitfield.

You can see the decoding implementation in line 33.

So the problem is that I have a computed value (currently managed via both a transform and a format attribute) and the resulting value needs to be interpreted as a bit field. Is this possible?
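
For readers without the pattern file at hand, the decoding described above can be sketched as follows. This is a Python illustration, not the code from rar.hexpat; the function name decode_vint is made up here, and it assumes the first byte holds the 7 least significant bits, as the RAR technote describes.

```python
from __future__ import annotations


def decode_vint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one variable-length integer starting at offset.

    Returns (value, bytes_consumed). The high bit of each byte is the
    continuation flag; the low 7 bits carry data, least significant
    group first.
    """
    value = 0
    shift = 0
    for consumed, byte in enumerate(data[offset:], start=1):
        value |= (byte & 0x7F) << shift  # mask out the continuation bit
        if not byte & 0x80:              # high bit clear: this was the last byte
            return value, consumed
        shift += 7
    raise ValueError("truncated vint")
```

The result is a plain integer whose width depends on how many bytes were consumed, which is why it cannot be declared directly as a fixed-width bitfield.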

Issue 2:

That's useful, I think. :) Implemented it.

@WerWolv
Owner

WerWolv commented Dec 2, 2024

There's not a very nice way to do it currently, but here is one way. This uses the uLEB128 type, since it works similarly to your vint type.

The std::mem::Reinterpreter template is basically a templated union where you can assign a value to the from_value field and then read back the bits as a different type from the to_value field. Here I'm reinterpreting the decoded u64 value as a MyBitfield type and then printing it. This works, but it's not exactly pretty. We should probably add some nicer syntax for this.

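import std.io;      // std::print
import std.mem;     // std::mem::Reinterpreter
import type.leb128; // type::uLEB128

// Variable-length integer placed at the start of the data
type::uLEB128 value @ 0x00;

// Field widths add up to 64 bits so the layout matches a u64
bitfield MyBitfield {
    a : 1;
    b : 2;
    c : 3;
    padding : 58;
};

// Write the decoded integer in, read the same bits back as a bitfield
std::mem::Reinterpreter<u64, MyBitfield> reinterpreter;
reinterpreter.from_value = value;
std::print("{}", reinterpreter.to_value);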
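
As a cross-check of what the reinterpretation should yield, the same split can be done with shifts and masks. This is a Python sketch, not pattern language; unpack_bitfield and MY_BITFIELD are names invented for the example, and it assumes the bitfield's first field occupies the least significant bits.

```python
def unpack_bitfield(value: int, layout: list) -> dict:
    """Split an integer into named fields.

    layout is a list of (name, width_in_bits) pairs; the first entry is
    taken from the least significant bits (an assumption of this sketch).
    """
    fields = {}
    position = 0
    for name, width in layout:
        mask = (1 << width) - 1
        fields[name] = (value >> position) & mask
        position += width
    return fields


# Same widths as MyBitfield in the pattern above
MY_BITFIELD = [("a", 1), ("b", 2), ("c", 3), ("padding", 58)]
```

For instance, unpack_bitfield(0b101101, MY_BITFIELD) gives a = 1, b = 2, c = 5 and padding = 0.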

@paxcut
Contributor

paxcut commented Dec 2, 2024

The vint defined in the RAR documentation is really uLEB128. For some reason they don't call it that, but the definition is identical. Also, the RAR template in 010 Editor uses uLEB128 instead of vint.
