Invalid unicode strings in various wld files #2

Open
bryab opened this issue Feb 6, 2022 · 3 comments

@bryab
Collaborator

bryab commented Feb 6, 2022

I am trying to figure out how to deal with the fact that many strings in these files are either invalid utf-8 (and encoded in some other format?) or are simply not strings at all.

Take feerrott for example. Its main string hash has invalid utf8; it will fail to load altogether.
For feerrott_obj, the animated fire texture's TextureFragmentEntry's "file_name" parameter is an invalid string.
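A minimal sketch of how one might locate where a string stops being valid UTF-8, using only the standard library. The names `Utf8Check` and `check_utf8` are made up for illustration and are not part of this project:

```rust
use std::str;

/// Result of checking a raw byte slice read from a file.
#[derive(Debug, PartialEq)]
enum Utf8Check {
    /// The bytes were valid UTF-8.
    Valid(String),
    /// Byte offset of the first invalid sequence.
    InvalidAt(usize),
}

/// Try to interpret `bytes` as UTF-8 and report where it breaks, if it does.
fn check_utf8(bytes: &[u8]) -> Utf8Check {
    match str::from_utf8(bytes) {
        Ok(s) => Utf8Check::Valid(s.to_owned()),
        // `valid_up_to` is the length of the longest valid prefix.
        Err(e) => Utf8Check::InvalidAt(e.valid_up_to()),
    }
}
```

Running something like this over each string field would at least show which fields are affected and at what offset, rather than failing the whole load.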

I'd love to try to figure this out - what tools did you use for examining these files?

@bryab
Collaborator Author

bryab commented Feb 7, 2022

OK, I got this working using the encoding_rs crate and decoding with WINDOWS_1252. Will post this up at some point.
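For reference, a std-only sketch of what a Windows-1252 decode does. It does not call encoding_rs; `decode_windows_1252` and `HIGH` are hypothetical names, and the table follows the WHATWG mapping for bytes 0x80..=0x9F as I understand it:

```rust
/// Code points for bytes 0x80..=0x9F in Windows-1252 (WHATWG mapping).
/// All other bytes map to the code point with the same numeric value.
const HIGH: [u16; 32] = [
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

/// Decode bytes as Windows-1252. Every byte maps to some character,
/// so this never fails, whatever the input actually is.
fn decode_windows_1252(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => {
                // Table entries are never surrogates, so this cannot fail.
                char::from_u32(HIGH[(b - 0x80) as usize] as u32).unwrap()
            }
            _ => b as char,
        })
        .collect()
}
```

One thing worth keeping in mind: because every byte has a mapping, a decode that "succeeds" only shows the bytes can be rendered, not that they were text to begin with.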

@bryab bryab closed this as completed Feb 7, 2022
@cjab
Owner

cjab commented Feb 8, 2022

Nice! I think in my branch currently I'm just swallowing those utf-8 errors. I had no idea the encoding was windows 1252, good find! :D

As for tools, I've been using a kind of hodgepodge of things: hexyl for viewing fragment data, and more recently kaitai (http://kaitai.io) to build some quick and dirty parsers to test things out. This has been a nice little calculator also :P https://github.com/alt-romes/programmer-calculator

I do have a little TUI I've written for stepping through fragments and viewing their fields plus a hex view of their contents (via hexyl). I'll see if I can get that up here as well. Last weekend didn't work out the way I had hoped 😆.

@bryab bryab reopened this Mar 31, 2022
@bryab
Collaborator Author

bryab commented Mar 31, 2022

I'm going to reopen this because honestly I don't think the windows 1252 encoding is correct. Something else is going on here. For example in the feerrott zone, using windows_1252, one of the materials says its texture name is:

•:å*•z•j•:;õj…wšpzuaœ;å¨â{å*•z•j•:å*•z<ß×xóp±¸à4žøhï

I'm wondering if there's some "magic", possibly specific to texture names, where the bytes don't represent an actual string but some other kind of instruction.
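Fragments of the run `•:å*•z•j` recur in the text above, so one cheap diagnostic is to check the raw bytes for a repeating period. This is only a sketch for poking at the data, not a claim about what the format does; `matches_at_period` and `best_period` are hypothetical helper names:

```rust
/// Count positions i where bytes[i] == bytes[i + period].
fn matches_at_period(bytes: &[u8], period: usize) -> usize {
    if period == 0 || period >= bytes.len() {
        return 0;
    }
    bytes
        .iter()
        .zip(&bytes[period..])
        .filter(|(a, b)| a == b)
        .count()
}

/// Among periods 1..=max_period, return (period, match_count) with the
/// most self-matches, preferring the smallest period on a tie.
/// Returns None if no period produces any match.
fn best_period(bytes: &[u8], max_period: usize) -> Option<(usize, usize)> {
    (1..=max_period)
        .map(|p| (p, matches_at_period(bytes, p)))
        .filter(|&(_, n)| n > 0)
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
}
```

If a short period stands out strongly on the raw (undecoded) bytes, that would hint at some fixed-length transform being applied rather than a text encoding.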
