Controller Pak repair operations

Background

The focus of MPKEdit's parser has always been strict detection of valid data. Because the user can attempt to load any file, and MPKEdit supports multiple filetypes, I wanted to ensure that if a file loaded, you knew it had passed rigorous validity testing and was not corrupt. I felt this was important because the actual C-Pak filesystem data structure is unfortunately not strongly defined.

However, Controller Paks are sometimes subject to corruption. Libultra, the official library containing the code that handles Controller Pak operations, does not need to do any strict file detection; it can always assume that the input data is a Controller Pak, so it can loosen up on the strictness of its checks.

Therefore, it can focus straight away on detecting corrupt data, repairing it as much as possible, and even reformatting the filesystem to an empty state as a final resort.

I've known that Libultra is able to fix file indexing errors that MPKEdit cannot. That is to say, MPKEdit would reject a file that Libultra would be able to repair and load. And I wanted to explore the possibility of matching this support in MPKEdit.

The main concern is that I don't want to undermine the strictness of the validator, because it must serve as a file-format detector as well. So I'll have to maintain a balance.

Phase 1

The first update I got working is able to erase any key indexes which aren't associated with a note. Basically, every file on a Controller Pak specifies a starting index, also known as an entry point. By taking advantage of certain logic, we can find all the starting indexes without even looking at the NoteTable. And we can also recognize that the starting indexes in the NoteTable should correspond exactly to the key indexes. If any key index is found that is not in the NoteTable, we can assume it's invalid, because it represents unreachable data. In theory it may represent recoverable data, but that would require outside intervention and is not the responsibility of a general library.

In any case, what this update does is erase any note whose index chain has no end marker, and it also erases index chains belonging to a key index that doesn't exist in the NoteTable. A single bit flip could allow this to occur, causing a break in the chain of a particular note's data sequence.

To the end user, this will appear as if one of their save files was deleted, but the remaining data is kept intact. This is the only acceptable action.
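
To make the idea concrete, here is a rough sketch of the orphan-chain half of this repair. It is not MPKEdit's actual code: the layout constants (IndexTable at page 1, NoteTable at 0x300 with 16 entries of 32 bytes and a big-endian start index at offset 0x06, 0x01 as end marker, 0x03 as free page) reflect the commonly documented C-Pak layout, and all function names are hypothetical.

```ts
// Hypothetical sketch of the Phase 1 orphan-chain erasure; not MPKEdit's code.
// Assumed layout: IndexTable at 0x100 (2-byte big-endian entries for pages
// 5-127), NoteTable at 0x300 (16 entries of 32 bytes, start index at offset 6).
// Entry values: 0x01 = end marker, 0x03 = free page, 0x05-0x7F = next page.
const INDEX_TABLE = 0x100;
const NOTE_TABLE = 0x300;
const FREE = 0x03, END = 0x01;

function getIndex(pak: Uint8Array, page: number): number {
  return (pak[INDEX_TABLE + page * 2] << 8) | pak[INDEX_TABLE + page * 2 + 1];
}
function setIndex(pak: Uint8Array, page: number, value: number): void {
  pak[INDEX_TABLE + page * 2] = value >> 8;
  pak[INDEX_TABLE + page * 2 + 1] = value & 0xff;
}

// Start indexes declared in the NoteTable.
function noteStartIndexes(pak: Uint8Array): Set<number> {
  const starts = new Set<number>();
  for (let i = 0; i < 16; i++) {
    const start = (pak[NOTE_TABLE + i * 32 + 6] << 8) | pak[NOTE_TABLE + i * 32 + 7];
    if (start >= 5 && start <= 127) starts.add(start);
  }
  return starts;
}

// Key indexes found purely from the IndexTable: pages that are in use but not
// pointed to by any other page, i.e. the entry point of each chain.
function keyIndexes(pak: Uint8Array): Set<number> {
  const pointedTo = new Set<number>();
  for (let p = 5; p <= 127; p++) {
    const v = getIndex(pak, p);
    if (v >= 5 && v <= 127) pointedTo.add(v);
  }
  const keys = new Set<number>();
  for (let p = 5; p <= 127; p++) {
    if (getIndex(pak, p) !== FREE && !pointedTo.has(p)) keys.add(p);
  }
  return keys;
}

// Erase any chain whose key index has no matching NoteTable entry
// (unreachable data) by marking every page in that chain as free.
function eraseOrphanChains(pak: Uint8Array): void {
  const starts = noteStartIndexes(pak);
  for (const key of keyIndexes(pak)) {
    if (starts.has(key)) continue;
    let p = key;
    while (p >= 5 && p <= 127) {
      const next = getIndex(pak, p);
      setIndex(pak, p, FREE);
      if (next === END) break;
      p = next;
    }
  }
}
```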

Caveats

There are some caveats to be aware of.

  • If one of a note's indexes turns into 0x01, it will be interpreted as a valid end marker. Potentially this means the note will be truncated, and will essentially be corrupt. An invalid key index would also be created, and it will end up being erased. In theory this could be detected by counting the number of end markers, but I'm unsure if a truly reliable course of action could be applied.
    • I verified that Libultra will truncate the affected note; it can't detect that the note was resized. No change necessary to match.
  • If corruption causes one index chain to connect to another index chain, it may become necessary to erase both notes.
    • Libultra will erase BOTH notes if they contain a duplicate index. MPKEdit currently refuses to load such a file, because duplicate indexes cause a critical error.
    • Even if I allow execution to continue, the duplicate index usage requires separate additional handling.

Phase 2

  • Libultra does indeed require the left digit to be zero, but when it isn't, it only erases the affected note. MPKEdit requires ALL digits to be zero, which is true for any C-Pak using the default bank size. In theory, Libultra could avoid erasing and actually recover this data by simply forcing the digit back to zero, on the assumption that it is out of range on the default bank size. But I'm not sure if MPKEdit itself could afford to be this lenient. This may be something exclusive to the Libdragon port.
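
To illustrate the difference in strictness, taking "digits" to mean the upper byte of each 16-bit IndexTable entry (my own reading of the above), a strict per-entry check and a lenient "force it back to zero" repair might look like this. Both functions are hypothetical sketches, not MPKEdit or Libultra code.

```ts
// Hedged sketch; assumes each IndexTable entry is a big-endian 16-bit value
// whose upper byte must be zero on a default-size (32 KiB) Controller Pak.
function entryIsValidStrict(value: number): boolean {
  const hi = value >> 8, lo = value & 0xff;
  // Strict rule: the entire upper byte is zero, and the lower byte is an end
  // marker (1), a free page (3), or a page pointer (5-127).
  return hi === 0 && (lo === 1 || lo === 3 || (lo >= 5 && lo <= 127));
}

// Lenient alternative: assume any nonzero upper byte is simply out of range
// for the default bank size and force it back to zero instead of erasing.
function repairEntryLenient(value: number): number {
  return value & 0x00ff;
}
```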

Re-evaluate file detection?

I suppose it might be worth reviewing the file detection scheme, to determine what is safe and what isn't.

  • Any file is accepted. The maximum file size is 290 KiB (296,960 bytes). The reason for this maximum is to support Mupen64plus files.
  • While 32,768 bytes is the typical size of a Controller Pak, that size cannot be relied on extensively, because my block footer used for storing timestamps and comments causes a variable file size. There may also be truncated Controller Pak files, but this would be rare.
  • There is currently no restriction on accepted file extensions. If I start to use one, here is the list of known extensions:
.note = MPKEdit note file.
.rawnote = Raw note data (identifiable info in filename).
.mpk = Standard single Controller Pak dump (32,768 bytes / 32 KiB).
.pak = Used by the Ares emulator.
.srm = Used by RetroArch save files.
.n64 = DexDrive single Controller Pak dump (36,928 bytes / 36 KiB).
.bin = Generic binary file. Used by 64drive for virtual paks (virtpak1.bin, virtpak4.bin).

.cpk = Rare, but potentially used before.
.cpak = Rare, but potentially used before.
.mempak = Rare, but potentially used before.
.sav = Unknown, but a generic extension used for emulator save files.

.bak = Backup files, often created by hex editors like HxD.
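
If a restriction ever were introduced, a minimal sketch might look like the following. The function is purely hypothetical (nothing like it exists today, since any file is accepted), and the size bound simply mirrors the 290 KiB maximum mentioned above.

```ts
// Hypothetical extension/size pre-filter; not current MPKEdit behaviour.
const KNOWN_EXTENSIONS = [
  ".note", ".rawnote", ".mpk", ".pak", ".srm", ".n64", ".bin",
  ".cpk", ".cpak", ".mempak", ".sav", ".bak",
];
const MAX_FILE_SIZE = 296_960; // 290 KiB, to accommodate Mupen64plus files

function looksAcceptable(fileName: string, fileSize: number): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false;
  const ext = fileName.slice(dot).toLowerCase();
  return fileSize > 0 && fileSize <= MAX_FILE_SIZE && KNOWN_EXTENSIONS.includes(ext);
}
```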

checkHeader

checkHeader is the first line of defense when checking whether a Controller Pak is valid. The header area it verifies contains four checksum-protected blocks.

Step 1

The first thing it does is attempt to verify that the four checksums are correct. At least one checksum needs to be correct to pass this test. There's a built-in fix that can detect bugged DexDrive data that is known to have an incorrect checksum.
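
As a rough illustration of what this step involves (not MPKEdit's actual code, and without the DexDrive special case), the following sketch assumes the commonly documented header layout: four 32-byte ID blocks at page-0 offsets 0x20, 0x60, 0x80 and 0xC0, each ending with a 16-bit sum of its first 14 big-endian words and an inverted sum (0xFFF2 minus the sum).

```ts
// Hedged sketch of a Step 1 style check; the layout and checksum formula are
// assumptions based on the commonly documented C-Pak header format.
const ID_BLOCK_OFFSETS = [0x20, 0x60, 0x80, 0xc0];

function idBlockChecksumOk(pak: Uint8Array, base: number): boolean {
  let sum = 0;
  for (let i = 0; i < 28; i += 2) {
    sum = (sum + ((pak[base + i] << 8) | pak[base + i + 1])) & 0xffff;
  }
  const stored = (pak[base + 28] << 8) | pak[base + 29];
  const inverted = (pak[base + 30] << 8) | pak[base + 31];
  // The bugged DexDrive data mentioned above would need special-casing here.
  return sum === stored && ((0xfff2 - sum) & 0xffff) === inverted;
}

// Step 1: at least one of the four checksum-protected blocks must pass.
function anyIdBlockValid(pak: Uint8Array): boolean {
  return ID_BLOCK_OFFSETS.some(base => idBlockChecksumOk(pak, base));
}
```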

Step 2

If all four checksums are incorrect, we enter the "ID Repair" stage. Normally this would be the end and we would reject the file as corrupt, but Libultra is entirely capable of recovering data even if the checksums are all incorrect.

The first thing we do is a quick sanity check. There are 8 unused bytes in the IndexTable that are normally zero. This test can be done twice: once for the primary table, and again for the backup.

The idea is that it's cheaper to stop here if the test fails than proceed. Whether this is smart or misguided is for you to decide.

There are two known cases where this data is not zero, and there's added support to detect these additional cases.
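
A minimal sketch of this sanity check is shown below, assuming the primary IndexTable at page 1 (0x100), the backup at page 2 (0x200), and the 8 normally-unused bytes sitting at offsets 2-9 of each table (my assumption about which bytes are meant); the two known non-zero cases would need extra handling on top of this.

```ts
// Hedged sketch of the Step 2 sanity check; the offsets are assumptions.
function unusedBytesAreZero(pak: Uint8Array, tableBase: number): boolean {
  for (let i = 2; i < 10; i++) {
    if (pak[tableBase + i] !== 0) return false;
  }
  return true;
}

// Run the cheap test on both copies before doing anything more expensive;
// how the two results are combined is up to the caller.
function passesIdRepairSanityCheck(pak: Uint8Array): boolean {
  return unusedBytesAreZero(pak, 0x100) && unusedBytesAreZero(pak, 0x200);
}
```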

Step 3

So if the checksums fail and the sanity check passes, we reach the "Fast IndexTable check" stage.

The idea here is that there are some fundamental logical truths to the structure of a valid IndexTable.

  1. The IndexTable has a backup copy, and under ideal conditions it is identical to the primary copy. While indeed a backup can allow a corrupt primary to be restored, considering this check only occurs in the rare circumstance that all 4 header checksums failed, I think verifying equality at this stage is a reasonable and effective sanity check.
    • In theory, we could prioritize an OR validation of primary/backup. If only one passes, the other is corrupt. In this case, an equality check could be considered overkill.
  2. Under the assumption of the default bank size, we know certain bytes can only be zero, and other bytes can only be within 5-127 or 3.
  3. It should never be possible to find duplicate indexes being used, since that would imply two files share data, which is simply impossible. This particular rule is probably the most "foolproof"; however, it requires keeping a lookup table, which is still relatively fast.

This particular loop will return false if it encounters bad data.
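
As a rough illustration of what such a loop could look like (not the actual MPKEdit code), using the same kind of layout assumptions as the earlier sketches: 2-byte big-endian entries for pages 5-127, primary table at 0x100, backup at 0x200.

```ts
// Hedged sketch of a "fast IndexTable check": primary/backup equality,
// value-range check for the default bank size, and duplicate detection
// via a lookup table. Layout constants are assumptions.
function fastIndexTableCheck(pak: Uint8Array): boolean {
  const primary = 0x100, backup = 0x200;
  const seen = new Set<number>(); // lookup table for duplicate detection

  for (let p = 5; p <= 127; p++) {
    const hi = pak[primary + p * 2], lo = pak[primary + p * 2 + 1];

    // Rule 1: the backup copy should be identical to the primary copy.
    if (hi !== pak[backup + p * 2] || lo !== pak[backup + p * 2 + 1]) return false;

    // Rule 2: on the default bank size the upper byte can only be zero, and
    // the lower byte can only be an end marker (1), free (3), or 5-127.
    if (hi !== 0) return false;
    if (!(lo === 1 || lo === 3 || (lo >= 5 && lo <= 127))) return false;

    // Rule 3: no page may be pointed to twice; a duplicate would imply two
    // notes sharing data, which is impossible.
    if (lo >= 5 && lo <= 127) {
      if (seen.has(lo)) return false;
      seen.add(lo);
    }
  }
  return true;
}
```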

However, it doesn't stop yet. It then checks the 8-bit checksum of the primary table.

If the IndexTable checksum fails, it will make a final effort:

  1. Check if at least 7 bits of the primary checksum are correct.
    2. Check if the backup checksum is identical to the primary checksum. (Which is actually a bug... hmm.) I suppose my intention was to check whether the backup checksum was valid, but I think this is NOT correct?

And that's it.
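
For reference, the whole checksum path of this stage could be sketched roughly as below. The exact layout is an assumption on my part (an 8-bit checksum stored at offset 1 of the table, computed as the low byte of the sum of the entry bytes for pages 5-127), and the final fallback mirrors the possibly-buggy comparison described above.

```ts
// Hedged sketch of the IndexTable checksum path; offsets and the checksum
// formula are assumptions, not a verified description of MPKEdit internals.
function indexTableChecksum(pak: Uint8Array, base: number): number {
  let sum = 0;
  for (let i = 10; i < 256; i++) sum += pak[base + i]; // entry bytes for pages 5-127
  return sum & 0xff;
}

// Count how many of the 8 bits agree between two checksum bytes.
function matchingBits(a: number, b: number): number {
  let same = 0;
  for (let bit = 0; bit < 8; bit++) {
    if (((a >> bit) & 1) === ((b >> bit) & 1)) same++;
  }
  return same;
}

function primaryChecksumAcceptable(pak: Uint8Array): boolean {
  const stored = pak[0x101];   // assumed location of the primary checksum
  const computed = indexTableChecksum(pak, 0x100);
  if (stored === computed) return true;

  // Final effort: accept if at least 7 of 8 bits agree, or if the backup's
  // stored checksum equals the primary's (the questionable comparison noted
  // above; arguably the backup table's own checksum should be validated).
  return matchingBits(stored, computed) >= 7 || pak[0x201] === stored;
}
```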

Step 4

After all 4 checksums have failed, but the quick check of the filesystem has fully passed, we reach this stage.

Simply put, it will proceed to load the file as if nothing happened.

On a real Controller Pak, I imagine it would be important to write a correct checksum at this time, as well as the required bits.

readNotes

When checkHeader passes, it becomes time to read the NoteTable data.

The reason we want to do this first is that it holds the startIndex for each note. This is information we can use to corroborate the integrity of the IndexTable.
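
As a small sketch of what this amounts to (assuming, as before, 16 NoteTable entries of 32 bytes at 0x300 with a big-endian start index at offset 0x06, and that unused slots read as zero):

```ts
// Hedged sketch: pull the startIndex out of each NoteTable entry so it can be
// cross-checked against the key indexes found in the IndexTable.
function readNoteStartIndexes(pak: Uint8Array): number[] {
  const starts: number[] = [];
  for (let i = 0; i < 16; i++) {
    const entry = 0x300 + i * 32;
    const start = (pak[entry + 6] << 8) | pak[entry + 7];
    if (start !== 0) starts.push(start); // assumption: unused slots read as zero
  }
  return starts;
}
```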