Fix the mini-stream size in the root property #731
Thank you! |
I had to revert this. There were memory issues in our stress tests and this change appears to be the most likely cause. https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/ There were failures in other builds too. Anyone who wants to browse around is welcome to look at the build jobs in https://ci-builds.apache.org/job/POI/ |
That's weird, because the modified calculation simply loops through the mini FAT sectors and checks the allocation of each mini sector, which is hardly different from the previous method. I'll investigate. |
Unfortunately, the poi-integration module and its stress tests are a real pain. They take all the files in test-data and run basic scenarios. Some files are explicitly excluded, but many files are checked in specifically because they are problematic. The evidence is failures across many CI runs over the last several hours, and the CI has been much more stable since I reverted the change. There are some edge case CI runs (e.g. OpenJDK) that are already broken, but I'm talking about CI plans that normally pass. |
I haven't been able to run the integration tests, Gradle keeps throwing error messages, but looking at the CI log I see two failures:
and:
The first one is related to an OOXML file (spreadsheet/StructuredRefs-lots-with-lookups.xlsx). It's not clear which file was being processed on the second failure, but the error occurs in the middle of OOXML files. If I'm not mistaken OOXML files are completely unrelated to the old OLE/COM/CFB files, so I fail to see why the change to the calculation of the size of the root property triggered these failures. Maybe the memory usage changed slightly and the garbage collector kicked in at a different time, which in a tight memory context may lead to an OOM. Is it possible to increase the memory allocated to the integration tests? |
POIFS is used all over the poi-ooxml code base. |
I've pushed a couple of changes to the PR branch:
I don't know if that's enough for the integration tests to pass, but it can't hurt. If the integration tests still fail, then the remaining difference from the previous code is either the double loop over the mini FAT sectors (one forward pass to write them and one backward pass to compute the size), or the different computed size, which is always greater than or equal to the previous one and may lead to different memory allocations down the road. |
I added back the original commit and increased the memory in the Gradle build. The first run went OK, and the GitHub CI was already passing with and without this change. https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/1171/ @ebourg Can you take any new improvements in this branch and create a new PR? |
One extreme case comes to mind: a file with n mini FAT sectors, all unallocated except the last one, which has a single allocated mini sector at the beginning. In this case the old method computes a size of 1 mini sector (64 bytes) and the new method computes a size of 128 x (n - 1) + 1 mini sectors, assuming 512-byte sectors holding 128 mini FAT entries each. For a file using 4096-byte sectors (1024 entries per mini FAT sector) and 10 mini FAT sectors, the computed size of the mini stream would grow from 64 bytes to about 576 KB. |
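The arithmetic for this extreme case can be checked with a small sketch. `MiniStreamWorstCase` and its methods are hypothetical names used for illustration only and are not part of POI. The sketch assumes 4-byte mini FAT entries and 64-byte mini sectors, so a sector of S bytes holds S / 4 entries.

```java
// Hypothetical helper, not part of POI: closed-form sizes for the extreme case
// where only the first entry of the last mini FAT sector is allocated.
final class MiniStreamWorstCase {
    static final int MINI_SECTOR_SIZE = 64; // bytes per mini sector
    static final int FAT_ENTRY_SIZE = 4;    // bytes per mini FAT entry

    /** Number of mini FAT entries that fit in one sector of the given size. */
    static int entriesPerSector(int sectorSize) {
        return sectorSize / FAT_ENTRY_SIZE;
    }

    /** Old method: only the single occupied entry of the last sector is counted. */
    static long oldSize() {
        return MINI_SECTOR_SIZE;
    }

    /** New method: every mini sector up to and including the last allocated one counts. */
    static long newSize(int sectorSize, int miniFatSectors) {
        long lastAllocatedIndex = (long) entriesPerSector(sectorSize) * (miniFatSectors - 1);
        return (lastAllocatedIndex + 1) * MINI_SECTOR_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(oldSize());         // 64
        System.out.println(newSize(512, 10));  // 73792
        System.out.println(newSize(4096, 10)); // 589888 bytes, about 576 KB
    }
}
```

With 4096-byte sectors the gap between the two results grows by 65536 bytes for each additional empty mini FAT sector, which is why a larger computed size could plausibly change memory allocations later on.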
Thank you
Sure, see #735. It might be nice to migrate POI to Git btw ;) Integrating GitHub PRs would be easier. |
This is a follow-up to #182, which didn't cover all possible cases of gaps in the allocation table of the mini stream.
This PR contains a unit test generating a file with the following layout:
With this layout POI 5.3 computes a mini stream size of 29696 bytes, that is, the sum of the "occupied" bytes (as per `BATBlock.getOccupiedSize()`) for each sector: (0 + 128 + 128 + 0 + (128-32) + (128-16) + 0 + 0) x 64.
After signing the file with signtool:
So the right method to compute the size is to find the absolute index of the last allocated mini sector, add one and multiply by 64.
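A minimal sketch of the two calculations, written against a plain `int[]` rather than POI's own classes. The class `MiniStreamSize`, its method names, and the flat-array model of the mini FAT are assumptions made for illustration, and the per-sector sum only imitates the behaviour described above rather than reproducing POI's code. Free entries use -1 and allocated entries use -2 (end of chain), matching the FREESECT and ENDOFCHAIN values of the CFB format.

```java
import java.util.Arrays;

// Hypothetical model, not POI code: the mini FAT is one flat array of entries.
final class MiniStreamSize {
    static final int MINI_SECTOR_SIZE = 64; // bytes per mini sector
    static final int UNUSED = -1;           // free entry
    static final int END_OF_CHAIN = -2;     // stands in for any allocated entry

    /** Proposed method: (absolute index of last allocated mini sector + 1) x 64. */
    static long sizeFromLastAllocated(int[] miniFat) {
        for (int i = miniFat.length - 1; i >= 0; i--) {
            if (miniFat[i] != UNUSED) {
                return (long) (i + 1) * MINI_SECTOR_SIZE;
            }
        }
        return 0; // nothing allocated
    }

    /** Per-sector sum as described above: trailing free entries are trimmed in each sector. */
    static long sizeFromPerSectorSum(int[] miniFat, int entriesPerSector) {
        long occupied = 0;
        for (int start = 0; start < miniFat.length; start += entriesPerSector) {
            int last = Math.min(start + entriesPerSector, miniFat.length) - 1;
            while (last >= start && miniFat[last] == UNUSED) {
                last--;
            }
            occupied += last - start + 1;
        }
        return occupied * MINI_SECTOR_SIZE;
    }

    /** Eight 128-entry sectors with the occupied counts quoted in the description. */
    static int[] layoutFromDescription() {
        int[] occupiedPerSector = {0, 128, 128, 0, 128 - 32, 128 - 16, 0, 0};
        int[] fat = new int[occupiedPerSector.length * 128];
        Arrays.fill(fat, UNUSED);
        for (int s = 0; s < occupiedPerSector.length; s++) {
            Arrays.fill(fat, s * 128, s * 128 + occupiedPerSector[s], END_OF_CHAIN);
        }
        return fat;
    }

    public static void main(String[] args) {
        int[] fat = layoutFromDescription();
        System.out.println(sizeFromPerSectorSum(fat, 128)); // 29696
        System.out.println(sizeFromLastAllocated(fat));     // 48128
    }
}
```

In this model the last allocated entry sits at absolute index 5 x 128 + 111 = 751, so the proposed method yields 752 x 64 = 48128 bytes, while the per-sector sum drops the gaps in sectors 0, 3 and 4 and stays at 29696 bytes.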