Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAK-10803 - Fix memory consumption of uncompress properties #1619

Merged
merged 31 commits into from
Aug 13, 2024

Conversation

ionutzpi
Copy link
Contributor

@ionutzpi ionutzpi commented Aug 1, 2024

In OAK-10803 we introduce the possibility of compressing property values. During its development we encountered performance issues, namely with repeated decompression calls, even just during getNodeAtRevision. Before we can confidentially enable and use compression, we need to revisit performance first. Likely we'll have to make further tweaks before we can use it.

Copy link
Contributor

@reschke reschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This saves space, but instead will make every read access slower, right?

@ionutzpi
Copy link
Contributor Author

ionutzpi commented Aug 1, 2024

Done. Refactored it to make read access faster.

@stefan-egli
Copy link
Contributor

I don't see the value in this. This will restrict usability of compression to the time between creation and first access of the value. Thereafter it uses more space.

@ionutzpi
Copy link
Contributor Author

ionutzpi commented Aug 1, 2024

Revert it to first commit. Yes, the read access is slower.

@stefan-egli
Copy link
Contributor

decompressedValue is never set?

@ionutzpi
Copy link
Contributor Author

ionutzpi commented Aug 1, 2024

Done.

pirlogea added 2 commits August 5, 2024 10:18
@stefan-egli
Copy link
Contributor

IIUC this PR has now addressed the suggestion on OAK-10803. That I think we should follow-up with indeed.

What about doing it in 2 steps though:

  • first address those suggestions (bring memory consumption back to pre-compression)
  • then look into performance improvements

Currently this PR seems to mix both concerns, which makes review discussion a bit more complex.

Having said that, reg the performance improvements: I still think we need to address the performance aspect differently. As it stands now, the first call to decompress() will expand the value again to its original state (I would then have perhaps set value instead of introducing/duplicating decompressedValue) - which means it will use up again the original amount of memory (at which point the gain done by compression is over). The issue is that decompress() will be call pretty much immediately after a property was created - namely when it needs to be put into the cache and when its memory consumption is estimated. Hence there would be zero gain of compression if decompression was done as it stands now. (we could for example verify how the memory consumption calculation could be fixed - after all it's probably broken now with compression anyway - and perhaps that could make the compression state live longer)

pirlogea added 2 commits August 6, 2024 11:11
 separate DocumentPropertyState from CompressedDocumentPropertyState delete decompressedValue
 separate DocumentPropertyState from CompressedDocumentPropertyState refactor tests
Copy link
Contributor

@stefan-egli stefan-egli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to re-purpose this PR to now address OAK-10803? Then we might want to change the subject of the PR (that usually also becomes the commit message later).

 separate DocumentPropertyState from CompressedDocumentPropertyState refactor factory method
@ionutzpi
Copy link
Contributor Author

ionutzpi commented Aug 8, 2024

I separated construction of DocumentPropertyState from CompressedDocumentPropertyState through a factory method for step 1: first address those suggestions (bring memory consumption back to pre-compression)
We can change the subject

Copy link
Contributor

@stefan-egli stefan-egli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's missing is the wiring. With this change, the compression code is actually never triggered since it doesn't go through the factory yet.

@ionutzpi ionutzpi changed the title OAK-10973 - Performance tune property compression/decompression OAK-10803 - Fix memory consumption of uncompress properties Aug 8, 2024
 call DocumentPropertyState with factory method
@ionutzpi
Copy link
Contributor Author

ionutzpi commented Aug 8, 2024

Done. Call DocumentPropertyState through factory method.


public static PropertyState createPropertyState(DocumentNodeStore store, String name, String value, Compression compression) {

if (compression != null && !compression.equals(Compression.NONE) && value.length() > CompressedDocumentPropertyState.getCompressionThreshold()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the length check here would result to true for -1 - shouldn't there be a test for that, did that not fail for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are checking default threshold exceeded or not, it didnt failed because of passing Compression.NONE as argument

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a test that should fail now though. The condition is currently wrong.

Probably we should also use GZIP by default, otherwise the feature cannot be enabled at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. We can enable the feature by calling differently factory method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can enable the feature by calling differently factory method.

But that would require a code change. What I was referring to is enabling the feature by configuration - eg via the system property. We should have that - otherwise the code is not useable and disable in a hard-coded way. I do think we should use GZIP in the default, not NONE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test still doesn't fail consistently. I'm referring to DocumentPropertyStateFactoryTest.createPropertyStateWithDefaultCompression. That is one test at least that should fail with the current code. The reason it doesn't fail is that it depends on test execution order. Some other test method in that class is executed first (eg createPropertyStateWithCompressionThresholdNotExceeded) and that sets the threshold, so that createPropertyStateWithDefaultCompression then runs on false assumptions.

Things that require fixing:

  • createPropertyStateWithDefaultCompression should check that it actually uses the default - i.e. it should have an assertEquals before even executing the actual property creation (and that would currently fail, depending on test execution order)
  • in addition to such asserts in each method, there must also be a @After and a @Before that resets the threshold
  • plus the actual code fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@stefan-egli stefan-egli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't yet fully checked but it looks like some tests got lost in translation? eg multiValuedAboveThresholdSize. Also, compressValueThrowsException : now expects an exception while previously it didn't?

import org.slf4j.LoggerFactory;

/**
* PropertyState implementation with lazy parsing of the JSOP encoded value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest to add some short mention of the fact that this class uses compression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

this(store, name, value, Compression.GZIP);
}

DocumentPropertyState(DocumentNodeStore store, String name, String value, Compression compression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this class should be reverted to the state before compression? In doing that it looks like it missed a commit that came in between : b04de7e

Copy link
Contributor Author

@ionutzpi ionutzpi Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually added that commit

@Rule
public DocumentMKBuilderProvider builderProvider = new DocumentMKBuilderProvider();

private Set<String> reads = newHashSet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please now new use of Guava.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@stefan-egli
Copy link
Contributor

The constructor in CompressedDocumentPropertyState has changed from the previous version of DocumentPropertyState

What question or comment does this refer to?

@ionutzpi
Copy link
Contributor Author

Also, compressValueThrowsException : now expects an exception while previously it didn't?

This question

@stefan-egli
Copy link
Contributor

(Adding a general comment here in the main discussion as it otherwise might be getting a bit complicated by now)

What I was trying to say: the fact that the DocumentPropertyStateTest got refactored into 3 different classes lost a few aspects and thus results in a regression.

  • as mentioned I believe some test methods got lost (I will double-check and can add more details)
  • in addition though, some of the @Before and @After logic got lost - which actually is also a regression. For example, the broken surrogate tests no longer work as they were intended - they now inherit the threshold from whatever test/test-method was executed before, which is breaking change from the previous behavior (I think previously they tested with -1)
  • (and the -1 bug mentioned several times is thus still undetected by tests and is still existing in the code)

Plus what is also a regression vs the previous state is what I mentioned previously:

Also, compressValueThrowsException : now expects an exception while previously it didn't?

Previously the constructor was swallowing those exceptions, which I think is a good thing. Now that has been changed. Why?

@ionutzpi
Copy link
Contributor Author

Moved broken surrogate tests on CompressedDocumentPropertyStateTest(before and after are in place here).
In CompressDocumentPropertyState constructor the logic is only for compression, if it fails then throw new IllegalArgumentException.

@stefan-egli
Copy link
Contributor

In CompressDocumentPropertyState constructor the logic is only for compression, if it fails then throw new IllegalArgumentException.

That's a change to the previous behavior though. We previously discussed that it shouldn't throw an exception for compressing a property but just fall back to uncompressed. What is the reason for this change?

@stefan-egli
Copy link
Contributor

In CompressedDocumentPropertyState constructor if it throws an exception, it will be caught in factory method and call DocumentPropertyState constructor.

(Moving this back to main thread)

Could the test (compressValueThrowsException) then be adjusted to use the factory and thus avoid (expected = IllegalArgumentException.class) ?

@stefan-egli
Copy link
Contributor

On the test methods that seem to have gotten lost, here's a list of them - those existed prior to this PR and now seem gone (unless renamed) :

uncompressValueThrowsException
multiValuedAboveThresholdSize
stringAboveThresholdSizeNoCompression
testEqualsWithoutCompression
testInterestingStringsWithoutCompression
testOneCompressOtherUncompressInEquals
uncompressValueThrowsException

Also noticed that stringBelowThresholdSize actually tests below threshold - so either the test name or the test code is wrong.

Also, as a general comment : I think it would be useful to go via the factory method whenever possible. That would extend the test coverage of the factory method - which is a very key element. (I think this would be the lost prio comment though)

@ionutzpi
Copy link
Contributor Author

All the tests mention above were refactored to use new context. The test suite created covered all scenarios previous done.

@stefan-egli
Copy link
Contributor

stefan-egli commented Aug 13, 2024

All the tests mention above were refactored to use new context. The test suite created covered all scenarios previous done.

where is uncompressValueThrowsException covered? where testInterestingStringsWithoutCompression ?

edit: plus where is testOneCompressOtherUncompressInEquals covered?

@ionutzpi
Copy link
Contributor Author

ionutzpi commented Aug 13, 2024

Added required test methods in CompressedDocumentPropertyStateTest

Copy link
Contributor

@stefan-egli stefan-egli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, looks good now. Will let the test run finish (but I think 2 tests also fail in trunk, so those should anyway be ignored), and then (squash) merge depending on results.

@stefan-egli
Copy link
Contributor

ok, test failed, but the failures looks like unrelated flaky ones. going to merge anyway.

@stefan-egli stefan-egli merged commit f126a50 into apache:trunk Aug 13, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants