OAK-10803 - Fix memory consumption of uncompress properties #1619

ionutzpi · 2024-08-01T12:03:58Z

In OAK-10803 we introduced the possibility of compressing property values. During its development we encountered performance issues, namely repeated decompression calls, even during getNodeAtRevision alone. Before we can confidently enable and use compression, we need to revisit performance. We will likely have to make further tweaks before we can use it.

reschke

This saves space, but instead will make every read access slower, right?

ionutzpi · 2024-08-01T12:48:10Z

Done. Refactored it to make read access faster.

stefan-egli · 2024-08-01T12:55:50Z

I don't see the value in this. This will restrict usability of compression to the time between creation and first access of the value. Thereafter it uses more space.

ionutzpi · 2024-08-01T13:13:30Z

Reverted it to the first commit. Yes, read access is slower.

stefan-egli · 2024-08-01T13:19:04Z

decompressedValue is never set?

ionutzpi · 2024-08-01T13:55:26Z

Done.

stefan-egli · 2024-08-05T14:50:58Z

IIUC this PR now addresses the suggestion on OAK-10803, which I think we should indeed follow up on.

What about doing it in 2 steps though:

first address those suggestions (bring memory consumption back to pre-compression)
then look into performance improvements

Currently this PR seems to mix both concerns, which makes review discussion a bit more complex.

Having said that, regarding the performance improvements: I still think we need to address the performance aspect differently. As it stands now, the first call to decompress() will expand the value back to its original state (I would then perhaps have set value instead of introducing/duplicating decompressedValue), which means it will again use the original amount of memory, at which point the gain from compression is gone. The issue is that decompress() will be called pretty much immediately after a property is created, namely when it needs to be put into the cache and when its memory consumption is estimated. Hence there would be zero gain from compression if decompression were done as it stands now. (We could, for example, check how the memory consumption calculation could be fixed; after all, it's probably broken now with compression anyway, and perhaps that could make the compressed state live longer.)
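To make the trade-off in this comment concrete, here is a minimal sketch (not the Oak implementation) of a value holder that keeps only GZIP-compressed bytes and decompresses on every read. The class name `CompressedValueSketch` is hypothetical; only the JDK's `java.util.zip` is assumed. Caching the decompressed string in a field would make repeated reads cheap, but the object would then hold the original amount of memory again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical holder: stores only compressed bytes, never the decompressed string. */
final class CompressedValueSketch {
    private final byte[] compressed;

    CompressedValueSketch(String value) {
        this.compressed = compress(value);
    }

    /** Every call pays the decompression cost; nothing is cached. */
    String getValue() {
        return decompress(compressed);
    }

    /** Size of what is actually kept in memory. */
    int compressedSize() {
        return compressed.length;
    }

    static byte[] compress(String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer before the bytes are read out.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A memory estimate based on `compressedSize()` rather than on `getValue()` would avoid the immediate decompression described above; whether Oak's cache weighing can be changed that way is not established in this thread.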

stefan-egli

Is the plan to re-purpose this PR to now address OAK-10803? Then we might want to change the subject of the PR (that usually also becomes the commit message later).

...t/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentPropertyStateFactory.java

...rc/main/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyState.java

ionutzpi · 2024-08-08T09:40:11Z

I separated the construction of DocumentPropertyState from CompressedDocumentPropertyState via a factory method, as step 1: first address those suggestions (bring memory consumption back to pre-compression).
We can change the subject.

stefan-egli

What's missing is the wiring. With this change, the compression code is actually never triggered since it doesn't go through the factory yet.

ionutzpi · 2024-08-08T12:03:01Z

Done. DocumentPropertyState is now created through the factory method.

stefan-egli · 2024-08-08T12:47:06Z

...t/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentPropertyStateFactory.java

+
+ public static PropertyState createPropertyState(DocumentNodeStore store, String name, String value, Compression compression) {
+
+ if (compression != null && !compression.equals(Compression.NONE) && value.length() > CompressedDocumentPropertyState.getCompressionThreshold()) {


The length check here would evaluate to true for -1. Shouldn't there be a test for that? Did it not fail for some reason?

The tests check whether the default threshold is exceeded or not; they didn't fail because Compression.NONE is passed as the argument.

There should be a test that should fail now though. The condition is currently wrong.
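A minimal sketch of a corrected condition, assuming (as the thread implies) that a threshold of -1 means compression is disabled. `CompressionCheck`, `shouldCompress`, and the local `Compression` enum are hypothetical stand-ins for the Oak types, not the merged code.

```java
/** Hypothetical stand-in for Oak's Compression type. */
enum Compression { NONE, GZIP }

final class CompressionCheck {
    /** Assumption: a threshold of -1 disables compression entirely. */
    static final int DISABLED = -1;

    static boolean shouldCompress(Compression compression, int threshold, String value) {
        if (compression == null || compression == Compression.NONE) {
            return false;
        }
        // Without this guard, value.length() > -1 is true for every string,
        // so a "disabled" threshold would compress everything.
        if (threshold == DISABLED) {
            return false;
        }
        return value.length() > threshold;
    }
}
```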

Probably we should also use GZIP by default, otherwise the feature cannot be enabled at all?

Done. We can enable the feature by calling the factory method differently.

We can enable the feature by calling the factory method differently.

But that would require a code change. What I was referring to is enabling the feature by configuration, e.g. via the system property. We should have that; otherwise the code is unusable and disabled in a hard-coded way. I do think we should use GZIP as the default, not NONE.
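One way to make the feature switchable by configuration rather than by a code change is sketched below, with GZIP as the default as suggested. The property names `example.compression` and `example.compressionThreshold`, the -1 default threshold (assumed to mean "disabled"), and the `CompressionConfig` class are all hypothetical; this thread does not name the actual Oak system property.

```java
import java.util.Locale;

/** Hypothetical stand-in for Oak's Compression type. */
enum Compression { NONE, GZIP }

final class CompressionConfig {
    // Hypothetical property names, for illustration only.
    static final String COMPRESSION_PROPERTY = "example.compression";
    static final String THRESHOLD_PROPERTY = "example.compressionThreshold";

    /** GZIP unless the property names another Compression constant. */
    static Compression compression() {
        String name = System.getProperty(COMPRESSION_PROPERTY, Compression.GZIP.name());
        try {
            return Compression.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Compression.GZIP; // unknown value: keep the default
        }
    }

    /** -1 (assumed: disabled) unless a valid integer threshold is configured. */
    static int threshold() {
        return Integer.getInteger(THRESHOLD_PROPERTY, -1);
    }
}
```

With GZIP as the default algorithm, the feature can then be turned on purely by setting a threshold, without touching the factory call sites.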

The test still doesn't fail consistently. I'm referring to DocumentPropertyStateFactoryTest.createPropertyStateWithDefaultCompression. That is one test at least that should fail with the current code. The reason it doesn't fail is that it depends on test execution order. Some other test method in that class is executed first (eg createPropertyStateWithCompressionThresholdNotExceeded) and that sets the threshold, so that createPropertyStateWithDefaultCompression then runs on false assumptions.

Things that require fixing:

createPropertyStateWithDefaultCompression should check that it actually uses the default - i.e. it should have an assertEquals before even executing the actual property creation (and that would currently fail, depending on test execution order)

in addition to such asserts in each method, there must also be an @After and a @Before that resets the threshold

plus the actual code fix
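The order dependence described above comes from a static, mutable threshold shared by all test methods. The sketch below reproduces the pattern without JUnit: `ThresholdHolder` is a hypothetical stand-in for the static threshold (its -1 default is an assumption), and `runIsolated` plays the role of the `@Before`/`@After` pair by resetting the default around each test body. `usesDefault` also asserts its precondition first, as requested for `createPropertyStateWithDefaultCompression`.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for a static, globally shared compression threshold. */
final class ThresholdHolder {
    static final int DEFAULT_THRESHOLD = -1; // assumed default for this sketch
    private static int threshold = DEFAULT_THRESHOLD;

    static int get() { return threshold; }
    static void set(int value) { threshold = value; }
}

final class IsolatedTests {
    /** Same effect as @Before/@After: reset before, restore after, even on failure. */
    static int runIsolated(IntSupplier testBody) {
        ThresholdHolder.set(ThresholdHolder.DEFAULT_THRESHOLD);
        try {
            return testBody.getAsInt();
        } finally {
            ThresholdHolder.set(ThresholdHolder.DEFAULT_THRESHOLD);
        }
    }

    /** A "test" that changes the shared threshold, like a ...ThresholdNotExceeded test. */
    static int setsThreshold() {
        ThresholdHolder.set(100);
        return ThresholdHolder.get();
    }

    /** A "test" that relies on the default; it first asserts the precondition. */
    static int usesDefault() {
        if (ThresholdHolder.get() != ThresholdHolder.DEFAULT_THRESHOLD) {
            throw new AssertionError("threshold leaked from a previous test");
        }
        return ThresholdHolder.get();
    }
}
```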

stefan-egli

Haven't yet fully checked but it looks like some tests got lost in translation? eg multiValuedAboveThresholdSize. Also, compressValueThrowsException : now expects an exception while previously it didn't?

stefan-egli · 2024-08-12T10:18:25Z

...rc/main/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyState.java

+import org.slf4j.LoggerFactory;
+
+/**
+ * PropertyState implementation with lazy parsing of the JSOP encoded value.


I would suggest adding a short mention of the fact that this class uses compression.

...rc/main/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyState.java

stefan-egli · 2024-08-12T10:22:52Z

...document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentPropertyState.java

- this(store, name, value, Compression.GZIP);
- }
-
- DocumentPropertyState(DocumentNodeStore store, String name, String value, Compression compression) {


Presumably this class should be reverted to its state before compression? In doing that, it looks like a commit that came in between was missed: b04de7e

Manually added that commit

reschke · 2024-08-12T10:43:13Z

...est/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyStateTest.java

+ @Rule
+ public DocumentMKBuilderProvider builderProvider = new DocumentMKBuilderProvider();
+
+ private Set<String> reads = newHashSet();


Please no new use of Guava.
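For reference, the Guava `newHashSet()` call in the test field above can be replaced by the JDK collection directly. A minimal sketch; the `reads` field comes from the diff, while the class `ReadsTracker` and its two methods are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

final class ReadsTracker {
    // JDK replacement for Guava's Sets.newHashSet()
    private final Set<String> reads = new HashSet<>();

    void recordRead(String path) { reads.add(path); }
    int readCount() { return reads.size(); }
}
```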

stefan-egli · 2024-08-12T13:52:04Z

The constructor in CompressedDocumentPropertyState has changed from the previous version of DocumentPropertyState

What question or comment does this refer to?

ionutzpi · 2024-08-12T14:07:59Z

Also, compressValueThrowsException : now expects an exception while previously it didn't?

This question

stefan-egli · 2024-08-12T14:10:57Z

(Adding a general comment here in the main discussion as it otherwise might be getting a bit complicated by now)

What I was trying to say: the fact that the DocumentPropertyStateTest got refactored into 3 different classes lost a few aspects and thus results in a regression.

as mentioned I believe some test methods got lost (I will double-check and can add more details)
in addition though, some of the @Before and @After logic got lost, which is also a regression. For example, the broken surrogate tests no longer work as intended: they now inherit the threshold from whatever test/test method was executed before, which is a breaking change from the previous behavior (I think previously they tested with -1)
(and the -1 bug mentioned several times is thus still undetected by tests and is still existing in the code)

Plus what is also a regression vs the previous state is what I mentioned previously:

Also, compressValueThrowsException : now expects an exception while previously it didn't?

Previously the constructor was swallowing those exceptions, which I think is a good thing. Now that has been changed. Why?

ionutzpi · 2024-08-12T14:43:44Z

Moved the broken surrogate tests to CompressedDocumentPropertyStateTest (the @Before and @After methods are in place there).
In the CompressedDocumentPropertyState constructor, the logic is only for compression; if it fails, it throws a new IllegalArgumentException.

stefan-egli · 2024-08-12T15:13:55Z

In the CompressedDocumentPropertyState constructor, the logic is only for compression; if it fails, it throws a new IllegalArgumentException.

That's a change to the previous behavior though. We previously discussed that it shouldn't throw an exception for compressing a property but just fall back to uncompressed. What is the reason for this change?

...ment/src/test/java/org/apache/jackrabbit/oak/plugins/document/DocumentPropertyStateTest.java

stefan-egli · 2024-08-13T08:13:07Z

If the CompressedDocumentPropertyState constructor throws an exception, it will be caught in the factory method, which then calls the DocumentPropertyState constructor instead.

(Moving this back to main thread)

Could the test (compressValueThrowsException) then be adjusted to use the factory and thus avoid (expected = IllegalArgumentException.class) ?
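The fallback behavior agreed on here (try to compress and, on failure, create an uncompressed property instead of propagating the exception) can be sketched as follows. All types are simplified, hypothetical stand-ins: the real `DocumentPropertyStateFactory.createPropertyState` takes a `DocumentNodeStore` and returns a `PropertyState`, neither of which is modeled here, and the compressor is passed in only so the failure path can be exercised.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.zip.GZIPOutputStream;

/** Hypothetical, simplified property state (the real type is Oak's PropertyState). */
abstract class SimpleProperty {
    final String name;
    SimpleProperty(String name) { this.name = name; }
    abstract boolean isCompressed();
}

final class PlainProperty extends SimpleProperty {
    final String value;
    PlainProperty(String name, String value) { super(name); this.value = value; }
    boolean isCompressed() { return false; }
}

final class CompressedProperty extends SimpleProperty {
    final byte[] compressedValue;
    CompressedProperty(String name, byte[] compressedValue) {
        super(name);
        this.compressedValue = compressedValue;
    }
    boolean isCompressed() { return true; }
}

final class PropertyFactorySketch {
    /** GZIP compression; any I/O failure surfaces as IllegalArgumentException. */
    static byte[] gzip(String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalArgumentException("compression failed", e);
        }
        return bytes.toByteArray();
    }

    /** Compress above the threshold; on failure, fall back to an uncompressed property. */
    static SimpleProperty create(String name, String value, int threshold,
                                 Function<String, byte[]> compressor) {
        // Assumption: -1 disables compression (see the threshold discussion above).
        if (threshold == -1 || value.length() <= threshold) {
            return new PlainProperty(name, value);
        }
        try {
            return new CompressedProperty(name, compressor.apply(value));
        } catch (IllegalArgumentException e) {
            // Swallow the failure instead of propagating it to the caller.
            return new PlainProperty(name, value);
        }
    }
}
```

With a factory shaped like this, a test such as compressValueThrowsException could assert that the result is uncompressed rather than declaring `expected = IllegalArgumentException.class`.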

stefan-egli · 2024-08-13T08:43:17Z

On the test methods that seem to have gotten lost, here's a list of them; those existed prior to this PR and now seem gone (unless renamed):

uncompressValueThrowsException
multiValuedAboveThresholdSize
stringAboveThresholdSizeNoCompression
testEqualsWithoutCompression
testInterestingStringsWithoutCompression
testOneCompressOtherUncompressInEquals
uncompressValueThrowsException

Also noticed that stringBelowThresholdSize actually tests below threshold - so either the test name or the test code is wrong.

Also, as a general comment : I think it would be useful to go via the factory method whenever possible. That would extend the test coverage of the factory method - which is a very key element. (I think this would be the lost prio comment though)

...document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentPropertyState.java

ionutzpi · 2024-08-13T09:04:24Z

All the tests mentioned above were refactored for the new context. The new test suite covers all the scenarios that were covered before.

...rc/main/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyState.java

stefan-egli · 2024-08-13T09:11:15Z

All the tests mentioned above were refactored for the new context. The new test suite covers all the scenarios that were covered before.

Where is uncompressValueThrowsException covered? Where is testInterestingStringsWithoutCompression?

edit: plus where is testOneCompressOtherUncompressInEquals covered?

ionutzpi · 2024-08-13T10:29:10Z

Added required test methods in CompressedDocumentPropertyStateTest

...rc/main/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyState.java

stefan-egli

+1, looks good now. Will let the test run finish (but I think 2 tests also fail in trunk, so those should anyway be ignored), and then (squash) merge depending on results.

stefan-egli · 2024-08-13T13:53:57Z

OK, the test run failed, but the failures look like unrelated flaky ones. Going to merge anyway.

OAK-10973 - Performance tune property compression/decompression

81bbfe5

reschke reviewed Aug 1, 2024

OAK-10973 - Performance tune property compression/decompression

c9e2678

OAK-10973 - Performance tune property compression/decompression revert

a7e0def

OAK-10973 - Performance tune property compression/decompression set d…

5567b28

…ecompressedValue

pirlogea added 2 commits August 5, 2024 10:18

OAK-10973 - Performance tune property compression/decompression set d…

fb07369

…ecompressedValue

OAK-10973 - Performance tune property compression/decompression separ…

5c8cc87

…ate DocumentPropertyState from CompressedDocumentPropertyState

pirlogea added 2 commits August 6, 2024 11:11

OAK-10973

537278b

separate DocumentPropertyState from CompressedDocumentPropertyState delete decompressedValue

OAK-10973

497715b

separate DocumentPropertyState from CompressedDocumentPropertyState refactor tests

stefan-egli reviewed Aug 7, 2024

OAK-10973

d911237

separate DocumentPropertyState from CompressedDocumentPropertyState refactor factory method

stefan-egli reviewed Aug 8, 2024

ionutzpi changed the title ~~OAK-10973 - Performance tune property compression/decompression~~ OAK-10803 - Fix memory consumption of uncompress properties Aug 8, 2024

OAK-10973

b41a7bc

call DocumentPropertyState with factory method

stefan-egli reviewed Aug 8, 2024

pirlogea added 3 commits August 8, 2024 16:15

OAK-10973 -- added tests for DocumentPropertyStateFactoryTest

cfc5c3c

OAK-10973 -- added tests for DocumentPropertyStateFactoryTest

e77cbb7

OAK-10973 -- added another test method

6546b1d

stefan-egli reviewed Aug 12, 2024

reschke reviewed Aug 12, 2024

pirlogea added 2 commits August 12, 2024 13:52

OAK-10973 -- set default GZIP for compression

8300cd3

OAK-10973 -- refactor

bcbc4bd

OAK-10973 -- refactor guava collection and move test broken surrogate…

32745a9

… to CompressedDocumentPropertyStateTest

stefan-egli reviewed Aug 12, 2024

...ment/src/test/java/org/apache/jackrabbit/oak/plugins/document/DocumentPropertyStateTest.java

pirlogea added 2 commits August 13, 2024 09:17

OAK-10973 -- added -1 condition into factory method

c85c79e

OAK-10973 -- added test for -1

1980813

stefan-egli reviewed Aug 13, 2024

...document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentPropertyState.java

stefan-egli reviewed Aug 13, 2024

...rc/main/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyState.java

OAK-10973 -- remove guava from DocumentPropertyState

43960b7

pirlogea added 2 commits August 13, 2024 12:47

OAK-10973 -- remove duplicate static methods in CompressedDocumentPro…

d3621a2

…pertyState

OAK-10973 -- added required test methods CompressedDocumentPropertySt…

102db74

…ateTest

pirlogea added 5 commits August 13, 2024 14:32

OAK-10973 -- refactor if statement

ff16bea

OAK-10973 -- refactor if statement

933a2fe

OAK-10973 -- refactor test methods

17ea660

OAK-10973 -- refactor test methods

d5711fd

OAK-10973 -- refactor test methods

f5484e2

stefan-egli reviewed Aug 13, 2024

...rc/main/java/org/apache/jackrabbit/oak/plugins/document/CompressedDocumentPropertyState.java

OAK-10973 -- remove unused imports

8d14734

stefan-egli approved these changes Aug 13, 2024

rishabhdaim approved these changes Aug 13, 2024

stefan-egli merged commit f126a50 into apache:trunk Aug 13, 2024
1 of 2 checks passed


