Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAK-10875 - unmerged bc on root can make it appear merged #1605

Merged
merged 12 commits into from
Aug 21, 2024

Conversation

ionutzpi
Copy link
Contributor

If an oak instance crashes during a branch commit, which in turns leaves the bc unmerged - and that branch contained changes on root - it can later result in that unmerged bc to be considered as merged. If that values is then cached before other changes of the branch are read, those other changes are then considered merged even though they never have.

Copy link
Contributor

@stefan-egli stefan-egli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix should contain 2 main parts:

  1. put the repository back into a state where things a proper, in case of a OAK-10875 style "inconsistency". For that I think removing the _commitRoot could be a part of it. Added more details to alternatives in the commend below (we could also not remove neither _commitRoot nor _bc).

  2. as a second part though, we'd also have to handle cases where the inconsistency is already done and it's too late to clean up, as we've gone passed that moment (recovery was already done, unmerged bcs already removed). For this I think there could be two sub options:

2a. introduce a type of purgeUnmergedRevisions where we are able to clean up such an inconsistency even if it happened months ago (before the fix we want to introduce here), and thus the _bc is already gone. If this were possible, that would be nice, I think.

2b. make the "commit value resolving code" (which is spread over several classes) aware of this type of inconsistency and handle it properly. So that we wouldn't depend on a cleanup.

  1. I'm thinking FullGC might also have to be made capable - or could be made capable - of cleaning up this inconsistency. Perhaps as alternative to 2a.

@stefan-egli
Copy link
Contributor

Also it might be good to modify the original test to have 2 kinds of tests:

  • one that tests normally (basically the current test)
  • one that tests when the repository was brought into the corrupted state, without any fix applied yet. Worst case the repository state would need to be created "hard coded" - eg by going through the steps as 1, but then explicitly force the state where _bc is removed but _commitRoot is not. This would be used to simulate suggested part 2. above.

(perhaps we can come up with a test that ensures that a property created as unmerged bc on root is never treated as committed - even though it is passed sweep rev and no branch commit info is remaining)

@ionutzpi
Copy link
Contributor Author

For the first part I removed the unmerged revision from _commitRoot(I think this is the proper fix to solve the bug).
For the second part, I will create another ticket to restore the state of repository if it is needed.

@stefan-egli
Copy link
Contributor

What about splitting the two parts, as you suggested, into two different PRs? I would tend to prefer if OAK-10875 would stay open as I believe both parts are required? But they could certainly be split into two PRs to simplify things. Wdyt?

Btw: there's something odd going on with this PR as it seems to duplicate the original : eg unmergedBCOnRoot is already there in trunk, but this PR adds it again, same with the changes of FailingDocumentStore. The CI currently complains with

[ERROR] /home/jenkins/jenkins-agent/workspace/Jackrabbit_oak-trunk-pr_PR-1605/oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/BranchTest.java:[317,18] method invalidateCommitValueResolverCache(org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore) is already defined in class org.apache.jackrabbit.oak.plugins.document.BranchTest

I think originally the PR was complaining it can't merge due to these duplications - now it doesn't do that anymore. No idea why...

@ionutzpi
Copy link
Contributor Author

Yes. We can have 2 PR for this ticket. The first part can be reviewed. I added a test method for the case where we have unmergedBC but no property set on root in the same BC(it didn't failed).

@stefan-egli
Copy link
Contributor

looks like there's some remaining merge conflict

@ionutzpi
Copy link
Contributor Author

It is fine now.

Copy link
Contributor

@stefan-egli stefan-egli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have one suggestion for consideration, otherwise looking good.

@stefan-egli
Copy link
Contributor

waiting for tests to succeed (or only have unrelated flakyness) before approving/merging next

@stefan-egli
Copy link
Contributor

failure seems unrelated :

[ERROR] Process Exit Code: 143
[ERROR] Crashed tests:
[ERROR] org.apache.jackrabbit.oak.jcr.RepositoryTest
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/jenkins/workspace/Jackrabbit_oak-trunk-pr_PR-1605/oak-jcr && /usr/local/asfpackages/java/adoptium-jdk-11.0.16.1+1/bin/java -javaagent:/home/jenkins/maven-repositories/1/org/jacoco/org.jacoco.agent/0.8.12/org.jacoco.agent-0.8.12-runtime.jar=destfile=/home/jenkins/workspace/Jackrabbit_oak-trunk-pr_PR-1605/oak-jcr/target/coverage-reports/jacoco-ut.exec -Xmx512m -XX:+HeapDumpOnOutOfMemoryError -Dupdate.limit=100 -Djava.awt.headless=true -jar /home/jenkins/workspace/Jackrabbit_oak-trunk-pr_PR-1605/oak-jcr/target/surefire/surefirebooter9881064834527679463.jar /home/jenkins/workspace/Jackrabbit_oak-trunk-pr_PR-1605/oak-jcr/target/surefire 2024-08-21T14-23-22_802-jvmRun1 surefire1194720512709980725tmp surefire_011467686415861265474tmp

@stefan-egli stefan-egli merged commit 15396b2 into apache:trunk Aug 21, 2024
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants