Fix IndexReader ref leak when Lucene index modified and read in transaction #10261
What this PR does
This PR fixes an `IndexReader` ref leak in the ref counting logic of Lucene index searching. `IndexReader` ref counts leak (i.e. are not decremented), which in turn leads to `IndexFileDeleter` remaining in memory, creating a memory leak (on the order of 100MB of heap per million files).

The leak occurs when a Lucene index is both modified and read in the same transaction.
What we observed
Under a sustained workload (reading from a queue) that performed Lucene index based queries and also updated Lucene indexes in the same transaction, we observed:
When OrientDB was restarted, the first transaction to modify the Lucene index caused all of the unused index files to be deleted. Additionally, waiting 2 minutes for the IndexWriter to time out, and then performing a Lucene index query (causing the IndexWriter to reopen) has similar effects.
Heap dumps of the running system showed millions of files tracked in the `IndexFileDeleter` (using 100s of MB of heap), many with reference counts in the 100s of thousands, and many instances of `IndexReader` with ref counts in the thousands.

Analysis
When an `IndexSearcher` is constructed in a transaction containing pending Lucene index changes, a `MultiReader` is constructed across the FS-based `IndexReader` (usually a `SegmentReader`) and the in-memory `IndexReader`. If no pending changes have been made, then the underlying FS `IndexReader` is returned directly.

When the
`IndexSearcher` wrapping this `IndexReader` is finished with in `OLuceneResultSet` (after exhaustion of the query iteration), the `IndexReader` underlying the searcher was being conditionally released.

Previously there was a guard on the release that only released the reader if its ref count was > 1. Aside from this being inherently suspect in a multi-threaded situation (it's checking an atomic integer and then acting on the result), it essentially meant that the release was never called for the `MultiReader` case (since this reader always started life and got used with a ref count of 1).

`MultiReader`
has 2 modes it can be used in:

- Close readers mode, for `IndexReaders` where the `MultiReader` should take ownership of the underlying readers and close them when its own ref count hits 0.
- Reference counting (not closing) mode, for `IndexReaders` obtained from Lucene `ReferenceManagers` that are having their ref counts incremented/decremented externally to the `MultiReader`.

It looks like this refcount > 1 guard was a potentially mistaken attempt at avoiding the `MultiReader` closing the underlying `IndexReader`, since it was being opened in close readers mode.

Unfortunately this meant that the FS `IndexReader` was never released when it was wrapped in a `MultiReader`, resulting in the file/memory leaks we observed.

Resolution
In this situation, we want the decrement of the `IndexReader` used in the search to always occur, regardless of whether it's a `MultiReader` or the base FS `IndexReader`.

To achieve this, we need to make sure the `MultiReader` has "ownership" of the `IndexReaders` we pass to it, and that it is switched into the reference counting/not closing mode. Passing ownership is achieved by decrementing the ref count of the wrapped `IndexReaders` after `MultiReader` has incremented them, negating the increment created on allocation, and ensuring `MultiReader` will decrement and potentially close the underlying readers appropriately.

With this change, the result set can simply release the `IndexReader` on completion of iteration, without needing to know what kind of reader it is.

We've tested this on 100s of thousands of transactions, and observed relatively small and constant numbers of index files throughout the same test scenarios, demonstrating that merging and deleting is being performed as expected.
We're still operating 3.1, but this patch applies cleanly to 3.2 as well.
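
For reviewers who want to trace the ref counts by hand, here is a minimal, self-contained model of the two behaviours described under Analysis and Resolution. It does not use Lucene or OrientDB classes: `ModelReader`, `RefLeakDemo` and their methods are hypothetical names invented for this sketch, and the exact placement of the old guard is an assumption based on the description above rather than a copy of the original code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted reader. This is NOT Lucene's IndexReader. */
class ModelReader {
    // A newly opened reader starts life holding one reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final ModelReader[] subReaders;
    private final boolean closeSubReaders;
    private boolean closedByUser = false;

    /** Leaf reader: models the FS reader or the in-memory reader. */
    ModelReader() {
        this(true);
    }

    /** Composite reader: models the two wrapping modes described in the Analysis. */
    ModelReader(boolean closeSubReaders, ModelReader... subReaders) {
        this.closeSubReaders = closeSubReaders;
        this.subReaders = subReaders;
        if (!closeSubReaders) {
            // Reference counting mode: take one reference on every wrapped reader.
            for (ModelReader sub : subReaders) {
                sub.incRef();
            }
        }
    }

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            // Last reference released: let go of the wrapped readers too.
            for (ModelReader sub : subReaders) {
                if (closeSubReaders) {
                    sub.close();
                } else {
                    sub.decRef();
                }
            }
        }
    }

    /** Closing drops a single reference, at most once. */
    void close() {
        if (!closedByUser) {
            closedByUser = true;
            decRef();
        }
    }

    int getRefCount() {
        return refCount.get();
    }
}

class RefLeakDemo {
    /** Models the old guarded release. Returns the FS reader's ref count after one search. */
    static int searchWithGuard(ModelReader fs) {
        fs.incRef(); // reference acquired for this search
        ModelReader multi = new ModelReader(true, fs, new ModelReader()); // close readers mode
        if (multi.getRefCount() > 1) { // assumed guard: never true, the wrapper's count is 1
            multi.decRef();
        }
        return fs.getRefCount();
    }

    /** Models the fix. Returns the FS reader's ref count after one search. */
    static int searchWithOwnershipTransfer(ModelReader fs) {
        fs.incRef(); // reference acquired for this search
        ModelReader memory = new ModelReader();
        ModelReader multi = new ModelReader(false, fs, memory); // ref counting mode
        fs.decRef();     // hand the acquired reference over to the wrapper
        memory.decRef(); // likewise for the in-memory reader
        multi.decRef();  // unconditional release when iteration completes
        return fs.getRefCount();
    }
}
```

In this model, each call to `RefLeakDemo.searchWithGuard` on the same FS reader leaves its ref count one higher than before, while `RefLeakDemo.searchWithOwnershipTransfer` returns it to the count it had before the search.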