Failure loading SolrCore at startup - java.util.NoSuchElementException: No key id for key ref=0 after server restart #109

danielsason112 opened this issue Jul 16, 2024 · 3 comments
@danielsason112

Hi, I tried using the encryption plugin.

Steps to reproduce the issue (a rough SolrJ sketch of these steps follows the list):

  • Create a new collection with the encryption config
  • Invoke the /admin/encrypt API
  • Insert a new document
  • Stop server and start it again
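
Roughly, the steps above could look like the following SolrJ sketch. Assumptions: SolrJ 9.x, a configset named "encryption" that contains the plugin's solrconfig.xml changes, and an encryptionKeyId parameter for /admin/encrypt as described in the plugin README; the key id and collection name are illustrative.

import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.MapSolrParams;

public class EncryptionRestartRepro {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
      // 1. Create a collection using the encryption-enabled configset (configset name is an assumption).
      CollectionAdminRequest.createCollection("test", "encryption", 1, 1).process(client);

      // 2. Trigger encryption of the index with a key id (parameter name per the plugin README).
      GenericSolrRequest encrypt = new GenericSolrRequest(
          SolrRequest.METHOD.GET, "/admin/encrypt",
          new MapSolrParams(Map.of("encryptionKeyId", "my-key-id")));
      client.request(encrypt, "test");

      // 3. Index one document; no explicit commit, to match the report.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      client.add("test", doc);
    }
    // 4. Stop the Solr node and start it again; the core then fails to load as shown below.
  }
}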

Stacktrace:
SolrCore failed to load on startup => org.apache.solr.common.SolrException: Unable to create core [test_shard1_replica_n1]
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1750)
org.apache.solr.common.SolrException: Unable to create core [test_shard1_replica_n1]
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1750) ~[?:?]
    at org.apache.solr.core.CoreContainer.lambda$loadInternal$12(CoreContainer.java:1057) ~[?:?]
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) ~[metrics-core-4.2.25.jar:4.2.25]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:312) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1229) ~[?:?]
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1057) ~[?:?]
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1717) ~[?:?]
    ... 8 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2445) ~[?:?]
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2578) ~[?:?]
    at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1272) ~[?:?]
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1177) ~[?:?]
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1057) ~[?:?]
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1717) ~[?:?]
    ... 8 more
Caused by: java.util.NoSuchElementException: No key id for key ref=0
    at org.apache.solr.encryption.EncryptionUtil.getKeyIdFromCommit(EncryptionUtil.java:124) ~[?:?]
    at org.apache.solr.encryption.EncryptionDirectory.getKeySecret(EncryptionDirectory.java:231) ~[?:?]
    at org.apache.solr.encryption.EncryptionDirectory.openInput(EncryptionDirectory.java:254) ~[?:?]
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.<init>(Lucene90BlockTreeTermsReader.java:134) ~[?:?]
    at org.apache.lucene.codecs.lucene99.Lucene99PostingsFormat.fieldsProducer(Lucene99PostingsFormat.java:428) ~[?:?]
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:330) ~[?:?]
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:392) ~[?:?]
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:119) ~[?:?]
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96) ~[?:?]
    at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:178) ~[?:?]
    at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:220) ~[?:?]
    at org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:542) ~[?:?]
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:138) ~[?:?]
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:604) ~[?:?]
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:112) ~[?:?]
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:91) ~[?:?]
    at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38) ~[?:?]
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2399) ~[?:?]
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2578) ~[?:?]
    at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1272) ~[?:?]
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1177) ~[?:?]
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1057) ~[?:?]
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1717) ~[?:?]
    ... 8 more

Stop server logs:
2024-07-15 23:14:11.124 INFO (JettyShutdownThread) [c: s: r: x: t:] o.e.j.s.Server Stopped Server@73d6d0c{STOPPING}[10.0.20,sto=0]
2024-07-15 23:14:11.129 INFO (JettyShutdownThread) [c: s: r: x: t:] o.e.j.s.AbstractConnector Stopped ServerConnector@38600b{HTTP/1.1, (http/1.1, h2c)}{127.0.0.1:8983}
2024-07-15 23:14:11.141 INFO (JettyShutdownThread) [c: s: r: x: t:] o.a.s.c.CoreContainer Shutting down CoreContainer instance=658563862
2024-07-15 23:14:11.141 INFO (JettyShutdownThread) [c: s: r: x: t:] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/localhost:8983_solr
2024-07-15 23:14:11.144 INFO (JettyShutdownThread) [c: s: r: x: t:] o.a.s.c.ZkController Publish this node as DOWN...
2024-07-15 23:14:11.144 INFO (JettyShutdownThread) [c: s: r: x: t:] o.a.s.c.ZkController Publish node=localhost:8983_solr as DOWN
2024-07-15 23:14:11.145 INFO (zkCallback-11-thread-3) [c: s: r: x: t:] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
2024-07-15 23:14:11.153 INFO (coreCloseExecutor-36-thread-1) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1 t:] o.a.s.c.SolrCore CLOSING SolrCore org.apache.solr.core.SolrCore@2a0ca7d1 test_shard1_replica_n1
2024-07-15 23:14:11.153 INFO (coreCloseExecutor-36-thread-1) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1 t:] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.core.test.shard1.replica_n1 tag=SolrCore@2a0ca7d1
2024-07-15 23:14:11.155 INFO (zkCallback-11-thread-3) [c: s: r: x: t:] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/test/state.json zxid: -1] for collection [test] has occurred - updating... (live nodes size: [0])
2024-07-15 23:14:11.172 INFO (coreCloseExecutor-36-thread-1) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1 t:] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.collection.test.shard1.leader tag=SolrCore@2a0ca7d1
2024-07-15 23:14:11.177 INFO (coreCloseExecutor-36-thread-1) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1 t:] o.a.s.u.DirectUpdateHandler2 Committing on IndexWriter.close() .

It looks like DirectUpdateHandler2's closeWriter method is overwriting the commit user data that contains the encryption key ref.
Is there a better approach than overriding closeWriter in EncryptionUpdateHandler and merging in the existing commit data?
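
For illustration, here is a minimal plain-Lucene sketch (no encryption plugin involved; the "encryptionKeyRef.0" key name is only illustrative, not necessarily what the plugin stores) of how commit user data survives a later commit only when that commit merges with the live commit data instead of replacing it:

import java.io.IOException;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class CommitUserDataMergeSketch {
  public static void main(String[] args) throws IOException {
    try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/commit-data-demo"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

      // First commit: store the key ref in the commit user data
      // (conceptually what happens after /admin/encrypt).
      Map<String, String> data = new HashMap<>();
      data.put("encryptionKeyRef.0", "my-key-id");
      writer.setLiveCommitData(data.entrySet());
      writer.commit();

      // Later commit (e.g. the one issued when the writer is closed): merge with the
      // existing live commit data instead of replacing it, so the key ref survives.
      Map<String, String> merged = new HashMap<>();
      Iterable<Map.Entry<String, String>> live = writer.getLiveCommitData();
      if (live != null) {
        live.forEach(e -> merged.put(e.getKey(), e.getValue()));
      }
      merged.put("commitTimeMSec", String.valueOf(System.currentTimeMillis()));
      writer.setLiveCommitData(merged.entrySet());
      writer.commit();
      // Calling setLiveCommitData with a fresh map that lacks "encryptionKeyRef.0" here
      // would drop the key ref, and the next reader open would fail like the stack trace above.
    }
  }
}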

Appreciate suggestions and help on this one. Thanks.

@bruno-roustant
Contributor

@danielsason112 do you commit before closing the server?
Indeed, I may have to look at this closeWriter method and override it in EncryptionUpdateHandler.

@bruno-roustant
Contributor

I created a PR to show how to fix it, with a test. But the real fix requires a change upstream in the Solr project. I'm going to do that quickly, but we'll have to wait for the next Solr release for the proper fix to be available here.

Currently the workaround is to commit before shutting down the server. The problem is with the implicit commit performed when the IndexWriter is closed.
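
A minimal SolrJ sketch of that workaround, assuming SolrJ 9.x and the collection name "test" from the report (a request to the collection's /update handler with commit=true would do the same):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.Http2SolrClient;

public class CommitBeforeShutdown {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
      // Issue an explicit hard commit so the latest commit on disk carries the
      // encryption commit user data, then stop the node.
      client.commit("test");
    }
  }
}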

@bruno-roustant bruno-roustant self-assigned this Oct 18, 2024
@dsmiley

dsmiley commented Oct 18, 2024

I suppose IndexWriter.commit() is risky for index encryption... we must always guarantee there is commit metadata? This seems fragile; hard to guarantee, and with bad consequences.

I did find-usages and I see two other callers, and would like your opinion:

  • CoreContainer.reload
  • SolrIndexSplitter.doSplit

But finding all callers and adding protections, again, feels fragile. If we do index encryption, maybe we need a stronger, lower-level guarantee that the metadata is there. I haven't thought about this much beyond observing the fragility.
