Performance tuning for multithreaded Solr ingest (deprecated, not recommended)

lutaylor edited this page Feb 1, 2019 · 1 revision

Fedora Gsearch Multithreading

In $FEDORA_HOME/server/config/fedora.fcfg, find the two messaging datastores:

  <datastore id="apimUpdateMessages">
    <comment>Messaging Destination for API-M events which update the repository</comment>
    <param name="messageTypes" value="apimUpdate">
      <comment>A space-separated list of message types that will be
            delivered to this Destination. Currently, &quot;apimUpdate&quot; and
            &quot;apimAccess&quot; are the only supported message types.</comment>
    </param>
    <param name="name" value="fedora.apim.update"/>
    <param name="type" value="topic">
      <comment>Optional, defaults to topic.</comment>
    </param>
  </datastore>
  <datastore id="apimAccessMessages">
    <comment>Messaging Destination for API-M events which did not make changes to the repository</comment>
    <param name="messageTypes" value="apimAccess">
      <comment>A space-separated list of message types that will be
            delivered to this Destination. Currently, &quot;apimUpdate&quot; and
            &quot;apimAccess&quot; are the only supported message types.</comment>
    </param>
    <param name="name" value="fedora.apim.access"/>
    <param name="type" value="topic">
      <comment>Optional, defaults to topic.</comment>
    </param>
  </datastore>

Change the type of apimUpdateMessages from topic to queue (apimAccessMessages stays a topic):

  <datastore id="apimUpdateMessages">
    <comment>Messaging Destination for API-M events which update the repository</comment>
    <param name="messageTypes" value="apimUpdate">
      <comment>A space-separated list of message types that will be
            delivered to this Destination. Currently, &quot;apimUpdate&quot; and
            &quot;apimAccess&quot; are the only supported message types.</comment>
    </param>
    <param name="name" value="fedora.apim.update"/>
    <param name="type" value="queue">
      <comment>Optional, defaults to topic.</comment>
    </param>
  </datastore>
  <datastore id="apimAccessMessages">
    <comment>Messaging Destination for API-M events which did not make changes to the repository</comment>
    <param name="messageTypes" value="apimAccess">
      <comment>A space-separated list of message types that will be
            delivered to this Destination. Currently, &quot;apimUpdate&quot; and
            &quot;apimAccess&quot; are the only supported message types.</comment>
    </param>
    <param name="name" value="fedora.apim.access"/>
    <param name="type" value="topic">
      <comment>Optional, defaults to topic.</comment>
    </param>
  </datastore>

Light Mulgara Tuning


The bufferSafeCapacity parameter of the localMulgaraTriplestore datastore can also be increased to around 80,000, depending on the hardware it runs on. bufferFlushBatchSize and autoFlushBufferSize can also be comfortably doubled.
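
Before changing these values it can help to see what is currently configured. The following is a minimal sketch, assuming the three parameters appear as `<param name="..." value="..."/>` entries in the config file. `show_mulgara_buffers` is a hypothetical helper name, not something shipped with Fedora.

```shell
# Hypothetical helper: print the Mulgara buffer parameters found in a config file.
# Usage: show_mulgara_buffers "$FEDORA_HOME/server/config/fedora.fcfg"
show_mulgara_buffers() {
  # -n prefixes each match with its line number so it is easy to find in an editor.
  grep -n -E 'name="(bufferSafeCapacity|bufferFlushBatchSize|autoFlushBufferSize)"' "$1"
}
```

The function only reads the file, so it is safe to run against a live configuration. Edit the values by hand afterwards.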

Multithreading Gsearch

Go to $CATALINA_HOME/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/updater and edit FgsUpdaters/updater.properties. Change:

topic.fedoraAPIM = fedora.apim.update

to:

queue.fedoraAPIM = fedora.apim.update

Then, in the same directory, make one copy of the updater per additional thread:
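
The topic-to-queue change in updater.properties can also be scripted. This is a sketch only, assuming the key is written exactly as `topic.fedoraAPIM` at the start of a line. `use_queue_destination` is a hypothetical helper name.

```shell
# Hypothetical helper: rename the "topic.fedoraAPIM" key to "queue.fedoraAPIM"
# in a properties file, leaving the value untouched.
# Usage: use_queue_destination FgsUpdaters/updater.properties
use_queue_destination() {
  file=$1
  # Write to a temporary file and move it into place; this avoids "sed -i",
  # whose syntax differs between GNU and BSD sed.
  sed 's/^topic\.fedoraAPIM\([[:space:]]*=\)/queue.fedoraAPIM\1/' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

Because the file is replaced rather than edited in place, its ownership and permissions may change. Keep a backup and check them afterwards.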

cp -rpf FgsUpdaters FgsUpdater1
cp -rpf FgsUpdaters FgsUpdater2
cp -rpf FgsUpdaters FgsUpdater3
cp -rpf FgsUpdaters FgsUpdater4

In each of FgsUpdater1 through FgsUpdater4, edit updater.properties and set client.id to fedoragsearch1 through fedoragsearch4 respectively. The exact IDs are not important; they just have to be unique.
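
The copy and client.id steps can be combined into one loop. This is a minimal sketch, assuming each updater.properties already contains a `client.id` line and that the FgsUpdater1..N directories do not exist yet. `clone_updaters` is a hypothetical helper name.

```shell
# Hypothetical helper: copy FgsUpdaters to FgsUpdater1..N and give each copy
# its own client.id (fedoragsearch1..N).
# Usage: clone_updaters /path/to/fgsconfigFinal/updater 4
clone_updaters() {
  updater_dir=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    copy="$updater_dir/FgsUpdater$i"
    # Assumes $copy does not exist yet; otherwise cp would nest the copy inside it.
    cp -rpf "$updater_dir/FgsUpdaters" "$copy" || return 1
    # Replace the existing client.id line; a missing line is NOT added.
    sed "s/^client\.id[[:space:]]*=.*/client.id = fedoragsearch$i/" \
      "$copy/updater.properties" > "$copy/updater.properties.tmp" &&
      mv "$copy/updater.properties.tmp" "$copy/updater.properties" || return 1
    i=$((i + 1))
  done
}
```

The original FgsUpdaters directory keeps its existing client.id, so check that it does not collide with one of the generated IDs.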

Open $CATALINA_HOME/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/fedoragsearch.properties and change:

fedoragsearch.updaterNames = FgsUpdaters

to:

fedoragsearch.updaterNames = FgsUpdaters FgsUpdater1 FgsUpdater2 FgsUpdater3 FgsUpdater4

You can add or remove updaters on this line depending on how many threads you want the indexer to run. The default Solr buffering/merge parameters are generally fine, but they can be tweaked if you notice that performance starts out well and then drops off with a lot of disk thrashing.
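
The updaterNames line for a given number of copies can be generated rather than typed by hand. This is a sketch assuming the copies follow the FgsUpdater1..N naming used above. `updater_names_line` is a hypothetical helper name.

```shell
# Hypothetical helper: print the fedoragsearch.updaterNames line for the
# original updater plus N numbered copies.
# Usage: updater_names_line 4
updater_names_line() {
  names="FgsUpdaters"
  i=1
  while [ "$i" -le "$1" ]; do
    names="$names FgsUpdater$i"
    i=$((i + 1))
  done
  printf 'fedoragsearch.updaterNames = %s\n' "$names"
}
```

Paste the printed line into fedoragsearch.properties in place of the existing fedoragsearch.updaterNames entry.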