Overview

mod-search is based on a metadata-driven approach: each resource is described in a JSON file, and all rules, mappings, and other processing are applied by internal mod-search services.

Supported search field types

Elasticsearch mapping field types: field data types. The field type defines what search capabilities the corresponding field can provide. For example, the keyword field type is used for term queries and aggregations (providing facets for the record), while text fields are intended for full-text queries.

Resource description

Property name	Description
name	The resource name. It is used when searching by resource to determine the index name, and when creating index settings and mappings
parent	The parent resource name (currently, it is used for browsing by subjects when the additional index is added to arrange the instance subjects uniquely)
eventBodyJavaClass	The Java class that incoming JSON can be mapped to. Currently, it's used to make processing of search fields more convenient
languageSourcePaths	Contains a list of JSON path expressions to extract language values in ISO-639 format. If multi-language support is enabled for the resource, this path must be specified.
searchFieldModifiers	Contains a list of field modifiers that pre-process incoming fields for the Elasticsearch request.
fields	List of field descriptions to extract values from incoming resource event
fieldTypes	List of resource descriptions that can be used with an alias using `$ref` field of `PlainFieldDescription`. It's done to reduce duplication in resource description.
searchFields	Contains a list of generated fields for the resource events (for example, it can contain normalized ISBN values or a generated subset of field values).
indexMappings	Object with additional index mappings for the resource (it can be helpful for the `copy_to` functionality of Elasticsearch)
mappingSource	Used to include or exclude fields from the `_source` object stored in Elasticsearch, mainly to reduce the index size. See also: _source field
reindexSupported	Indicates if the resource could be reindexed

Supported field description types

Field type	Description
plain	This field type is the default, so there is no need to specify it explicitly. It can be used to define all fields containing the following values: string, number, boolean, or array of plain values.
object	This field type marks a key that contains subfields; each subfield must have its own field description.
authority	This field type is designed to provide special options to divide a single authority record into multiple records based on the `distinctType` property value.

Plain field description

Property name	Description
searchTypes	List of search types that are supported for the current field. Allowed values: `facet`, `filter`, `sort`
searchAliases	List of aliases that can be used as a field name in the CQL search query. It can be used to combine several fields together during the search. For example, for the instance record the query `keyword all title` combines the following fields: `title`, `alternativeTitles.alternativeTitle`, `indexTitle`, `identifiers.value`, `contributors.name`. Another use is to rename a field while keeping backward compatibility, without requiring a reindex.
index	Reference to the Elasticsearch mappings that are specified in index-field-types
showInResponse	Marks the field to be returned by the search operation. `mod-search` adds all marked field paths to the Elasticsearch query. See also: Source filtering
searchTermProcessor	Search term processor, which pre-processes the incoming value from the CQL query for the search request.
mappings	Elasticsearch field mappings. It can contain a new field mapping or enrich the referenced mappings that come from `index-field-types`
defaultValue	The default value for the plain field
indexPlainValue	Specifies whether the plain keyword value should be indexed along with the field. Works only for full-text fields. See also: Full-text plain fields
sortDescription	Provides the sort description for the field. If not specified, standard rules are applied for the sort field. See also: Sorting by fields
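
As an illustration of how `searchAliases` can combine several fields under one CQL name, the sketch below inverts a "field path to aliases" map into an "alias to field paths" lookup. The class and method names are hypothetical and are not part of mod-search; the real module may resolve aliases differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class AliasSketch {

  // Input: field path -> aliases declared in its searchAliases property.
  // Output: alias -> all field paths that declare this alias.
  static Map<String, List<String>> aliasIndex(Map<String, List<String>> aliasesByField) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    aliasesByField.forEach((field, aliases) ->
        aliases.forEach(alias ->
            result.computeIfAbsent(alias, key -> new ArrayList<>()).add(field)));
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> aliasesByField = new LinkedHashMap<>();
    aliasesByField.put("title", List.of("keyword"));
    aliasesByField.put("indexTitle", List.of("keyword"));
    aliasesByField.put("contributors.name", List.of("keyword"));
    // A query against the alias "keyword" would then target all three fields.
    System.out.println(aliasIndex(aliasesByField).get("keyword"));
  }
}
```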

Object field description

Property name	Description
properties	Map where the key is the subfield name and the value is the field description

Authority field description

Property name	Description
distinctType	Distinct type used to split a single entity into multiple entities, each containing only the common fields and excluding fields marked with other distinct types
headingType	Heading type that should be set on the resource if the field contains a value.
authRefType	Authorized, Reference, or Auth/Ref type for divided authority record.

Creating Elasticsearch mappings

Elasticsearch mappings are created using field descriptions. All fields specified in the resource description are added to the index mappings and are used to prepare the Elasticsearch document.

By default, mappings are taken from index-field-types. This is a common file containing pre-defined mapping values that can be referenced from the index field of PlainFieldDescription. The mappings for a specific field can be enriched using the mappings field. In addition, the ResourceDescription contains an indexMappings section, which allows developers to add custom mappings without specifying them in the index-field-types.json file.

For example, the resource description contains the following field description:

{
  "fields": {
    "f1": {
      "index": "keyword",
      "mappings": {
        "copy_to": [ "sort_f1" ]
      }
    },
    "f2": {
      "index": "keyword"
    }
  },
  "indexMappings": {
    "sort_f1": {
      "type": "keyword",
      "normalizer": "keyword_lowercase"
    }
  }
}

Then the mappings helper will create the following mappings object:

{
  "properties": {
    "f1": {
      "type": "keyword",
      "copy_to": [ "sort_f1" ]
    },
    "f2": {
      "type": "keyword"
    },
    "sort_f1": {
      "type": "keyword",
      "normalizer": "keyword_lowercase"
    }
  }
}
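
The merge shown above can be sketched in a few lines of Java. This is only an illustration of the described rules (resolve the `index` reference, enrich it with the field's `mappings`, then add `indexMappings` entries as-is); `MappingsSketch`, `fieldMapping`, and the in-memory `INDEX_FIELD_TYPES` map are hypothetical stand-ins, not the module's actual classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class MappingsSketch {

  // Stand-in for index-field-types.json: reference name -> pre-defined mapping.
  static final Map<String, Map<String, Object>> INDEX_FIELD_TYPES =
      Map.of("keyword", Map.<String, Object>of("type", "keyword"));

  // Resolves the "index" reference and enriches it with the field's own "mappings".
  static Map<String, Object> fieldMapping(String index, Map<String, Object> extraMappings) {
    Map<String, Object> mapping = new LinkedHashMap<>(INDEX_FIELD_TYPES.get(index));
    mapping.putAll(extraMappings);
    return mapping;
  }

  public static void main(String[] args) {
    Map<String, Object> properties = new LinkedHashMap<>();
    properties.put("f1", fieldMapping("keyword", Map.of("copy_to", List.of("sort_f1"))));
    properties.put("f2", fieldMapping("keyword", Map.of()));
    // Entries from indexMappings are copied without modification.
    properties.put("sort_f1", Map.of("type", "keyword", "normalizer", "keyword_lowercase"));
    System.out.println(Map.of("properties", properties));
  }
}
```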

Adding mod-search specific kafka topics

To make mod-search create its own Kafka topic, add the topic to the application.yml file under the application.kafka.topics path.

Topic parameters:

Property name	Description
name	Topic base name that will be concatenated with environment name and tenant name.
numPartitions	Number of partitions to break the topic into. Can be left blank to use the default value of '-1'.
replicationFactor	Number of replicas needed for the topic. Can be left blank to use the default value of '-1'.

Example

application:
  kafka:
    topics:
      - name: search.instance-contributor
        numPartitions: ${KAFKA_CONTIBUTORS_TOPIC_PARTITIONS:50}
        replicationFactor: ${KAFKA_CONTIBUTORS_TOPIC_REPLICATION_FACTOR:}
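
To make the naming rule concrete, here is a minimal sketch of how a full topic name could be derived from the base name. The section only states that the base name is concatenated with the environment and tenant names; the dot separator and the order (environment, tenant, base name) are assumptions, as are the class name and the sample values `folio` and `test`.

```java
// Hypothetical helper, for illustration only.
class TopicNameSketch {

  // Assumption: parts are joined with dots in the order environment, tenant, base name.
  static String fullTopicName(String environment, String tenant, String baseName) {
    return String.join(".", environment, tenant, baseName);
  }

  public static void main(String[] args) {
    System.out.println(fullTopicName("folio", "test", "search.instance-contributor"));
  }
}
```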

Full-text fields

Currently, two field types are supported for full-text search:

multi-language field values (see also: Language Analyzers)
standard tokenized field values (see also: Standard Tokenizer)

In addition, to support wildcard search by the whole phrase, plain values are added to the generated document. For example, for a multi-language analyzed field with indexPlainValue = true (the default):

Source record:

{
  "title": "Semantic web primer",
  "language": "eng"
}

Result document:

{
  "title": {
    "eng": "Semantic web primer",
    "src": "Semantic web primer"
  },
  "plain_title": "Semantic web primer"
}

Example of a document containing a field with index = standard:

Source:

{
  "contributors": [
    {
      "name": "A contributor name",
      "primary": true
    }
  ]
}

Result document:

{
  "contributors": [
    {
      "name": "A contributor name",
      "plain_name": "A contributor name",
      "primary": true
    }
  ]
}
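
The two transformations above can be approximated by the following sketch. It reproduces only the document shapes shown in the examples; the class and method names are hypothetical, and the real conversion in mod-search is driven by the resource description rather than by explicit arguments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class FullTextSketch {

  // Multi-language field: the value is stored under the language code and under "src".
  static Map<String, Object> multiLang(String field, String value, String language,
      boolean indexPlainValue) {
    Map<String, Object> byLanguage = new LinkedHashMap<>();
    byLanguage.put(language, value);
    byLanguage.put("src", value);
    Map<String, Object> document = new LinkedHashMap<>();
    document.put(field, byLanguage);
    if (indexPlainValue) {
      document.put("plain_" + field, value); // plain copy used for wildcard search
    }
    return document;
  }

  // Standard tokenized field: the value is kept as-is, plus the optional plain copy.
  static Map<String, Object> standard(String field, String value, boolean indexPlainValue) {
    Map<String, Object> document = new LinkedHashMap<>();
    document.put(field, value);
    if (indexPlainValue) {
      document.put("plain_" + field, value);
    }
    return document;
  }

  public static void main(String[] args) {
    System.out.println(multiLang("title", "Semantic web primer", "eng", true));
    System.out.println(standard("name", "A contributor name", true));
  }
}
```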

Field Sorting

All fields marked with searchType = sort must be available for sorting. To sort by text values, the following field indices can be used:

keyword (case-sensitive)
keyword_lowercase (case-insensitive)

Sort Description

Property name	Description
fieldName	Custom field name. If it is not specified, the default strategy is applied: `sort_${fieldName}`.
sortType	Sort field type: `single` or `collection`
secondarySort	List of fields that must be added as secondary sorting (e.g., sorting by `itemStatus` and instance `title` fields)

By default, if the field is only marked with searchType = sort, mod-search generates the following sort condition:

{
  "sort": [
    {
      "name": "sort_$field",
      "order": "${value comes from cql query: asc/desc}"
    }
  ]
}

If sortDescription specifies sortType as collection, the following rules are applied:

if sortOrder is asc, the mode is min. This means that when sorting by a field containing a list of values, the lowest value is picked for sorting.
if sortOrder is desc, the mode is max. This means that when sorting by a field containing a list of values, the highest value is picked for sorting.
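
The naming and mode rules described in this section can be summarized in a short sketch. The class and method names are hypothetical; only the `sort_` prefix, the `collection` sort type, and the asc/min and desc/max pairing come from the text above.

```java
// Hypothetical helper, for illustration only.
class SortSketch {

  // Uses the custom fieldName from the sort description if present,
  // otherwise falls back to the default "sort_<field>" strategy.
  static String sortFieldName(String customFieldName, String field) {
    return customFieldName != null ? customFieldName : "sort_" + field;
  }

  // Returns the sort mode for collection fields, or null when no mode applies.
  static String sortMode(String sortType, String sortOrder) {
    if (!"collection".equals(sortType)) {
      return null; // single-value fields need no mode
    }
    return "asc".equals(sortOrder) ? "min" : "max";
  }

  public static void main(String[] args) {
    System.out.println(sortFieldName(null, "title"));
    System.out.println(sortMode("collection", "desc"));
  }
}
```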

Testing

Unit testing

The project mostly uses a single framework for assertions: AssertJ. A few examples:

assertThat(actualQuery).isEqualTo(matchAllQuery());

assertThat(actualCollection).isNotEmpty().containsExactly("str1", "str2");

assertThatThrownBy(() -> service.doExceptionalOperation())
  .isInstanceOf(IllegalArgumentException.class)
  .hasMessage("invalid parameter");

Integration testing

The module uses Testcontainers to run Elasticsearch, Apache Kafka and PostgreSQL in embedded mode. It is required to have Docker installed and available on the host where the tests are executed.

Local environment testing

Run docker-compose up in the project root folder. This builds a local mod-search image and brings it up along with all the necessary infrastructure:

elasticsearch along with dashboards (kibana analogue from opensearch)
kafka along with zookeeper
postgres
wiremock server for mocking external api calls (for example authorization)

Then create a tenant by invoking the following request. This brings up the Kafka listeners and gets the indices created:

curl --location --request POST 'http://localhost:8081/_/tenant' \
--header 'Content-Type: application/json' \
--header 'x-okapi-tenant: test' \
--header 'x-okapi-url: http://api-mock:8080' \
--data-raw '{
  "purge": "false"
}'

To rebuild mod-search image you should:

bring down existing containers by running docker-compose down
run docker-compose build mod-search to build new mod-search image
run docker-compose up to bring up infrastructure

Hosts/ports of containers to access functionality:

http://localhost:5601/ - dashboards UI for elastic monitoring, data modification through dev console
localhost - host, 5010 - port for remote JVM debug
http://localhost:8081 - for calling the mod-search REST API. Note that the header x-okapi-url: http://api-mock:8080 should be added to requests for APIs that take the Okapi URL from headers
localhost:29092 - for Kafka interaction. If you are sending messages to Kafka from a Java application with spring-kafka, this host should be added to the spring.kafka.bootstrap-servers property of application.yml
