Bundestag Speech Search

Simple search engine for querying speeches from the Bundestag.

Get started

$ git clone https://github.com/htw-projekt-p2p-volltextsuche/fulltext-search
$ cd fulltext-search

$ sbt run

Configure the App

The application configuration can be specified in src/main/resources/application.conf in HOCON notation. It's also possible to override the application.conf by java system properties or environment variables.

Environment variables need to be prefixed by CONFIG_FORCE_ except there is an env-alias specified for the property.

They will be evaluated in following order (starting from the highest priority):

Environment variable: export CONFIG_FORCE_SERVER_HOST="0.0.0.0"
- This actually doesn't work! - use env-alias if defined
Java system property as argument: sbt '; set javaOptions += "-Dserver.host="0.0.0.0""; run'
application.conf: server { host = 0.0.0.0 }

Configuration properties

identifier	description	env-alias	default
server.port	HTTP port of the service	HTTP_PORT	8421
server.host	Host of the service	SERVER_HOST	0.0.0.0
server.log-body	Enables server logging of response bodies	SERVER_LOG_BODY	true
index.storage	Storage policy for the inverted index	INDEX_STORAGE_POLICY	local
index.stop-words-location	File name of the stopwords resource	-	stopwords_de.txt
index.sample-speeches-location	File name of the sample speeches resource	-	sample_speeches.json
index.insert-sample-speeches	Inserts sample speeches on startup when set	-	false
index.distribution-interval	Interval for scanning and distributing the cached index in ms	INDEX_DISTRIBUTION_INTERVAL	120000
index.distribution-chunk-size	Size of each concurrently processed chunk of the cached index	INDEX_DISTRIBUTION_CHUNK_SIZE	100
index.insertion-ttl	Amount of retries for the insertion of single index entry	INDEX_INSERTION_TTL	5
search.cache-size	Size of cache for search results	SEARCH_CACHE_SIZE	5
peers.uri	Entrypoint to the P2P network	-	http://localhost:8090/
peers.log-body	Enables client logging of response bodies	PEERS_LOG_BODY	true
peers.max-wait-queue-limit	Max wait queue limit for requests to peers	PEERS_MAX_WAIT_QUEUE_LIMIT	1024

Index Storage Policies

The application can be started with three different storage policies. They work as follows:

local
- Indexing and retrieval is both done locally in memory on the host machine.
distributed
- Indexing and retrieval is handled by calling the P2P network directly.
- ⚠️ Using this option the index request blocks until all entries are distributed to the P2P network. This can possibly take a long while. Therefore, an appropriate timeout should be set on the calling machine.
lazy-distributed
- Indexing is done by first storing the index in a local cache and then distributing it on a background thread.
- The interval for scanning the cached indexed can be configured with the option index.distribution-interval

Run tests

$ sbt test

Retrieve Search Results

📁 To the API-Doc's

Simple Queries

Only searches with at least one term in the query fields are valid. All the other fields are optional.

To limit the maximum number of results search.max_results can be set to any positive integer.

The simplest possible search request has following form:

{
  "search": {
    "query": {
      "terms": "your query here..."
    }
  }
}

In the above example all the terms specified in terms will be combined with AND.

To combine the terms with different boolean operators the search you can extend the search with arbitrary additional terms.

{
  "search": {
    "query": {
      "terms": "find this ...",
      "additions": [
        {
          "connector": "or",
          "terms": "... or that ..."
        },
        {
          "connector": "and_not",
          "terms": "... but not that"
        }
      ]
    }
  }
}

Evaluation order of boolean operators

First all terms fields are evaluated with AND in isolation.
The results will then be combined by the specified connector and evaluated in the following order:
- AND_NOT
- AND
- OR

Filtered Queries

Up until now the queries can be filtered by speaker or affiliation.

If several filters with same criteria are specified they're combined by OR, while the entire set resulting from all filters of same type will by combined by AND with the actually specified query results.

{
  "search": {
    "query": {
      "terms": "some search"
    },
    "filter": [
      {
        "criteria": "affiliation",
        "value": "SPD"
      },
      {
        "criteria": "affiliation",
        "value": "Die Linke"
      },
      {
        "criteria": "speaker",
        "value": "Peter Lustig"
      }
    ]
  }
}

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
docs		docs
project		project
src		src
.gitignore		.gitignore
.openapi-generator-ignore		.openapi-generator-ignore
.scalafmt.conf		.scalafmt.conf
Dockerfile		Dockerfile
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bundestag Speech Search

Get started

Configure the App

Configuration properties

Index Storage Policies

Run tests

Retrieve Search Results

Simple Queries

Evaluation order of boolean operators

Filtered Queries

About

Releases

Packages

Contributors 3

Languages

htw-projekt-p2p-volltextsuche/fulltext-search

Folders and files

Latest commit

History

Repository files navigation

Bundestag Speech Search

Get started

Configure the App

Configuration properties

Index Storage Policies

Run tests

Retrieve Search Results

Simple Queries

Evaluation order of boolean operators

Filtered Queries

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages