add OpenSearchExt (#11)
* add OpenSearchExt

* Update docker-compose.yml

* Update OpenSearchAuth.jl

* Update chunks_index_schema.json

* Update OpenSearchExt.jl

* doc: added a page about OpenSearch

* README: fixed a dev documentation label

---------

Co-authored-by: Egor Shmorgun <egor.shmorgun@opensesame.com>
os-rss and os-esh authored Jul 21, 2023
1 parent dff7833 commit d7c6947
Showing 13 changed files with 516 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -5,7 +5,7 @@ Manifest.toml
/docs/build/
/docs/Manifest.toml
debug_out

envs*

# VS Code
.vscode/*
1 change: 1 addition & 0 deletions Project.toml
@@ -4,6 +4,7 @@ authors = ["Community contributors"]
version = "0.1.0"

[deps]
AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
BytePairEncoding = "a4280ba5-8788-555a-8ca8-4a8c3d966a71"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DebugDataWriter = "810e33c6-efd6-4462-86b1-f71ae88af720"
3 changes: 2 additions & 1 deletion README.md
@@ -1,6 +1,7 @@
# GptSearchPlugin

[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://opensesame.github.io/GptSearchPlugin)
<!-- [![](https://img.shields.io/badge/docs-stable-blue.svg)](https://opensesame.github.io/GptSearchPlugin) -->
[![](https://img.shields.io/badge/docs-dev-blue.svg)](https://opensesame.github.io/GptSearchPlugin/dev)

Simple and pure Julia-based implementation of a GPT retrieval plugin logic

1 change: 1 addition & 0 deletions docs/make.jl
@@ -11,6 +11,7 @@ makedocs(
"Vector storage providers" => [
"Getting Started" => "Providers/index.md",
"Elasticsearch" => "Providers/Elasticsearch/index.md",
"OpenSearch" => "Providers/Opensearch/index.md",
],
"Internals" => "Internals/index.md"
]
7 changes: 6 additions & 1 deletion docs/src/Providers/Elasticsearch/index.md
@@ -5,4 +5,9 @@ Use our [compose-script](docker-compose.yml) and run the following command in a

docker-compose up

If you want to configure a cluster [see full instructions](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docker.html#docker)
If you want to configure a cluster, [see the full instructions](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker).

To use Elasticsearch as the datastore, set the `DATASTORE` environment variable:
```bash
export DATASTORE=ELASTICSEARCH
```
12 changes: 12 additions & 0 deletions docs/src/Providers/Opensearch/docker-compose.yml
@@ -0,0 +1,12 @@
version: '3'
services:
  opensearch-node: # This is also the hostname of the container within the Docker network (i.e. http://opensearch-node:9200/)
    image: opensearchproject/opensearch:latest # Specifying the latest available image - modify if you want a specific version
    container_name: opensearch-node
    environment:
      - discovery.type=single-node
      - plugins.security.disabled=true
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer

14 changes: 14 additions & 0 deletions docs/src/Providers/Opensearch/index.md
@@ -0,0 +1,14 @@
# OpenSearch as a vector storage

OpenSearch is a search engine licensed under Apache 2.0 and also available as a managed cloud service on AWS. It can be installed in [a number of ways](https://opensearch.org/downloads.html), but the easiest way to run it locally without authentication is with Docker.

Use our [compose-script](docker-compose.yml) and run the following command in a terminal:

docker-compose up

If you want to configure a cluster, [see the full instructions](https://opensearch.org/docs/latest/install-and-configure/index/).

To use OpenSearch as the datastore, set the `DATASTORE` environment variable:
```bash
export DATASTORE=OPENSEARCH
```
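
How the plugin reads `DATASTORE` is not shown in this commit. As a minimal sketch, assuming a hypothetical `select_datastore` helper and a hypothetical fallback to Elasticsearch when the variable is unset, the value could be mapped to a provider like this:

```julia
# Hypothetical sketch: not the plugin's actual configuration code.
# The accepted names mirror the two DATASTORE values documented in this commit.
const KNOWN_DATASTORES = ("ELASTICSEARCH", "OPENSEARCH")

# Return the provider selected by the `DATASTORE` variable as a `Symbol`.
# The fallback default is an assumption made for this example.
function select_datastore(env=ENV; default="ELASTICSEARCH")
    # Normalise so that `opensearch` and ` OPENSEARCH ` are accepted too.
    name = uppercase(strip(get(env, "DATASTORE", default)))
    name in KNOWN_DATASTORES ||
        throw(ArgumentError("unsupported DATASTORE: $name"))
    return Symbol(lowercase(name))
end
```

Passing the environment as an argument keeps the helper easy to exercise with a plain `Dict` instead of the process-wide `ENV`.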
8 changes: 8 additions & 0 deletions docs/src/Providers/index.md
@@ -1,2 +1,10 @@
# Vector storage service providers

The plugin uses a vector storage to store the computed embedding vectors and to perform approximate search.

The plugin can use any storage that can:
- Store vectors of up to 2k dimensions (for OpenAI embeddings);
- Add and remove records by a string ID, each with a vector and metadata;
- Perform an approximate vector search that returns a distance score.

See the current implementations of the storage interfaces if you want to create a new one.
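
The three requirements above can be illustrated with a small in-memory store. This is only a sketch: the names `MemoryStore`, `upsert!`, `remove!` and `search` are hypothetical and are not the plugin's storage interface, and the search is an exact brute-force cosine scan rather than an approximate index.

```julia
using LinearAlgebra: dot, norm

# Hypothetical in-memory store; a real provider would delegate to a service
# such as Elasticsearch or OpenSearch.
struct MemoryStore
    vectors::Dict{String,Vector{Float64}}
    metadata::Dict{String,Dict{String,Any}}
end
MemoryStore() = MemoryStore(Dict{String,Vector{Float64}}(),
                            Dict{String,Dict{String,Any}}())

# Add or overwrite a record identified by a string ID.
function upsert!(store::MemoryStore, id::AbstractString,
                 vector::AbstractVector{<:Real}, meta=Dict{String,Any}())
    store.vectors[id] = Float64.(vector)
    store.metadata[id] = Dict{String,Any}(meta)
    return store
end

# Remove a record; deleting an unknown ID is a no-op.
function remove!(store::MemoryStore, id::AbstractString)
    delete!(store.vectors, id)
    delete!(store.metadata, id)
    return store
end

# Cosine distance: 0 for identical directions, 1 for orthogonal vectors.
cosine_distance(a, b) = 1 - dot(a, b) / (norm(a) * norm(b))

# Return the `k` nearest records as (id, distance, metadata) named tuples,
# closest first.
function search(store::MemoryStore, query::AbstractVector{<:Real}; k::Integer=3)
    scored = [(id=id, distance=cosine_distance(query, v), metadata=store.metadata[id])
              for (id, v) in store.vectors]
    sort!(scored; by=r -> r.distance)
    return first(scored, min(k, length(scored)))
end
```

A linear scan like this is only practical for small collections; it stands in for whatever approximate nearest-neighbour search the chosen backend provides.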
47 changes: 47 additions & 0 deletions ext/OpenSearchExt/OpenSearchAuth.jl
@@ -0,0 +1,47 @@
module OpenSearchAuth

using HTTP
using AWS
using Dates
using TimeZones

const AWS_REQUEST_HEADERS = ["Host", "Content-Type", "User-Agent"]
const AWS_AUTH_HEADERS = ["Authorization", "Content-MD5", "x-amz-date", "x-amz-security-token"]

const ES_SERVICE_NAME = "es"
# TODO: add a mechanism for configuring a region of deployment for the instance
const ES_REGION = "us-west-1"

const ACCEPT_HEADER_KEY = "Accept"

function auth_layer(handler)
    return function (req; auth_params::AWSCredentials, kw...)
        config = AWSConfig(;creds=auth_params, region=ES_REGION)

        headers = Dict{String,String}(req.headers)
        # With the Accept header present, AWS returns a signature-mismatch error,
        # so it is removed before signing and restored afterwards.
        accept_header = pop!(headers, ACCEPT_HEADER_KEY, nothing)

        aws_request = AWS.Request(
            content=req.body.s,
            url=string(req.url),
            headers=headers,
            api_version="none",
            request_method=req.method,
            service=ES_SERVICE_NAME
        )
        AWS.sign_aws4!(config, aws_request, now(UTC))

        if !isnothing(accept_header)
            aws_request.headers[ACCEPT_HEADER_KEY] = accept_header
        end

        req.headers = collect(aws_request.headers)

        return handler(req; kw...)
    end
end

HTTP.@client [auth_layer]

end
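
The layer above removes the `Accept` header before signing and puts it back afterwards, so the header still travels with the request but is not part of the signed set. The sketch below isolates that pattern using only the standard library. `sign_excluding` and the toy signer `toy_sign!` are hypothetical stand-ins, not AWS.jl functions, and the toy signature is not AWS Signature Version 4.

```julia
using SHA: sha256

# Toy signer (an assumption for illustration, NOT AWS Signature Version 4):
# hashes the sorted header pairs and stores the digest as "Authorization".
function toy_sign!(headers::Dict{String,String})
    pairs_sorted = sort!(collect(headers); by=first)
    canonical = join(("$(lowercase(k)):$(v)" for (k, v) in pairs_sorted), "\n")
    headers["Authorization"] = bytes2hex(sha256(canonical))
    return headers
end

# Sign a copy of `headers` with `sign!` while keeping `excluded` out of the
# signed set, then restore it so that it is still sent with the request.
function sign_excluding(sign!, headers::Dict{String,String}, excluded::String)
    signed = copy(headers)
    held = pop!(signed, excluded, nothing)   # `nothing` if the header is absent
    sign!(signed)
    isnothing(held) || (signed[excluded] = held)
    return signed
end
```

With this arrangement, two requests that differ only in the excluded header receive the same signature, which is the property the layer relies on.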