
Dockerfile POC #55

Closed · wants to merge 1 commit

Conversation

@jhamon (Collaborator) commented Oct 2, 2023

I was playing around with Docker this weekend to refresh my memory on some things I used to know, and realized it was fairly simple to build a container that runs the resin server. It's up to you whether to merge this, but I thought it might be educational for anyone who hasn't worked much with Docker to see some of the commands in action.

Problem

We need a Docker image of this project so that people can run and deploy it more easily.

Solution

Implement a basic Dockerfile. Build the image and run a container locally to verify it works.
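
For reference, a minimal Dockerfile along these lines might look something like the sketch below. The base image, dependency file, and start command here are illustrative assumptions, not necessarily what this PR actually contains.

# Illustrative sketch only — base image, dependency file, and start command
# are assumptions, not this PR's actual Dockerfile.
FROM python:3.10-slim
WORKDIR /app
# Copy and install dependencies first so Docker can cache this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the project source
COPY . .
# The server listens on port 8000 (see the run output below)
EXPOSE 8000
# Hypothetical start command for the resin server
CMD ["resin", "start"]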

How to use it

# Build the image from the Dockerfile in the current directory (.)
$ docker build -t resin-proxy .

# Run a container using the resin-proxy image
#    -it keeps STDIN open and allocates a TTY so we can watch output interactively and stop the server with CTRL+C
#    --rm tells Docker to delete the container after it stops
#    --env-file tells it where to find environment variables (e.g. OPENAI_API_KEY)
#    -p publishes container port 8000 on host port 8000 (host:container)
$ docker run -it --rm --env-file ./.env -p 8000:8000 resin-proxy
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
WARNING:root:Failed to import splade encoder. If you want to use splade, install the splade extra dependencies by running: `pip install pinecone-text[splade]`
Starting Resin service on 0.0.0.0:8000
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
# Now, in another terminal window, try hitting the resin endpoint.
$ curl -X POST http://localhost:8000/context/chat/completions -H 'Content-Type: application/json' -d '{"stream": false, "messages": [{ "role": "user", "content": "What do you know about Pinecone?" }]}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1431  100  1333  100    98    252     18  0:00:05  0:00:05 --:--:--   328
{
  "id": "544076fa-83a2-4b18-bbc1-c9edca2d82f6",
  "object": "chat.completion",
  "created": 1696216058,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Pinecone is a vector database designed for storing and querying high-dimensional vectors. It provides fast and efficient semantic search over these vector embeddings. By integrating OpenAI's LLMs (large language models) with Pinecone, it combines deep learning capabilities for embedding generation with efficient vector storage and retrieval. Pinecone surpasses traditional keyword-based search by offering contextually-aware and precise results. It is ideal for various use cases such as semantic text search, question-answering, visual search, recommendation systems, and more. Pinecone can handle very large scales of hundreds of millions and even billions of vector embeddings. It is a fully managed, cloud-native vector database with a simple API and no infrastructure hassles. It offers benefits like ultra-low query latency, live index updates, and the ability to combine vector search with metadata filtering or keyword search for more relevant results. Pinecone is SOC 2 Type II compliant and GDPR-ready. Source: Pinecone documentation"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2393,
    "completion_tokens": 192,
    "total_tokens": 2585
  }
}

@jhamon jhamon marked this pull request as ready for review October 2, 2023 03:22
@jhamon (Collaborator, Author) commented Oct 2, 2023

This POC actually surfaces a security problem, IMO. Callers to the proxied OpenAI endpoint should still have to pass a -H "Authorization: Bearer $OPENAI_API_KEY" request header; otherwise you're just opening up a free-for-all OpenAI endpoint.
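
For illustration, a call to the proxied endpoint would then look something like this (a sketch of the suggested check — the server doesn't enforce this header today):

$ curl -X POST http://localhost:8000/context/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"stream": false, "messages": [{ "role": "user", "content": "..." }]}'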

@miararoy (Contributor) left a comment

@jhamon Thanks for this 🙏

@miararoy (Contributor) commented Oct 2, 2023

@jhamon
Re: security.

This is a design choice (namely, not to authenticate on the server), not a bug: in this project, authentication is the user's responsibility. You're right that if someone runs this on a public server, anyone could use the LLM underneath. But any real deployment will probably use resin as an internal API rather than exposing it directly, and will rely on cloud-level security (IP whitelists and instance profiles).
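
For illustration (a sketch, not something this PR implements): one way to avoid exposing the server directly is to publish the port only on the host's loopback interface, so nothing outside the host can reach it.

# Illustrative only: bind the published port to 127.0.0.1 so the server is
# reachable from the host machine but not from the network
$ docker run -it --rm --env-file ./.env -p 127.0.0.1:8000:8000 resin-proxy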

@igiloh-pinecone (Contributor) commented

@jhamon thank you very much for your contribution!!! This is definitely a step in the direction we're planning to take.

We actually have a very similar (but slightly more elaborate) Dockerfile template that we're currently using for Pinecone's support bot. We are planning to port it to this project as well.
I'm closing this PR for now, but we'll definitely come back to this feature very shortly.

@igiloh-pinecone igiloh-pinecone deleted the jhamon/docker-poc branch November 6, 2023 16:17