Local API Server

cosmic-snow edited this page Aug 6, 2024 · 2 revisions

The GPT4All Chat Desktop Application comes with a built-in server mode allowing you to programmatically interact with any supported local LLM through a familiar HTTP API. Namely, the server implements a subset of the OpenAI API specification.

Note

The server exposes an API, not a Web GUI client. However, existing web clients that support this API should be able to use it.

Activating the Server

You can enable the webserver via GPT4All Chat > Settings > Application > Enable Local Server. By default, it listens on port 4891 (the reverse of 1984).

The documentation has short descriptions of the settings.

Connecting to the Server

The quickest way to verify that connections are allowed is to open the path /v1/models in your browser, as it is a GET endpoint; try the first endpoint link below. If that doesn't work, you may need to adjust your firewall settings.

Important

  • It can only be accessed through the http protocol, never https.
  • The server only listens on the local machine, that is localhost or 127.0.0.1.
  • It does not listen on the IPv6 localhost address ::1.
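Reachability can also be checked from a script instead of a browser. A minimal sketch using only the Python standard library; the models_url helper name is illustrative, not part of GPT4All:

```python
import json
import urllib.request

BASE_URL = "http://localhost:4891/v1"  # default port (the reverse of 1984)

def models_url(base_url=BASE_URL):
    """Build the URL for the GET /v1/models endpoint."""
    return base_url.rstrip("/") + "/models"

def list_models(base_url=BASE_URL):
    """Fetch the model list; raises OSError/URLError if the server is unreachable."""
    with urllib.request.urlopen(models_url(base_url)) as resp:
        return json.loads(resp.read().decode())
```

If list_models() raises a connection error, the server is disabled, listening on a different port, or blocked by a firewall.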

Endpoints

Four endpoints are currently implemented:

Method  Path                  Example Full URL (Default Port)
GET     /v1/models            http://localhost:4891/v1/models
GET     /v1/models/<name>     http://localhost:4891/v1/models/Phi-3%20Mini%20Instruct
POST    /v1/completions       http://localhost:4891/v1/completions
POST    /v1/chat/completions  http://localhost:4891/v1/chat/completions
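Model names may contain spaces, which must be percent-encoded in the /v1/models/<name> path, as in the example URL above. A small illustrative helper using the Python standard library:

```python
from urllib.parse import quote

BASE_URL = "http://localhost:4891/v1"  # default port

def model_url(name, base_url=BASE_URL):
    """Build the URL for GET /v1/models/<name>; spaces become %20."""
    return f"{base_url}/models/{quote(name)}"

print(model_url("Phi-3 Mini Instruct"))
# http://localhost:4891/v1/models/Phi-3%20Mini%20Instruct
```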

In the GPT4All Chat Application

The local API server has its own chat window. If you already have a number of saved conversations it may not be obvious: it is the last entry in the chat history sidebar, so you need to scroll all the way to the bottom.

It has a different background color so you know you're picking the right one.

Enabling LocalDocs

You can activate LocalDocs from within the GUI. Follow these steps:

  1. Open the Chats view and open both sidebars.
  2. Scroll down to the bottom in the left sidebar (chat history); the last entry will be for the server itself. Activate that chat.
  3. Activate one or more LocalDocs collections in the right sidebar.
  4. Use a client and make server requests now.
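As a sketch of step 4, here is a minimal chat request using only the Python standard library; the build_chat_request and chat names are illustrative, not part of GPT4All:

```python
import json
import urllib.request

BASE_URL = "http://localhost:4891/v1"  # default port

def build_chat_request(prompt, model="Phi-3 Mini Instruct",
                       max_tokens=2048, temperature=0.7):
    """Assemble the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(prompt, base_url=BASE_URL, **kwargs):
    """Send the request; raises OSError/URLError if the server is unreachable."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_chat_request(prompt, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())
```

With one or more collections active in the server chat, responses to such requests will draw on those documents.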

Note

It is not possible to activate LocalDocs or select a collection from a client through the API.

Examples

Using the OpenAI Python Client Library

This example uses the legacy (pre-1.0) OpenAI Python library. Preferably create a virtual environment, then install it with pip install "openai ~= 0.28".

import openai

#openai.api_base = "https://api.openai.com/v1"
openai.api_base = "http://localhost:4891/v1"

openai.api_key = "not needed for a local LLM"

# Set up the prompt and other parameters for the API request
prompt = "Who is Michael Jordan?"

#model = "gpt-3.5-turbo"
model = "Phi-3 Mini Instruct"

# Make the API request
response = openai.Completion.create(
    model=model,
    prompt=prompt,
    max_tokens=50,
    temperature=0.28,
    top_p=0.95,
    n=1,
    echo=True,
    stream=False
)

# Print the generated completion
print(response)
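If you prefer not to depend on the OpenAI library at all, the same request can be issued with the Python standard library alone. A minimal sketch whose payload mirrors the parameters above; the helper names are illustrative:

```python
import json
import urllib.request

BASE_URL = "http://localhost:4891/v1"  # default port

def build_completion_payload(prompt, model="Phi-3 Mini Instruct"):
    """Assemble the JSON body for POST /v1/completions."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 50,
        "temperature": 0.28,
        "top_p": 0.95,
        "n": 1,
        "echo": True,
        "stream": False,
    }

def complete(prompt, base_url=BASE_URL):
    """Send the request; raises OSError/URLError if the server is unreachable."""
    req = urllib.request.Request(
        base_url + "/completions",
        data=json.dumps(build_completion_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())
```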

Example cURL

cURL is a command-line tool for talking to servers through various protocols.

curl -X POST http://localhost:4891/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Phi-3 Mini Instruct",
  "messages": [{"role":"user","content":"hi, who are you?"}],
  "max_tokens": 2048,
  "temperature": 0.7
}'

Example PowerShell

On Windows, PowerShell is nowadays the preferred shell for scripting. Its built-in Invoke-WebRequest cmdlet talks to HTTP servers.

Invoke-WebRequest -Uri http://localhost:4891/v1/chat/completions -Method POST -ContentType application/json -Body '{
  "model": "Phi-3 Mini Instruct",
  "messages": [{"role":"user","content":"hi, who are you?"}],
  "max_tokens": 2048,
  "temperature": 0.7
}'
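Whichever client you use, the response follows the OpenAI chat completion schema, so the assistant's text sits under choices[0].message.content. An illustrative Python helper for pulling it out:

```python
def extract_reply(response_json):
    """Pull the assistant text out of a /v1/chat/completions response."""
    return response_json["choices"][0]["message"]["content"]

# A trimmed-down example of the response shape:
sample = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
print(extract_reply(sample))
# Hello!
```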

See Also