
PikoBrain

PikoBrain is a function-calling API for LLMs from multiple providers.

The key project features:

  • allows you to define model configuration
  • provides a universal API regardless of the underlying LLM
  • provides actual function calling (currently via OpenAPI)
  • (optionally) supports different models for vision and text
  • ships with a basic UI

It allows you to set up functions (RAG) without vendor lock-in.

The project is licensed under MPL-2.0 (Exhibit A), which promotes collaboration (changes must be shared) but does not restrict commercial or any other usage.

Roadmap

Providers

State

Integration

  • Webhooks
  • NATS Notifications

Functions

  • OpenAPI (including automatic reload)
  • Internal functions (threads)
  • Scripting functions

Libraries

  • Python
  • Golang
  • Typescript

Installation

  • Source (requires Go 1.22.5+): go run github.com/pikocloud/pikobrain@latest <args>
  • Binary: in releases
  • Docker: ghcr.io/pikocloud/pikobrain

Usage

Binary

pikobrain --config examples/brain.yaml --tools examples/tools.yaml

Docker

docker run --rm -v $(pwd):/data -v $(pwd)/examples:/config:ro -p 8080:8080 ghcr.io/pikocloud/pikobrain
  • Define the model and tools as in examples/
  • Run service
  • Call service

Basic UI

http://127.0.0.1:8080


Note

The UI is designed primarily for admin tasks. For a user-friendly chat experience, use something like LibreChat.

Request

POST http://127.0.0.1:8080

Input can be:

  • multipart/form-data payload (preferred), where:
    • each part can be text/plain (the default if not set), application/x-www-form-urlencoded, application/json, image/png, image/jpeg, image/webp, or image/gif
    • each part may carry an X-User header, which maps to the user field in providers
    • each part may carry an X-Role header with value user (default) or assistant
    • the multipart field name doesn't matter
  • application/x-www-form-urlencoded; the content will be decoded
  • text/plain, application/json
  • image/png, image/jpeg, image/webp, image/gif
  • without a content type, the payload must be a valid UTF-8 string and will be used as a single payload

The request may also carry a query parameter user, which maps to the user field, and/or a query parameter role (user or assistant).

A multipart payload lets the caller provide a full history of context messages. For multipart requests, the X-User and X-Role part headers may override the query parameters.

The output is the response from the LLM.

Note

The user field is not used for inference, only for audit.
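
As an illustration, a stateless call needs nothing beyond the Python standard library. This is a sketch: the user name alice is an arbitrary example, and the X-Run-* headers are the usage-metadata headers also parsed by the Python client below.

# Minimal sketch of a stateless request against the default address.
from urllib.request import Request, urlopen

req = Request(
    'http://127.0.0.1:8080?user=alice&role=user',  # query params map to user/role
    data='Why is the sky blue?'.encode('utf-8'),
    headers={'Content-Type': 'text/plain'},
    method='POST',
)
with urlopen(req) as res:
    print(res.read().decode())                    # the LLM response
    print(res.headers.get('X-Run-Total-Tokens'))  # usage metadata header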

Threads

In addition to normal (stateless) usage, it's possible to keep a stateful chat context within a "thread".

For every request, historical messages are fetched (up to the configured depth).

Add and run

POST http://127.0.0.1:8080/<thread name>

The content can be empty (just run).

Just add

PUT http://127.0.0.1:8080/<thread name>
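
A minimal sketch of the thread flow, again using only the standard library (the thread name support is illustrative):

# PUT stores a message in the thread without running the model;
# POST appends the content (possibly empty) and runs inference.
from urllib.request import Request, urlopen

BASE = 'http://127.0.0.1:8080/support'  # 'support' is an illustrative thread name
HDRS = {'Content-Type': 'text/plain'}

# Just add: store a message in the thread without running the model.
urlopen(Request(BASE, data=b'My name is RedDec.', headers=HDRS, method='PUT')).close()

# Add and run: the stored history (up to the configured depth) is sent along.
with urlopen(Request(BASE, data=b'What is my name?', headers=HDRS, method='POST')) as res:
    print(res.read().decode())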

Clients

Python with aiohttp
import asyncio
import io
from dataclasses import dataclass
from datetime import timedelta
from typing import Literal, Iterable

import aiohttp


@dataclass(frozen=True, slots=True)
class Message:
    content: str | bytes | io.BytesIO
    mime: str | None = None
    role: Literal['assistant', 'user'] | None = None
    user: str | None = None


@dataclass(frozen=True, slots=True)
class Response:
    content: bytes
    mime: str
    duration: timedelta
    input_messages: int
    input_tokens: int
    output_tokens: int
    total_tokens: int


async def request(url: str, messages: Iterable[Message]) -> Response:
    # Each message becomes one multipart part; role and user travel as part headers.
    with aiohttp.MultipartWriter('form-data') as mpwriter:
        for message in messages:
            headers = {}
            if message.mime:
                headers[aiohttp.hdrs.CONTENT_TYPE] = message.mime
            if message.role:
                headers['X-Role'] = message.role
            if message.user:
                headers['X-User'] = message.user

            mpwriter.append(message.content, headers)

        async with aiohttp.ClientSession() as session, session.post(url, data=mpwriter) as res:
            assert res.ok, await res.text()
            # Usage metadata comes back in the X-Run-* response headers.
            return Response(
                content=await res.read(),
                mime=res.headers.get(aiohttp.hdrs.CONTENT_TYPE),
                duration=timedelta(seconds=float(res.headers.get('X-Run-Duration'))),
                input_messages=int(res.headers.get('X-Run-Context')),
                input_tokens=int(res.headers.get('X-Run-Input-Tokens')),
                output_tokens=int(res.headers.get('X-Run-Output-Tokens')),
                total_tokens=int(res.headers.get('X-Run-Total-Tokens')),
            )


async def example():
    res = await request('http://127.0.0.1:8080', messages=[
        Message("My name is RedDec. You name is Bot."),
        Message("What is your and my name?"),
    ])
    print(res)
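
The same helper can also send images. Below is a sketch reusing request() from above (the file name is illustrative, echoing the cURL example below), plus an entry point so the snippet runs as a script:

async def describe_image():
    # Read an image and send it as a part with an explicit MIME type,
    # followed by a plain-text question about it.
    with open('eifeltower.jpeg', 'rb') as f:
        res = await request('http://127.0.0.1:8080', messages=[
            Message(f.read(), mime='image/jpeg'),
            Message('Describe the picture'),
        ])
    print(res.content.decode())


if __name__ == '__main__':
    asyncio.run(example())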

cURL

Simple

curl --data 'Why is the sky blue?' http://127.0.0.1:8080

Text multipart

curl -F '_=my name is RedDec' -F '_=What is my name?' -v http://127.0.0.1:8080

Image and text

curl -F '_=@eifeltower.jpeg' -F '_=Describe the picture' -v http://127.0.0.1:8080

CLI

Application Options:
      --timeout=                  LLM timeout (default: 30s) [$TIMEOUT]
      --refresh=                  Refresh interval for tools (default: 30s) [$REFRESH]
      --config=                   Config file (default: brain.yaml) [$CONFIG]
      --tools=                    Tool file [$TOOLS]

Debug:
      --debug.enable              Enable debug mode [$DEBUG_ENABLE]

Database configuration:
      --db.url=                   Database URL (default: sqlite://data.sqlite?cache=shared&_fk=1&_pragma=foreign_keys(1)) [$DB_URL]
      --db.max-conn=              Maximum number of opened connections to database (default: 10) [$DB_MAX_CONN]
      --db.idle-conn=             Maximum number of idle connections to database (default: 1) [$DB_IDLE_CONN]
      --db.idle-timeout=          Maximum amount of time a connection may be idle (default: 0) [$DB_IDLE_TIMEOUT]
      --db.conn-life-time=        Maximum amount of time a connection may be reused (default: 0) [$DB_CONN_LIFE_TIME]

HTTP server configuration:
      --http.bind=                Bind address (default: :8080) [$HTTP_BIND]
      --http.tls                  Enable TLS [$HTTP_TLS]
      --http.ca=                  Path to CA files. Optional unless IGNORE_SYSTEM_CA set (default: ca.pem) [$HTTP_CA]
      --http.cert=                Server certificate (default: cert.pem) [$HTTP_CERT]
      --http.key=                 Server private key (default: key.pem) [$HTTP_KEY]
      --http.mutual               Enable mutual TLS [$HTTP_MUTUAL]
      --http.ignore-system-ca     Do not load system-wide CA [$HTTP_IGNORE_SYSTEM_CA]
      --http.read-header-timeout= How long to read header from the request (default: 3s) [$HTTP_READ_HEADER_TIMEOUT]
      --http.graceful=            Graceful shutdown timeout (default: 5s) [$HTTP_GRACEFUL]
      --http.timeout=             Any request timeout (default: 30s) [$HTTP_TIMEOUT]
      --http.max-body-size=       Maximum payload size in bytes (default: 1048576) [$HTTP_MAX_BODY_SIZE]

Providers

OpenAI

First-class support, everything works just fine.

Google

Good support. Known limitations:

  • date-time is not supported in tools
  • empty object (i.e., "any JSON") schemas are not supported
  • for complex schemas, gemini-1.5-flash may hallucinate and call functions with incorrect arguments; use gemini-1.5-pro instead

Ollama

Requires Ollama 0.3.3+

Recommended models: llava for vision and mistral:instruct for general messages (including function calling).

model: 'mistral:instruct'
vision:
  model: 'llava'

Tip

Check https://ollama.com/library for models with the 'tools' and 'vision' features. The bigger the model, the better it generally performs. For non-vision models, the instruct variants are usually better.

AWS Bedrock

Warning

Due to multiple limitations, only Claude 3+ models work properly. The recommended multi-modal model for AWS Bedrock is Anthropic Claude-3-5.

Initial support.

  • Some models may not support system prompt.
  • Some models may not support tools.
  • Authorization is ignored (use AWS environment variables).
  • forceJSON is not supported (workaround: use tools).

Minimal required set of environment variables:

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=

Please refer to the AWS environment variable cheatsheet for configuration.

Based on the function-calling feature, the recommended models are:

  • Anthropic Claude 3 models
  • Mistral AI Mistral Large and Mistral Small
  • Cohere Command R and Command R+

See the list of compatibilities.