
PikoBrain

PikoBrain is a function-calling API for LLMs from multiple providers.

The key project features:

  • allows you to define model configuration
  • provides a universal API regardless of the underlying LLM
  • provides actual function calling (currently via OpenAPI)
  • (optionally) supports different models for vision and text
  • ships with a basic UI

It allows you to set up functions (RAG) without vendor lock-in.

The project is licensed under MPL-2.0 (Exhibit A), which promotes collaboration (changes must be shared) but does not restrict commercial or any other usage.

Roadmap

Providers

State

Integration

  • Webhooks
  • NATS Notifications

Functions

  • OpenAPI (including automatic reload)
  • Internal functions (threads)
  • Scripting functions

Libraries

  • Python
  • Golang
  • Typescript

Installation

  • Source (requires Go 1.22.5+): go run github.com/pikocloud/pikobrain@latest <args>
  • Binary: in releases
  • Docker: ghcr.io/pikocloud/pikobrain

Usage

Binary

pikobrain --config examples/brain.yaml --tools examples/tools.yaml

Docker

docker run --rm -v $(pwd):/data -v $(pwd)/examples:/config:ro -p 8080:8080 ghcr.io/pikocloud/pikobrain
  • Define the model and tools as in examples/
  • Run service
  • Call service

Basic UI

http://127.0.0.1:8080


Note

The UI is designed primarily for admin tasks. For a user-friendly chat experience, use something like LibreChat.

Request

POST http://127.0.0.1:8080

Input can be:

  • multipart/form-data payload (preferred), where:
    • each part can be text/plain (the default if not set), application/x-www-form-urlencoded, application/json, image/png, image/jpeg, image/webp, or image/gif
    • each part may carry an X-User header, which maps to the user field in providers
    • each part may carry an X-Role header with value user (default) or assistant
    • the multipart field name doesn't matter
  • application/x-www-form-urlencoded; the content will be decoded
  • text/plain, application/json
  • image/png, image/jpeg, image/webp, image/gif
  • without a content type, the payload must be a valid UTF-8 string and will be used as a single payload

The request may also carry a query parameter user, which maps to the user field, and/or a query parameter role (user or assistant).

A multipart payload lets the caller provide a full history of context messages. For multipart requests, the X-User and X-Role part headers may override the query parameters.

The output is the response from the LLM.

Note

The user field is not used for inference, only for audit.
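
As an illustration, a stateless call needs nothing beyond the Python standard library. This is a sketch: the user name alice is an arbitrary example, and the X-Run-* headers are the usage-metadata headers also parsed by the Python client below.

# Minimal sketch of a stateless request against the default address.
from urllib.request import Request, urlopen

req = Request(
    'http://127.0.0.1:8080?user=alice&role=user',  # query params map to user/role
    data='Why is the sky blue?'.encode('utf-8'),
    headers={'Content-Type': 'text/plain'},
    method='POST',
)
with urlopen(req) as res:
    print(res.read().decode())                    # the LLM response
    print(res.headers.get('X-Run-Total-Tokens'))  # usage metadata header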

Threads

In addition to normal (stateless) usage, it's possible to keep a stateful chat context within a "thread".

For every request, historical messages are fetched (up to the configured depth).

Add and run

POST http://127.0.0.1:8080/<thread name>

The content can be empty (just run).

Just add

PUT http://127.0.0.1:8080/<thread name>
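
A minimal sketch of the thread flow, again using only the standard library (the thread name support is illustrative):

# PUT stores a message in the thread without running the model;
# POST appends the content (possibly empty) and runs inference.
from urllib.request import Request, urlopen

BASE = 'http://127.0.0.1:8080/support'  # 'support' is an illustrative thread name
HDRS = {'Content-Type': 'text/plain'}

# Just add: store a message in the thread without running the model.
urlopen(Request(BASE, data=b'My name is RedDec.', headers=HDRS, method='PUT')).close()

# Add and run: the stored history (up to the configured depth) is sent along.
with urlopen(Request(BASE, data=b'What is my name?', headers=HDRS, method='POST')) as res:
    print(res.read().decode())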

Clients

Python with aiohttp
import asyncio
import io
from dataclasses import dataclass
from datetime import timedelta
from typing import Literal, Iterable

import aiohttp


@dataclass(frozen=True, slots=True)
class Message:
    content: str | bytes | io.BytesIO
    mime: str | None = None
    role: Literal['assistant', 'user'] | None = None
    user: str | None = None


@dataclass(frozen=True, slots=True)
class Response:
    content: bytes
    mime: str
    duration: timedelta
    input_messages: int
    input_tokens: int
    output_tokens: int
    total_tokens: int


async def request(url: str, messages: Iterable[Message]) -> Response:
    # Each message becomes one multipart part; role and user travel as part headers.
    with aiohttp.MultipartWriter('form-data') as mpwriter:
        for message in messages:
            headers = {}
            if message.mime:
                headers[aiohttp.hdrs.CONTENT_TYPE] = message.mime
            if message.role:
                headers['X-Role'] = message.role
            if message.user:
                headers['X-User'] = message.user

            mpwriter.append(message.content, headers)

        async with aiohttp.ClientSession() as session, session.post(url, data=mpwriter) as res:
            assert res.ok, await res.text()
            # Usage metadata comes back in the X-Run-* response headers.
            return Response(
                content=await res.read(),
                mime=res.headers.get(aiohttp.hdrs.CONTENT_TYPE),
                duration=timedelta(seconds=float(res.headers.get('X-Run-Duration'))),
                input_messages=int(res.headers.get('X-Run-Context')),
                input_tokens=int(res.headers.get('X-Run-Input-Tokens')),
                output_tokens=int(res.headers.get('X-Run-Output-Tokens')),
                total_tokens=int(res.headers.get('X-Run-Total-Tokens')),
            )


async def example():
    res = await request('http://127.0.0.1:8080', messages=[
        Message("My name is RedDec. You name is Bot."),
        Message("What is your and my name?"),
    ])
    print(res)
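
The same helper can also send images. Below is a sketch reusing request() from above (the file name is illustrative, echoing the cURL example below), plus an entry point so the snippet runs as a script:

async def describe_image():
    # Read an image and send it as a part with an explicit MIME type,
    # followed by a plain-text question about it.
    with open('eifeltower.jpeg', 'rb') as f:
        res = await request('http://127.0.0.1:8080', messages=[
            Message(f.read(), mime='image/jpeg'),
            Message('Describe the picture'),
        ])
    print(res.content.decode())


if __name__ == '__main__':
    asyncio.run(example())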

cURL

Simple

curl --data 'Why is the sky blue?' http://127.0.0.1:8080

Text multipart

curl -F '_=my name is RedDec' -F '_=What is my name?' -v http://127.0.0.1:8080

Image and text

curl -F '_=@eifeltower.jpeg' -F '_=Describe the picture' -v http://127.0.0.1:8080

CLI

Application Options:
      --timeout=                  LLM timeout (default: 30s) [$TIMEOUT]
      --refresh=                  Refresh interval for tools (default: 30s) [$REFRESH]
      --config=                   Config file (default: brain.yaml) [$CONFIG]
      --tools=                    Tool file [$TOOLS]

Debug:
      --debug.enable              Enable debug mode [$DEBUG_ENABLE]

Database configuration:
      --db.url=                   Database URL (default: sqlite://data.sqlite?cache=shared&_fk=1&_pragma=foreign_keys(1)) [$DB_URL]
      --db.max-conn=              Maximum number of opened connections to database (default: 10) [$DB_MAX_CONN]
      --db.idle-conn=             Maximum number of idle connections to database (default: 1) [$DB_IDLE_CONN]
      --db.idle-timeout=          Maximum amount of time a connection may be idle (default: 0) [$DB_IDLE_TIMEOUT]
      --db.conn-life-time=        Maximum amount of time a connection may be reused (default: 0) [$DB_CONN_LIFE_TIME]

HTTP server configuration:
      --http.bind=                Bind address (default: :8080) [$HTTP_BIND]
      --http.tls                  Enable TLS [$HTTP_TLS]
      --http.ca=                  Path to CA files. Optional unless IGNORE_SYSTEM_CA set (default: ca.pem) [$HTTP_CA]
      --http.cert=                Server certificate (default: cert.pem) [$HTTP_CERT]
      --http.key=                 Server private key (default: key.pem) [$HTTP_KEY]
      --http.mutual               Enable mutual TLS [$HTTP_MUTUAL]
      --http.ignore-system-ca     Do not load system-wide CA [$HTTP_IGNORE_SYSTEM_CA]
      --http.read-header-timeout= How long to read header from the request (default: 3s) [$HTTP_READ_HEADER_TIMEOUT]
      --http.graceful=            Graceful shutdown timeout (default: 5s) [$HTTP_GRACEFUL]
      --http.timeout=             Any request timeout (default: 30s) [$HTTP_TIMEOUT]
      --http.max-body-size=       Maximum payload size in bytes (default: 1048576) [$HTTP_MAX_BODY_SIZE]

Providers

OpenAI

First-class support, everything works just fine.

Google

Good support. Known limitations:

  • date-time is not supported in tools
  • empty object (i.e., "any JSON") schemas are not supported
  • for complex schemas, gemini-1.5-flash may hallucinate and call functions with incorrect arguments; use gemini-1.5-pro instead

Ollama

Requires Ollama 0.3.3+

Recommended models: llava for vision and mistral:instruct for general messages (including function calling).

model: 'mistral:instruct'
vision:
  model: 'llava'

Tip

Check https://ollama.com/library for models with the 'tools' and 'vision' features. The bigger the model, the better it generally performs. For non-vision models, the instruct variants are usually better.

AWS Bedrock

Warning

Due to multiple limitations, only Claude 3+ models work properly. The recommended multi-modal model for AWS Bedrock is Anthropic Claude-3-5.

Initial support.

  • Some models may not support system prompt.
  • Some models may not support tools.
  • Authorization is ignored (use AWS environment variables).
  • forceJSON is not supported (workaround: use tools).

Minimal required set of environment variables:

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=

Please refer to the AWS environment variable cheatsheet for configuration.

Based on the function-calling feature, the recommended models are:

  • Anthropic Claude 3 models
  • Mistral AI Mistral Large and Mistral Small
  • Cohere Command R and Command R+

See the list of compatibilities.