Easy-to-use LLM APIs from state-of-the-art providers, with built-in comparison.
- Easy-to-use: A simple, unified API for state-of-the-art language models from different providers, all called the same way.
- Comparison: Compare the cost and performance of different providers and models, so you can choose the best one for your use case.
- Logging: Log the response and cost of every request to a log file.
- Providers: Support for many providers, both open-source and closed-source.
- Results: See the actual time taken by each request, especially useful when you don't trust published benchmarks.
pip3 install api4all
- Unix / macOS
python3 -m venv venv
source venv/bin/activate
- Windows
python3 -m venv venv
.\venv\Scripts\activate
TOGETHER_API_KEY=xxx
OPENAI_API_KEY=xxx
MISTRAL_API_KEY=xxx
ANTHROPIC_API_KEY=xxx
Or set the environment variables directly:
export TOGETHER_API_KEY=xxx
export OPENAI_API_KEY=xxx
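Whichever way you set the keys, it can help to fail early when one is missing rather than at request time. A minimal sketch (the `get_api_key` helper is hypothetical, not part of api4all; it only reads the same environment variables listed above):

```python
import os

def get_api_key(name: str) -> str:
    """Read a provider API key from the environment, failing loudly if unset."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key

os.environ.setdefault("TOGETHER_API_KEY", "xxx")  # placeholder value for demonstration
print(get_api_key("TOGETHER_API_KEY"))
```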
from api4all import EngineFactory
messages = [
{"role": "system",
"content": "You are a helpful assistent for the my Calculus class."},
{"role": "user",
"content": "What is the current status of the economy?"}
]
engine = EngineFactory.create_engine(provider="together",
model="google/gemma-7b-it",
messages=messages,
temperature=0.9,
max_tokens=1028,
)
response = engine.generate_response()
print(response)
- There are examples in the examples folder, and you can also try them in Google Colab.
3. Check the log file for the response and the cost of the request.
Request ID - fa8cebd0-265a-44b2-95d7-6ff1588d2c87
create at: 2024-03-15 16:38:18,129
INFO - SUCCESS
Response:
I am not able to provide information about the current status of the economy, as I do not have access to real-time information. Therefore, I recommend checking a reliable source for the latest economic news and data.
Cost: $0.0000154 # Cost of this provider for this request
Provider: together # Provider used for this request
Execution-time: Execution time not provided by the provider
Actual-time: 0.9448428153991699 # Actual time taken by the request
Input-token: 33 # Number of tokens used for the input
Output-token: 44 # Number of tokens used for the output
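The logged cost can be reproduced by hand from the token counts and the per-million-token prices in the tables below (Gemma 7B it on Together AI is $0.2 input / $0.2 output per MTokens). A small sketch of that arithmetic:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars; prices are in $ per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Values from the log entry above: 33 input and 44 output tokens on
# Together AI's gemma-7b-it at $0.2 / $0.2 per MTokens.
cost = request_cost(33, 44, 0.2, 0.2)
print(f"${cost:.7f}")  # matches the logged Cost: $0.0000154
```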
Provider | Free Credit | Rate Limit | API Key name | Provider string name |
---|---|---|---|---|
Groq | Unlimited | 30 Requests / Minute | GROQ_API_KEY | "groq" |
Anyscale | $10 | 30 Requests / Second | ANYSCALE_API_KEY | "anyscale" |
Together AI | $25 | 1 Request / Second | TOGETHER_API_KEY | "together" |
Replicate | Free to try | 50 Requests / Second | REPLICATE_API_KEY | "replicate" |
Fireworks | $1 | 600 Requests / Minute | FIREWORKS_API_KEY | "fireworks" |
Deepinfra | Free to try | 200 Concurrent requests | DEEPINFRA_API_KEY | "deepinfra" |
Lepton | $10 | 10 Requests / Minute | LEPTON_API_KEY | "lepton" |
------ | ------ | ------ | ------ | ------ |
Google AI (Vertex AI) | Unlimited | 60 Requests / Minute | GOOGLE_API_KEY | "google" |
OpenAI | ✕ | 60 Requests / Minute | OPENAI_API_KEY | "openai" |
Mistral AI | Free to try | 5 Requests / Second | MISTRAL_API_KEY | "mistral" |
Anthropic | Free to try | 5 Requests / Minute | ANTHROPIC_API_KEY | "anthropic" |
- Free to try: No credit card required, but usage is limited to a certain number of tokens.
- Rate limits are based on each provider's free plan; the actual limit may differ depending on the plan you choose.
-- | Mixtral-8x7b-Instruct-v0.1 | Gemma 7B it | Mistral-7B-Instruct-v0.1 | LLaMA2-70b | Mistral-7B-Instruct-v0.2 | CodeLlama-70b-Instruct | LLaMA3-8b-Instruct | LLaMA3-70b |
---|---|---|---|---|---|---|---|---|
API string name | "mistralai/Mixtral-8x7B-Instruct-v0.1" | "google/gemma-7b-it" | "mistralai/Mistral-7B-Instruct-v0.1" | "meta/Llama-2-70b-chat" | "mistralai/Mistral-7B-Instruct-v0.2" | "meta/CodeLlama-2-70b-instruct" | "meta/Llama-3-8b-Instruct" | "meta/Llama-3-70b" |
Context Length | 32,768 | 8,192 | 4,096 | 4,096 | 32,768 | 16,384 | 8,192 | 8,192 |
Developer | Mistral AI | Google | Mistral AI | Meta | Mistral AI | Meta | Meta | Meta |
Cost (Input - Output, $ / MTokens) | ----- | ------ | ------ | ----- | ------ | ------ | ------ | ------ |
Groq | $0-$0 | $0-$0 | ✕ | $0-$0 | ✕ | ✕ | $0-$0 | $0-$0 |
Anyscale | $0.5-$0.5 | $0.15-$0.15 | $0.05-$0.25 | $1.0-$1.0 | ✕ | $1.0-$1.0 | $0.15-$0.15 | $1.0-$1.0 |
Together AI | $0.6-$0.6 | $0.2-$0.2 | $0.2-$0.2 | $0.9-$0.9 | $0.05-$0.25 | $0.9-$0.9 | $0.2-$0.2 | $0.9-$0.9 |
Replicate | $0.3-$1 | ✕ | $0.05-$0.25 | $0.65-$2.75 | $0.2-$0.2 | $0.65-$2.75 | $0.05-$0.25 | $0.65-$2.75 |
Fireworks | $0.5-$0.5 | ✕ | $0.2-$0.2 | $0.9-$0.9 | $0.2-$0.2 | $0.9-$0.9 | $0.2-$0.2 | $0.9-$0.9 |
Deepinfra | $0.27-$0.27 | $0.13-$0.13 | $0.13-$0.13 | $0.7-$0.9 | ✕ | $0.7-$0.9 | $0.08-$0.08 | $0.59-$0.79 |
Lepton | $0.5-$0.5 | ✕ | ✕ | $0.8-$0.8 | ✕ | ✕ | $0.07-$0.07 | $0.8-$0.8 |
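A table like the one above can also be queried programmatically to pick the cheapest provider for a given request shape. A sketch using a hypothetical price dictionary (the values are the Mixtral-8x7b column from the table; the helper is illustrative, not part of api4all):

```python
# (input, output) prices in $ per 1M tokens for Mixtral-8x7b, from the table above.
MIXTRAL_PRICES = {
    "groq": (0.0, 0.0),
    "anyscale": (0.5, 0.5),
    "together": (0.6, 0.6),
    "fireworks": (0.5, 0.5),
    "deepinfra": (0.27, 0.27),
    "lepton": (0.5, 0.5),
}

def cheapest_provider(prices: dict, input_tokens: int, output_tokens: int) -> str:
    """Return the provider with the lowest estimated cost for a request shape."""
    def cost(provider: str) -> float:
        inp, out = prices[provider]
        return (input_tokens * inp + output_tokens * out) / 1_000_000
    return min(prices, key=cost)

print(cheapest_provider(MIXTRAL_PRICES, 1000, 500))  # "groq" (free tier)
```

The same pattern works for any model column; only the price dictionary changes.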
Model | Input Pricing ($/1M Tokens) | Output Pricing ($/1M Tokens) | Context Length | API string name |
---|---|---|---|---|
Mistral-7B-Instruct-v0.1 | $0.25 | $0.25 | 8,192 | "mistral/open-mistral-7b" |
Mixtral-8x7b-Instruct-v0.1 | $0.7 | $0.7 | 8,192 | "mistral/open-mixtral-8x7b" |
Mistral Small | $2 | $6 | ✕ | "mistral/mistral-small-latest" |
Mistral Medium | $2.7 | $8.1 | ✕ | "mistral/mistral-medium-latest" |
Mistral Large | $8 | $24 | ✕ | "mistral/mistral-large-latest" |
Model | Input Pricing ($/1M Tokens) | Output Pricing ($/1M Tokens) | Context Length | API string name |
---|---|---|---|---|
GPT-3.5-0125 | $0.5 | $1.5 | 16,385 | "openai/gpt-3.5-turbo-0125" |
GPT-3.5 | $0.5 | $1.5 | 16,385 | "openai/gpt-3.5-turbo" |
GPT-4 | $30 | $60 | 8,192 | "openai/gpt-4" |
GPT-4-32k | $60 | $120 | 32,768 | "openai/gpt-4-32k" |
Model | Input Pricing ($/1M Tokens) | Output Pricing ($/1M Tokens) | Context Length | API string name |
---|---|---|---|---|
Claude 3 Opus | $15 | $75 | 200,000 | "anthropic/claude-3-opus" |
Claude 3 Sonnet | $3 | $15 | 200,000 | "anthropic/claude-3-sonnet" |
Claude 3 Haiku | $0.25 | $1.25 | 200,000 | "anthropic/claude-3-haiku" |
Claude 2.1 | $8 | $24 | 200,000 | "anthropic/claude-2.1" |
Claude 2.0 | $8 | $24 | 100,000 | "anthropic/claude-2.0" |
Claude Instant 1.2 | $0.8 | $2.4 | 100,000 | "anthropic/claude-instant-1.2" |
Model | Input Pricing ($/1M Tokens) | Output Pricing ($/1M Tokens) | Context Length | API string name |
---|---|---|---|---|
Google Gemini 1.0 Pro | $0 | $0 | 32,768 | "google/gemini-1.0-pro" |
Contributions are welcome. If you notice updated pricing, new models, new providers, or any other changes, feel free to open an issue or a pull request.
ValueError: The `response.text` quick accessor only works when the response contains a valid `Part`, but none was returned. Check the `candidate.safety_ratings` to see if the response was blocked.
Solution: The output exceeded your maximum token limit. Increase `max_tokens`.