Realtime analytics server for AI DIAL. The service consumes the log stream from AI DIAL Core, analyzes the conversations, and writes the analytics to InfluxDB.
Refer to the Documentation to learn how to configure AI DIAL Core and other necessary components.
Check the AI DIAL Core documentation to learn how to configure it to send logs to an instance of the realtime analytics server.
The realtime analytics server analyzes the log stream in real time and writes metrics to InfluxDB with the following tags and fields:
Tag | Description |
---|---|
model | The model name for the completion request. |
deployment | The deployment name of the model or application for the completion request. |
project_id | The project ID for the completion request. |
language | The language of the conversation, detected from the message content. |
topic | The topic of the conversation, detected from the message content. |
title | The title of the person making the request. |
response_id | Unique ID of this response. |
Field | Description |
---|---|
user_hash | The unique hash for the user. |
price | The calculated price of the request. |
number_request_messages | The total number of messages in history for this request. |
chat_id | The unique ID of this conversation. |
prompt_tokens | The number of tokens in the prompt, including the conversation history and the current message. |
completion_tokens | The number of completion tokens generated for this request. |
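To illustrate how tags and fields differ when a point is written, here is a minimal sketch that renders one record as InfluxDB line protocol. The measurement name `analytics`, the helper names, and the sample values are assumptions made for illustration; the server's actual write path is not described in this README.

```python
def escape_key(value: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value) -> str:
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Everything else is written as a double-quoted string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build a single line: measurement,tags fields timestamp."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical sample record using names from the tables above.
line = to_line_protocol(
    "analytics",  # assumed measurement name
    {"model": "gpt-4", "deployment": "gpt-4", "language": "en"},
    {"prompt_tokens": 120, "completion_tokens": 30, "price": 0.0054},
    1700000000000000000,
)
print(line)
```

Tags are indexed and always stored as strings, which is why identifiers used for grouping (such as `model`) are tags, while numeric measurements (such as `prompt_tokens`) are fields.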
Copy .env.example to .env and customize it for your environment.
Specify the connection options for the InfluxDB instance using the following environment variables:
Variable | Description |
---|---|
INFLUX_URL | URL of the InfluxDB instance to write the analytics data to |
INFLUX_ORG | Name of the InfluxDB organization to write the analytics data |
INFLUX_BUCKET | Name of the bucket to write the analytics data |
INFLUX_API_TOKEN | InfluxDB API Token |
You can follow the InfluxDB documentation to set up InfluxDB locally and acquire the required configuration parameters.
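As a rough sketch of how the four connection variables might be read and validated at startup, the snippet below fails fast when any of them is missing. The `InfluxSettings` class and `load_influx_settings` function are hypothetical names used for illustration, not part of this project.

```python
import os
from dataclasses import dataclass

REQUIRED_VARS = ("INFLUX_URL", "INFLUX_ORG", "INFLUX_BUCKET", "INFLUX_API_TOKEN")


@dataclass(frozen=True)
class InfluxSettings:
    url: str
    org: str
    bucket: str
    api_token: str


def load_influx_settings(env=None) -> InfluxSettings:
    """Read the InfluxDB connection options, raising if any are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return InfluxSettings(
        url=env["INFLUX_URL"],
        org=env["INFLUX_ORG"],
        bucket=env["INFLUX_BUCKET"],
        api_token=env["INFLUX_API_TOKEN"],
    )


# Example with placeholder values; in practice these would come from .env.
example_env = {
    "INFLUX_URL": "http://localhost:8086",
    "INFLUX_ORG": "my-org",
    "INFLUX_BUCKET": "analytics",
    "INFLUX_API_TOKEN": "example-token",
}
settings = load_influx_settings(example_env)
print(settings.url, settings.org, settings.bucket)
```

Passing the environment as a parameter keeps the function easy to test without modifying `os.environ`.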
The following environment variables can also be used to configure the service behavior:
Variable | Default | Description |
---|---|---|
MODEL_RATES | {} | Specifies per-token price rates for models in JSON format |
TOPIC_MODEL | ./topic_model | Specifies the name or path of the topic model. If the model is specified by name, it will be downloaded from Hugging Face. |
TOPIC_EMBEDDINGS_MODEL | None | Specifies the name or path of the embeddings model used with the topic model. If the model is specified by name, it will be downloaded from Hugging Face. If None, the name from the topic model config will be used. |
Example of the MODEL_RATES configuration:
{
"gpt-4": {
"unit":"token",
"prompt_price":"0.00003",
"completion_price":"0.00006"
},
"gpt-35-turbo": {
"unit":"token",
"prompt_price":"0.0000015",
"completion_price":"0.000002"
},
"gpt-4-32k": {
"unit":"token",
"prompt_price":"0.00006",
"completion_price":"0.00012"
},
"text-embedding-ada-002": {
"unit":"token",
"prompt_price":"0.0000001"
},
"chat-bison@001": {
"unit":"char_without_whitespace",
"prompt_price":"0.0000005",
"completion_price":"0.0000005"
}
}
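To show how such rates could translate into the `price` field, here is a minimal sketch. It assumes the price is the per-unit prompt rate times the prompt size plus the per-unit completion rate times the completion size, that `char_without_whitespace` means counting non-whitespace characters, and that unknown models cost zero. The function names are hypothetical and the server's real calculation may differ.

```python
import json
from decimal import Decimal


def count_units(unit: str, text: str, token_count: int) -> int:
    """Return the billable size of a message for the given rate unit."""
    if unit == "token":
        return token_count
    if unit == "char_without_whitespace":
        return sum(1 for ch in text if not ch.isspace())
    raise ValueError(f"Unsupported unit: {unit}")


def estimate_price(rates, model, prompt_text, completion_text,
                   prompt_tokens, completion_tokens) -> Decimal:
    """Estimate the request price; models without a rate entry are priced at zero."""
    rate = rates.get(model)
    if rate is None:
        return Decimal(0)
    unit = rate["unit"]
    # Prices are strings in the config, so Decimal keeps them exact.
    prompt_price = Decimal(rate.get("prompt_price", "0"))
    completion_price = Decimal(rate.get("completion_price", "0"))
    return (
        prompt_price * count_units(unit, prompt_text, prompt_tokens)
        + completion_price * count_units(unit, completion_text, completion_tokens)
    )


# A subset of the MODEL_RATES example above, as it would arrive in the env var.
MODEL_RATES = json.loads("""
{
  "gpt-4": {"unit": "token", "prompt_price": "0.00003", "completion_price": "0.00006"},
  "chat-bison@001": {"unit": "char_without_whitespace",
                     "prompt_price": "0.0000005", "completion_price": "0.0000005"}
}
""")

gpt4_price = estimate_price(MODEL_RATES, "gpt-4", "", "", 1000, 500)
bison_price = estimate_price(MODEL_RATES, "chat-bison@001", "Hello world", "Hi there", 0, 0)
print(gpt4_price, bison_price)
```

Using `Decimal` rather than `float` avoids rounding drift when very small per-unit rates are multiplied and summed.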
This project uses Python>=3.11 and Poetry>=1.6.1 as a dependency manager. Check out Poetry's documentation on how to install it on your system before proceeding.
To install requirements:
poetry install
This will install all requirements for running the package, linting, formatting and tests.
To build the wheel packages run:
make build
To run the development server locally run:
make serve
The server will be running at http://localhost:5001
To build the docker image run:
make docker_build
To run the server locally from the docker image run:
make docker_serve
The server will be running at http://localhost:5001
Run the linting before committing:
make lint
To auto-fix formatting issues run:
make format
Run unit tests locally:
make test
To remove the virtual environment and build artifacts:
make clean