diff --git a/README.md b/README.md index 49eedeed..d09fcd68 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# dataherald +# Dataherald Monorepo

Dataherald logo @@ -23,345 +23,27 @@

- Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over structured data. It allows you to set up an API from your database that can answer questions in plain English. You can use Dataherald to: - Allow business users to get insights from the data warehouse without going through a data analyst - Enable Q+A from your production DBs inside your SaaS application - Create a ChatGPT plug-in from your proprietary data +This repository hosts four projects: -This project is undergoing swift development, and as such, the API may be subject to change at any time. - -If you would like to learn more, you can join the Discord or read the docs. - -## Overview - -### Background - -The latest LLMs have gotten remarkably good at writing syntactically correct SQL. However since they lack business context, they often write inaccurate SQL. Our goal with Dataherald is to build the more performant and easy to use NL-to-SQL product for developers. - -### Goals - -Dataherald is built to: - -- Have the highest accuracy and lowest latency possible -- Be easy to set-up and use with major data warehouses -- Enable users to add business context from various sources -- Give developers the tools to fine-tune NL-to-SQL models on their own schema and deploy them in production -- Be LLM provider agnostic - -## Get Started - -The simplest way to set up Dataherald is to use the hosted version. We are rolling this service to select customers. Sign up for the waitlist. - -You can also self-host the engine locally using Docker. By default the engine uses Mongo to store application data. - - -## How to Run Dataherald (with local Mongo) using Docker - -1. Create `.env` file, you can use the `.env.example` file as a guide. You must set these fields for the engine to start. - -``` -cp .env.example .env -``` - -Specifically the following fields must be manually set before the engine is started. 
- -``` -#OpenAI credentials and model -# mainly used for embedding models and finetunung -OPENAI_API_KEY = -ORG_ID = - -#Encryption key for storing DB connection data in Mongo -ENCRYPT_KEY = - -# the variable that determines how many rows should be returned from a query to the agents, set it to small values to avoid high costs and long response times, default is 50 -UPPER_LIMIT_QUERY_RETURN_ROWS = 50 -# the variable that force the engine to quit if the sql geneation takes more than the time set in this variable, default is None. -DH_ENGINE_TIMEOUT = 150 -``` - -In case you want to use models deployed in Azure OpenAI, you must set the following variables: -``` -AZURE_API_KEY = "xxxxx" -AZURE_OPENAI_API_KEY = "xxxxxx" -API_BASE = "azure_openai_endpoint" -AZURE_OPENAI_ENDPOINT = "azure_openai_endpoint" -AZURE_API_VERSION = "version of the API to use" -LLM_MODEL = "name_of_the_deployment" -``` -In addition, an embedding model will be also used. There must be a deployment created with name "text-embedding-3-large". - -The existence of AZURE_API_KEY as environment variable indicates Azure models must be used. - -Remember to remove comments beside the environment variables. - -While not strictly required, we also strongly suggest you change the MONGO username and password fields as well. - -Follow the next commands to generate an ENCRYPT_KEY and paste it in the .env file like -this `ENCRYPT_KEY = 4Mbe2GYx0Hk94o_f-irVHk1fKkCGAt1R7LLw5wHVghI=` - -``` -# Install the package cryptography in the terminal -pip3 install cryptography - -# Run python in terminal -python3 - -# Import Fernet -from cryptography.fernet import Fernet - -# Generate the key -Fernet.generate_key() -``` - -2. Install and run Docker - -3. Create a Docker network for communication between services. ->We need to set it up externally to enable external clients running on docker to communicate with this app. -Run the following command: -``` -docker network create backendnetwork -``` - -4. 
Build docker images, create containers and raise them. This will raise the app and mongo container -``` -docker-compose up --build -``` -> You can skip the `--build` if you don't have to rebuild the image due to updates to the dependencies - -5. Check that the containers are running, you should see 2 containers -``` -docker ps -``` -It should look like this: -``` -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -72aa8df0d589 dataherald-app "uvicorn dataherald.…" 7 seconds ago Up 6 seconds 0.0.0.0:80->80/tcp dataherald-app-1 -6595d145b0d7 mongo:latest "docker-entrypoint.s…" 19 hours ago Up 6 seconds 0.0.0.0:27017->27017/tcp dataherald-mongodb-1 -``` - -6. In your browser visit [http://localhost/docs](http://localhost/docs) - - - -### See Docker App container logs -Once app container is running just execute the next command -``` -docker-compose exec app cat dataherald.log -``` - -### Connect to Docker MongoDB container -Once your mongo container is running you can use any tool (Such as NoSQLBooster) to connect it. -The default values are: -``` -HOST: localhost # inside the docker containers use the host "mongodb" and outside use "localhost" -PORT: 27017 -DB_NAME: dataherald -DB_USERNAME = admin -DB_PASSWORD = admin -``` - -## Connecting to and Querying your SQL Databases -Once the engine is running, you will want to use it by: -1. Connecting to you data warehouses -2. Adding context about the data to the engine -3. Querying the data in natural language - -### Connecting to your data warehouses -We currently support connections to Postgres, DuckDB, BigQuery, ClickHouse, Databricks, Snowflake, MySQL/MariaDB, MS SQL Server, Redshift and AWS Athena. You can create connections to these warehouses through the API or at application start-up using the envars. - -#### Connecting through the API - -You can define a DB connection through a call to the following API endpoint `POST /api/v1/database-connections`. 
For example: - -``` -curl -X 'POST' \ - '/api/v1/database-connections' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "alias": "my_db_alias", - "use_ssh": false, - "connection_uri": snowflake://:@-//" -}' -``` - -##### Connecting multi-schemas -You can connect many schemas using one db connection if you want to create SQL joins between schemas. -Currently only `BigQuery`, `Snowflake`, `Databricks` and `Postgres` support this feature. -To use multi-schemas instead of sending the `schema` in the `connection_uri` set it in the `schemas` param, like this: - -``` -curl -X 'POST' \ - '/api/v1/database-connections' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "alias": "my_db_alias", - "use_ssh": false, - "connection_uri": snowflake://:@-/", - "schemas": ["schema_1", "schema_2", ...] -}' -``` - -##### Connecting to supported Data warehouses and using SSH -You can find the details on how to connect to the supported data warehouses in the [docs](https://dataherald.readthedocs.io/en/latest/api.create_database_connection.html) - -### Adding Context -Once you have connected to the data warehouse, you can add context to the engine to help improve the accuracy of the generated SQL. Context can currently be added in one of three ways: - -1. Scanning the Database tables and columns -2. Adding verified SQL (golden SQL) -3. Adding string descriptions of the tables and columns -4. Adding database level instructions - -While only the Database scan part is required to start generating SQL, adding verified SQL and string descriptions are also important for the tool to generate accurate SQL. - -#### Scanning the Database -The database scan is used to gather information about the database including table and column names and identifying low cardinality columns and their values to be stored in the context store and used in the prompts to the LLM. 
-In addition, it retrieves logs, which consist of historical queries associated with each database table. These records are then stored within the query_history collection. The historical queries retrieved encompass data from the past three months and are grouped based on query and user. -The db_connection_id param is the id of the database connection you want to scan, which is returned when you create a database connection. -The ids param is the table_description_id that you want to scan. -You can trigger a scan of a database from the `POST /api/v1/table-descriptions/sync-schemas` endpoint. Example below - - -``` -curl -X 'POST' \ - '/api/v1/table-descriptions/sync-schemas' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "db_connection_id": "db_connection_id", - "ids": ["", "", ...] - }' -``` - -Since the endpoint identifies low cardinality columns (and their values) it can take time to complete. - -#### Get logs per db connection -Once a database was scanned you can use this endpoint to retrieve the tables logs -Set the `db_connection_id` to the id of the database connection you want to retrieve the logs from - -``` -curl -X 'GET' \ - 'http://localhost/api/v1/query-history?db_connection_id=656e52cb4d1fda50cae7b939' \ - -H 'accept: application/json' -``` - -Response example: -``` -[ - { - "id": "656e52cb4d1fda50cae7b939", - "db_connection_id": "656e52cb4d1fda50cae7b939", - "table_name": "table_name", - "query": "select QUERY_TEXT, USER_NAME, count(*) as occurrences from ....", - "user": "user_name", - "occurrences": 1 - } -] -``` - -#### Adding verified SQL - -Adding ground truth Question/SQL pairs is a powerful way to improve the accuracy of the generated SQL. Golden records can be used either to fine-tune the LLM or to augment the prompts to the LLM. 
- -You can read more about this in the [docs](https://dataherald.readthedocs.io/en/latest/api.golden_sql.html) - -#### Adding string descriptions -In addition to database table_info and golden_sql, you can set descriptions or update the columns per table and column. -Description are used by the agents to determine the relevant columns and tables to the user's question. - -Read more about this in the [docs](https://dataherald.readthedocs.io/en/latest/api.update_table_descriptions.html) - -#### Adding database level instructions - -Database level instructions are passed directly to the engine and can be used to steer the engine to generate SQL that is more in line with your business logic. This can include instructions such as "never use this column in a where clause" or "always use this column in a where clause". - -You can read more about this in the [docs](https://dataherald.readthedocs.io/en/latest/api.add_instructions.html) - - -### Querying the Database in Natural Language -Once you have connected the engine to your data warehouse (and preferably added some context to the store), you can query your data warehouse using the `POST /api/v1/prompts/sql-generations` endpoint. - -``` -curl -X 'POST' \ - '/api/v1/prompts/sql-generations' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "finetuning_id": "string", # specify the finetuning id if you want to use a finetuned model - "low_latency_mode": false, # low latency mode is used to generate SQL faster, but with lower accuracy - "llm_config": { - "llm_name": "gpt-4-turbo-preview", # specify the LLM model you want to use - "api_base": "string" # If you are using open-source LLMs, you can specify the API base. 
If you are using OpenAI, you can leave this field empty - }, - "evaluate": false, # if you want our engine to evaluate the generated SQL - "sql": "string", # if you want to evaluate a specific SQL pass it here, else remove this field to generate SQL from a question - "metadata": {}, - "prompt": { - "text": "string", # the question you want to ask - "db_connection_id": "string", # the id of the database connection you want to query - "metadata": {} - } - }' -``` - -### Create a natural language response and SQL generation for a question -If you want to create a natural language response and a SQL generation for a question, you can use the `POST /api/v1/prompts/sql-generations/nl-generations` endpoint. - -``` -curl -X 'POST' \ - '/api/v1/responses?run_evaluator=true&sql_response_only=false&generate_csv=false' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "llm_config": { - "llm_name": "gpt-4-turbo-preview", # specify the LLM model you want to use to generate the NL response - "api_base": "string" # If you are using open-source LLMs, you can specify the API base. If you are using OpenAI, you can leave this field empty - }, - "max_rows": 100, # the maximum number of rows you want to use for generating the NL response - "metadata": {}, - "sql_generation": { - "finetuning_id": "string", # specify the finetuning id if you want to use a finetuned model - "low_latency_mode": false, # low latency mode is used to generate SQL faster, but with lower accuracy - "llm_config": { - "llm_name": "gpt-4-turbo-preview", # specify the LLM model you want to use to generate the SQL - "api_base": "string" # If you are using open-source LLMs, you can specify the API base. 
If you are using OpenAI, you can leave this field empty - }, - "evaluate": false, # if you want our engine to evaluate the generated SQL - "sql": "string", # if you want to evaluate a specific SQL pass it here, else remove this field to generate SQL from a question - "metadata": {} - "prompt": { - "text": "string", # the question you want to ask - "db_connection_id": "string", # the id of the database connection you want to query - "metadata": {} - } - }, -}' -``` +1. Admin Console: The front-end component for dataherald. It requires both Enterprise and Engine to be running in order to work. -### How to migrate data between versions -Our engine is under ongoing development and in order to support the latest features, we provide scripts to migrate the data from the previous version to the latest version. You can find all of the scripts in the `dataherald.scripts` module. To run the migration script, execute the following command: +2. Engine: The core language-to-SQL engine component. If you just want to use the language-to-SQL API, only the Engine is needed. -``` -docker-compose exec app python3 -m dataherald.scripts.migrate_v100_to_v101 -``` +3. Enterprise: A wrapper for the Engine component with business logic. It adds authentication, organizations, users, usage-based payment, and other logic on top of the Engine. It requires the Engine to be running in order to work. -## Replacing core modules -The Dataherald engine is made up of replaceable modules. Each of these can be replaced with a different implementation that extends the base class. Some of the main modules are: +4. Slackbot: A Slack bot that allows interaction with the language-to-SQL engine via Slack channel messages. It requires both Enterprise and Engine to be running in order to work. -1. SQL Generator -- The module that generates SQL from a given natural language question. -2. Vector Store -- The Vector DB used to store context data such as sample SQL queries -3. 
DB -- The DB that persists application logic. By default this is Mongo. -4. Evaluator -- A module which evaluates accuracy of the generated SQL and assigns a score. +Each project is deployable as a Docker image. If you want to connect the projects together, you will need to set up the environment variables with each project's URL. -In some instances we have already included multiple implementations for testing and benchmarking. +For more details about each individual project, please check their `README.md` files. ## Contributing As an open-source project in a rapidly developing field, we are open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation. -For detailed information on how to contribute, see [here](CONTRIBUTING.md). +For detailed information on how to contribute, see [here](CONTRIBUTING.md). \ No newline at end of file diff --git a/docs/.DS_Store b/docs/.DS_Store deleted file mode 100644 index 2814135f..00000000 Binary files a/docs/.DS_Store and /dev/null differ diff --git a/.dockerignore b/services/engine/.dockerignore similarity index 100% rename from .dockerignore rename to services/engine/.dockerignore diff --git a/.env.example b/services/engine/.env.example similarity index 100% rename from .env.example rename to services/engine/.env.example diff --git a/.gitignore b/services/engine/.gitignore similarity index 100% rename from .gitignore rename to services/engine/.gitignore diff --git a/.readthedocs.yaml b/services/engine/.readthedocs.yaml similarity index 100% rename from .readthedocs.yaml rename to services/engine/.readthedocs.yaml diff --git a/.test.env b/services/engine/.test.env similarity index 100% rename from .test.env rename to services/engine/.test.env diff --git a/CONTRIBUTING.md b/services/engine/CONTRIBUTING.md similarity index 100% rename from CONTRIBUTING.md rename to services/engine/CONTRIBUTING.md diff --git a/Dockerfile b/services/engine/Dockerfile 
similarity index 100% rename from Dockerfile rename to services/engine/Dockerfile diff --git a/LICENSE b/services/engine/LICENSE similarity index 100% rename from LICENSE rename to services/engine/LICENSE diff --git a/services/engine/README.md b/services/engine/README.md new file mode 100644 index 00000000..49eedeed --- /dev/null +++ b/services/engine/README.md @@ -0,0 +1,367 @@ +# dataherald + +

+Dataherald logo
+
+Query your structured data in natural language.
+
+Discord | License | Docs | Homepage

+
+
+Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over structured data. It allows you to set up an API from your database that can answer questions in plain English. You can use Dataherald to:
+
+- Allow business users to get insights from the data warehouse without going through a data analyst
+- Enable Q+A from your production DBs inside your SaaS application
+- Create a ChatGPT plug-in from your proprietary data
+
+This project is undergoing swift development, and as such, the API may be subject to change at any time.
+
+If you would like to learn more, you can join the Discord or read the docs.
+
+## Overview
+
+### Background
+
+The latest LLMs have gotten remarkably good at writing syntactically correct SQL. However, since they lack business context, they often write inaccurate SQL. Our goal with Dataherald is to build the most performant and easiest-to-use NL-to-SQL product for developers.
+
+### Goals
+
+Dataherald is built to:
+
+- Have the highest accuracy and lowest latency possible
+- Be easy to set up and use with major data warehouses
+- Enable users to add business context from various sources
+- Give developers the tools to fine-tune NL-to-SQL models on their own schema and deploy them in production
+- Be LLM provider agnostic
+
+## Get Started
+
+The simplest way to set up Dataherald is to use the hosted version. We are rolling this service out to select customers. Sign up for the waitlist.
+
+You can also self-host the engine locally using Docker. By default the engine uses Mongo to store application data.
+
+## How to Run Dataherald (with local Mongo) using Docker
+
+1. Create a `.env` file; you can use the `.env.example` file as a guide. You must set these fields for the engine to start.
+
+```
+cp .env.example .env
+```
+
+Specifically, the following fields must be manually set before the engine is started.
+
+```
+# OpenAI credentials and model
+# mainly used for embedding models and fine-tuning
+OPENAI_API_KEY =
+ORG_ID =
+
+# Encryption key for storing DB connection data in Mongo
+ENCRYPT_KEY =
+
+# Determines how many rows are returned from a query to the agents; set it to a small value to avoid high costs and long response times. Default is 50.
+UPPER_LIMIT_QUERY_RETURN_ROWS = 50
+# Forces the engine to quit if SQL generation takes longer than the time set here. Default is None.
+DH_ENGINE_TIMEOUT = 150
+```
+
+In case you want to use models deployed in Azure OpenAI, you must set the following variables:
+```
+AZURE_API_KEY = "xxxxx"
+AZURE_OPENAI_API_KEY = "xxxxxx"
+API_BASE = "azure_openai_endpoint"
+AZURE_OPENAI_ENDPOINT = "azure_openai_endpoint"
+AZURE_API_VERSION = "version of the API to use"
+LLM_MODEL = "name_of_the_deployment"
+```
+In addition, an embedding model will also be used. There must be a deployment created with the name "text-embedding-3-large".
+
+The existence of AZURE_API_KEY as an environment variable indicates that Azure models must be used.
+
+Remember to remove the comments beside the environment variables.
+
+While not strictly required, we also strongly suggest you change the MONGO username and password fields as well.
+
+Run the following commands to generate an ENCRYPT_KEY and paste it into the .env file like this: `ENCRYPT_KEY = 4Mbe2GYx0Hk94o_f-irVHk1fKkCGAt1R7LLw5wHVghI=`
+
+```
+# Install the cryptography package in the terminal
+pip3 install cryptography
+
+# Run python in the terminal
+python3
+
+# Import Fernet
+from cryptography.fernet import Fernet
+
+# Generate the key
+Fernet.generate_key()
+```
+
+2. Install and run Docker
+
+3. Create a Docker network for communication between services.
+> We need to set it up externally to enable external clients running on Docker to communicate with this app.
+Run the following command:
+```
+docker network create backendnetwork
+```
+
+4.
Build the Docker images, create the containers, and start them. This will start the app and mongo containers.
+```
+docker-compose up --build
+```
+> You can skip the `--build` flag if you don't need to rebuild the image after dependency updates.
+
+5. Check that the containers are running; you should see two containers:
+```
+docker ps
+```
+It should look like this:
+```
+CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS         PORTS                      NAMES
+72aa8df0d589   dataherald-app "uvicorn dataherald.…"   7 seconds ago   Up 6 seconds   0.0.0.0:80->80/tcp         dataherald-app-1
+6595d145b0d7   mongo:latest   "docker-entrypoint.s…"   19 hours ago    Up 6 seconds   0.0.0.0:27017->27017/tcp   dataherald-mongodb-1
+```
+
+6. In your browser visit [http://localhost/docs](http://localhost/docs)
+
+### See Docker App container logs
+Once the app container is running, execute the following command:
+```
+docker-compose exec app cat dataherald.log
+```
+
+### Connect to Docker MongoDB container
+Once your mongo container is running, you can use any tool (such as NoSQLBooster) to connect to it.
+The default values are:
+```
+HOST: localhost  # inside the docker containers use the host "mongodb"; outside, use "localhost"
+PORT: 27017
+DB_NAME: dataherald
+DB_USERNAME = admin
+DB_PASSWORD = admin
+```
+
+## Connecting to and Querying your SQL Databases
+Once the engine is running, you will want to use it by:
+1. Connecting to your data warehouses
+2. Adding context about the data to the engine
+3. Querying the data in natural language
+
+### Connecting to your data warehouses
+We currently support connections to Postgres, DuckDB, BigQuery, ClickHouse, Databricks, Snowflake, MySQL/MariaDB, MS SQL Server, Redshift and AWS Athena. You can create connections to these warehouses through the API or at application start-up using the environment variables.
+
+#### Connecting through the API
+
+You can define a DB connection through a call to the following API endpoint: `POST /api/v1/database-connections`.
For example:
+
+```
+curl -X 'POST' \
+  '/api/v1/database-connections' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "alias": "my_db_alias",
+    "use_ssh": false,
+    "connection_uri": "snowflake://:@-//"
+}'
+```
+
+##### Connecting multiple schemas
+You can connect many schemas using one DB connection if you want to create SQL joins between schemas.
+Currently only `BigQuery`, `Snowflake`, `Databricks` and `Postgres` support this feature.
+To use multiple schemas, instead of sending the `schema` in the `connection_uri`, set them in the `schemas` param, like this:
+
+```
+curl -X 'POST' \
+  '/api/v1/database-connections' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "alias": "my_db_alias",
+    "use_ssh": false,
+    "connection_uri": "snowflake://:@-/",
+    "schemas": ["schema_1", "schema_2", ...]
+}'
+```
+
+##### Connecting to supported Data warehouses and using SSH
+You can find the details on how to connect to the supported data warehouses in the [docs](https://dataherald.readthedocs.io/en/latest/api.create_database_connection.html)
+
+### Adding Context
+Once you have connected to the data warehouse, you can add context to the engine to help improve the accuracy of the generated SQL. Context can currently be added in one of four ways:
+
+1. Scanning the Database tables and columns
+2. Adding verified SQL (golden SQL)
+3. Adding string descriptions of the tables and columns
+4. Adding database level instructions
+
+While only the Database scan is required to start generating SQL, adding verified SQL and string descriptions is also important for the tool to generate accurate SQL.
+
+#### Scanning the Database
+The database scan gathers information about the database, including table and column names, and identifies low-cardinality columns and their values, which are stored in the context store and used in the prompts to the LLM.
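To picture what the scan stores, low-cardinality detection amounts to keeping a column's distinct values only when there are few of them. A minimal sketch, illustrative only: the function name and threshold are our assumptions, not the engine's actual code.

```python
# Illustrative sketch of low-cardinality detection -- not the engine's actual code.
def low_cardinality_values(column_values, threshold=100):
    """Return the sorted distinct values if the column is low-cardinality, else None."""
    distinct = set(column_values)
    return sorted(distinct) if len(distinct) <= threshold else None

# A status column with three distinct values qualifies; its values would be
# kept in the context store and surfaced in prompts to the LLM.
print(low_cardinality_values(["open", "closed", "open", "pending", "closed"]))
# ['closed', 'open', 'pending']
```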
+In addition, it retrieves logs, which consist of historical queries associated with each database table. These records are stored in the query_history collection. The historical queries retrieved cover the past three months and are grouped by query and user.
+The `db_connection_id` param is the ID of the database connection you want to scan, which is returned when you create a database connection.
+The `ids` param lists the `table_description_id` values you want to scan.
+You can trigger a scan of a database from the `POST /api/v1/table-descriptions/sync-schemas` endpoint. Example below:
+
+```
+curl -X 'POST' \
+  '/api/v1/table-descriptions/sync-schemas' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "db_connection_id": "db_connection_id",
+    "ids": ["", "", ...]
+  }'
+```
+
+Since the endpoint identifies low-cardinality columns (and their values), it can take time to complete.
+
+#### Get logs per db connection
+Once a database has been scanned, you can use this endpoint to retrieve the table logs.
+Set the `db_connection_id` to the ID of the database connection you want to retrieve the logs from.
+
+```
+curl -X 'GET' \
+  'http://localhost/api/v1/query-history?db_connection_id=656e52cb4d1fda50cae7b939' \
+  -H 'accept: application/json'
+```
+
+Response example:
+```
+[
+  {
+    "id": "656e52cb4d1fda50cae7b939",
+    "db_connection_id": "656e52cb4d1fda50cae7b939",
+    "table_name": "table_name",
+    "query": "select QUERY_TEXT, USER_NAME, count(*) as occurrences from ....",
+    "user": "user_name",
+    "occurrences": 1
+  }
+]
+```
+
+#### Adding verified SQL
+
+Adding ground-truth Question/SQL pairs is a powerful way to improve the accuracy of the generated SQL. Golden records can be used either to fine-tune the LLM or to augment the prompts to the LLM.
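As an illustration, a golden record pairs a question with its verified SQL. The payload below is a sketch only: the field names and sample values are our assumptions, not the real request schema, so consult the golden-SQL docs for the actual API.

```python
import json

# Hypothetical golden record -- field names and values are illustrative only.
golden_record = {
    "db_connection_id": "656e52cb4d1fda50cae7b939",
    "prompt_text": "What was the average rent price last quarter?",
    "sql": "SELECT AVG(rent_price) FROM rents WHERE rent_date >= CURRENT_DATE - INTERVAL '3 months';",
}

# Sent as a list so several verified pairs can be added in one request.
body = json.dumps([golden_record])
```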
+
+You can read more about this in the [docs](https://dataherald.readthedocs.io/en/latest/api.golden_sql.html)
+
+#### Adding string descriptions
+In addition to the database table_info and golden_sql, you can set or update descriptions for each table and column.
+Descriptions are used by the agents to determine the columns and tables relevant to the user's question.
+
+Read more about this in the [docs](https://dataherald.readthedocs.io/en/latest/api.update_table_descriptions.html)
+
+#### Adding database level instructions
+
+Database level instructions are passed directly to the engine and can be used to steer the engine to generate SQL that is more in line with your business logic. This can include instructions such as "never use this column in a where clause" or "always use this column in a where clause".
+
+You can read more about this in the [docs](https://dataherald.readthedocs.io/en/latest/api.add_instructions.html)
+
+### Querying the Database in Natural Language
+Once you have connected the engine to your data warehouse (and preferably added some context to the store), you can query your data warehouse using the `POST /api/v1/prompts/sql-generations` endpoint.
+
+```
+curl -X 'POST' \
+  '/api/v1/prompts/sql-generations' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "finetuning_id": "string", # specify the finetuning id if you want to use a finetuned model
+    "low_latency_mode": false, # low latency mode is used to generate SQL faster, but with lower accuracy
+    "llm_config": {
+      "llm_name": "gpt-4-turbo-preview", # specify the LLM model you want to use
+      "api_base": "string" # If you are using open-source LLMs, you can specify the API base.
If you are using OpenAI, you can leave this field empty
+    },
+    "evaluate": false, # if you want our engine to evaluate the generated SQL
+    "sql": "string", # if you want to evaluate a specific SQL pass it here, else remove this field to generate SQL from a question
+    "metadata": {},
+    "prompt": {
+      "text": "string", # the question you want to ask
+      "db_connection_id": "string", # the id of the database connection you want to query
+      "metadata": {}
+    }
+  }'
+```
+
+### Create a natural language response and SQL generation for a question
+If you want to create a natural language response and a SQL generation for a question, you can use the `POST /api/v1/prompts/sql-generations/nl-generations` endpoint.
+
+```
+curl -X 'POST' \
+  '/api/v1/prompts/sql-generations/nl-generations' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llm_config": {
+      "llm_name": "gpt-4-turbo-preview", # specify the LLM model you want to use to generate the NL response
+      "api_base": "string" # If you are using open-source LLMs, you can specify the API base. If you are using OpenAI, you can leave this field empty
+    },
+    "max_rows": 100, # the maximum number of rows you want to use for generating the NL response
+    "metadata": {},
+    "sql_generation": {
+      "finetuning_id": "string", # specify the finetuning id if you want to use a finetuned model
+      "low_latency_mode": false, # low latency mode is used to generate SQL faster, but with lower accuracy
+      "llm_config": {
+        "llm_name": "gpt-4-turbo-preview", # specify the LLM model you want to use to generate the SQL
+        "api_base": "string" # If you are using open-source LLMs, you can specify the API base.
If you are using OpenAI, you can leave this field empty
+      },
+      "evaluate": false, # if you want our engine to evaluate the generated SQL
+      "sql": "string", # if you want to evaluate a specific SQL pass it here, else remove this field to generate SQL from a question
+      "metadata": {},
+      "prompt": {
+        "text": "string", # the question you want to ask
+        "db_connection_id": "string", # the id of the database connection you want to query
+        "metadata": {}
+      }
+    }
+}'
+```
+
+### How to migrate data between versions
+Our engine is under ongoing development, and in order to support the latest features, we provide scripts to migrate data from the previous version to the latest version. You can find all of the scripts in the `dataherald.scripts` module. To run the migration script, execute the following command:
+
+```
+docker-compose exec app python3 -m dataherald.scripts.migrate_v100_to_v101
+```
+
+## Replacing core modules
+The Dataherald engine is made up of replaceable modules. Each of these can be replaced with a different implementation that extends the base class. Some of the main modules are:
+
+1. SQL Generator -- The module that generates SQL from a given natural language question.
+2. Vector Store -- The vector DB used to store context data such as sample SQL queries.
+3. DB -- The DB that persists application logic. By default this is Mongo.
+4. Evaluator -- A module that evaluates the accuracy of the generated SQL and assigns a score.
+
+In some instances we have already included multiple implementations for testing and benchmarking.
+
+## Contributing
+As an open-source project in a rapidly developing field, we are open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
+
+For detailed information on how to contribute, see [here](CONTRIBUTING.md).
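The annotated `sql-generations` request shown earlier can also be assembled programmatically rather than hand-editing the curl body. A minimal Python sketch: the helper name and sample question are our own, field names follow the example above, and optional fields are omitted rather than sent as placeholder strings.

```python
import json

def build_sql_generation_request(question, db_connection_id,
                                 llm_name="gpt-4-turbo-preview",
                                 finetuning_id=None, evaluate=False):
    """Assemble a request body for POST /api/v1/prompts/sql-generations."""
    request = {
        "low_latency_mode": False,
        "llm_config": {"llm_name": llm_name},
        "evaluate": evaluate,
        "metadata": {},
        "prompt": {
            "text": question,
            "db_connection_id": db_connection_id,
            "metadata": {},
        },
    }
    if finetuning_id is not None:  # only included when using a fine-tuned model
        request["finetuning_id"] = finetuning_id
    return json.dumps(request)

body = build_sql_generation_request(
    "What was the average rent in May?",  # hypothetical question
    "656e52cb4d1fda50cae7b939",           # id returned when the connection was created
)
```

Dropping `finetuning_id` entirely, instead of sending the literal `"string"` placeholder, mirrors the guidance in the example comments above.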
diff --git a/dataherald/__init__.py b/services/engine/dataherald/__init__.py similarity index 100% rename from dataherald/__init__.py rename to services/engine/dataherald/__init__.py diff --git a/dataherald/api/__init__.py b/services/engine/dataherald/api/__init__.py similarity index 100% rename from dataherald/api/__init__.py rename to services/engine/dataherald/api/__init__.py diff --git a/dataherald/api/fastapi.py b/services/engine/dataherald/api/fastapi.py similarity index 100% rename from dataherald/api/fastapi.py rename to services/engine/dataherald/api/fastapi.py diff --git a/dataherald/api/types/__init__.py b/services/engine/dataherald/api/types/__init__.py similarity index 100% rename from dataherald/api/types/__init__.py rename to services/engine/dataherald/api/types/__init__.py diff --git a/dataherald/api/types/query.py b/services/engine/dataherald/api/types/query.py similarity index 100% rename from dataherald/api/types/query.py rename to services/engine/dataherald/api/types/query.py diff --git a/dataherald/api/types/requests.py b/services/engine/dataherald/api/types/requests.py similarity index 100% rename from dataherald/api/types/requests.py rename to services/engine/dataherald/api/types/requests.py diff --git a/dataherald/api/types/responses.py b/services/engine/dataherald/api/types/responses.py similarity index 100% rename from dataherald/api/types/responses.py rename to services/engine/dataherald/api/types/responses.py diff --git a/dataherald/app.py b/services/engine/dataherald/app.py similarity index 100% rename from dataherald/app.py rename to services/engine/dataherald/app.py diff --git a/dataherald/config.py b/services/engine/dataherald/config.py similarity index 100% rename from dataherald/config.py rename to services/engine/dataherald/config.py diff --git a/dataherald/context_store/__init__.py b/services/engine/dataherald/context_store/__init__.py similarity index 100% rename from dataherald/context_store/__init__.py rename to 
services/engine/dataherald/context_store/__init__.py diff --git a/dataherald/context_store/default.py b/services/engine/dataherald/context_store/default.py similarity index 100% rename from dataherald/context_store/default.py rename to services/engine/dataherald/context_store/default.py diff --git a/dataherald/db/__init__.py b/services/engine/dataherald/db/__init__.py similarity index 100% rename from dataherald/db/__init__.py rename to services/engine/dataherald/db/__init__.py diff --git a/dataherald/db/mongo.py b/services/engine/dataherald/db/mongo.py similarity index 100% rename from dataherald/db/mongo.py rename to services/engine/dataherald/db/mongo.py diff --git a/dataherald/db_scanner/__init__.py b/services/engine/dataherald/db_scanner/__init__.py similarity index 100% rename from dataherald/db_scanner/__init__.py rename to services/engine/dataherald/db_scanner/__init__.py diff --git a/dataherald/db_scanner/models/__init__.py b/services/engine/dataherald/db_scanner/models/__init__.py similarity index 100% rename from dataherald/db_scanner/models/__init__.py rename to services/engine/dataherald/db_scanner/models/__init__.py diff --git a/dataherald/db_scanner/models/types.py b/services/engine/dataherald/db_scanner/models/types.py similarity index 100% rename from dataherald/db_scanner/models/types.py rename to services/engine/dataherald/db_scanner/models/types.py diff --git a/dataherald/db_scanner/repository/__init__.py b/services/engine/dataherald/db_scanner/repository/__init__.py similarity index 100% rename from dataherald/db_scanner/repository/__init__.py rename to services/engine/dataherald/db_scanner/repository/__init__.py diff --git a/dataherald/db_scanner/repository/base.py b/services/engine/dataherald/db_scanner/repository/base.py similarity index 100% rename from dataherald/db_scanner/repository/base.py rename to services/engine/dataherald/db_scanner/repository/base.py diff --git a/dataherald/db_scanner/repository/query_history.py 
b/services/engine/dataherald/db_scanner/repository/query_history.py similarity index 100% rename from dataherald/db_scanner/repository/query_history.py rename to services/engine/dataherald/db_scanner/repository/query_history.py diff --git a/dataherald/db_scanner/services/__init__.py b/services/engine/dataherald/db_scanner/services/__init__.py similarity index 100% rename from dataherald/db_scanner/services/__init__.py rename to services/engine/dataherald/db_scanner/services/__init__.py diff --git a/dataherald/db_scanner/services/abstract_scanner.py b/services/engine/dataherald/db_scanner/services/abstract_scanner.py similarity index 100% rename from dataherald/db_scanner/services/abstract_scanner.py rename to services/engine/dataherald/db_scanner/services/abstract_scanner.py diff --git a/dataherald/db_scanner/services/base_scanner.py b/services/engine/dataherald/db_scanner/services/base_scanner.py similarity index 100% rename from dataherald/db_scanner/services/base_scanner.py rename to services/engine/dataherald/db_scanner/services/base_scanner.py diff --git a/dataherald/db_scanner/services/big_query_scanner.py b/services/engine/dataherald/db_scanner/services/big_query_scanner.py similarity index 100% rename from dataherald/db_scanner/services/big_query_scanner.py rename to services/engine/dataherald/db_scanner/services/big_query_scanner.py diff --git a/dataherald/db_scanner/services/click_house_scanner.py b/services/engine/dataherald/db_scanner/services/click_house_scanner.py similarity index 100% rename from dataherald/db_scanner/services/click_house_scanner.py rename to services/engine/dataherald/db_scanner/services/click_house_scanner.py diff --git a/dataherald/db_scanner/services/postgre_sql_scanner.py b/services/engine/dataherald/db_scanner/services/postgre_sql_scanner.py similarity index 100% rename from dataherald/db_scanner/services/postgre_sql_scanner.py rename to services/engine/dataherald/db_scanner/services/postgre_sql_scanner.py diff --git 
a/dataherald/db_scanner/services/redshift_scanner.py b/services/engine/dataherald/db_scanner/services/redshift_scanner.py similarity index 100% rename from dataherald/db_scanner/services/redshift_scanner.py rename to services/engine/dataherald/db_scanner/services/redshift_scanner.py diff --git a/dataherald/db_scanner/services/snowflake_scanner.py b/services/engine/dataherald/db_scanner/services/snowflake_scanner.py similarity index 100% rename from dataherald/db_scanner/services/snowflake_scanner.py rename to services/engine/dataherald/db_scanner/services/snowflake_scanner.py diff --git a/dataherald/db_scanner/services/sql_server_scanner.py b/services/engine/dataherald/db_scanner/services/sql_server_scanner.py similarity index 100% rename from dataherald/db_scanner/services/sql_server_scanner.py rename to services/engine/dataherald/db_scanner/services/sql_server_scanner.py diff --git a/dataherald/db_scanner/sqlalchemy.py b/services/engine/dataherald/db_scanner/sqlalchemy.py similarity index 100% rename from dataherald/db_scanner/sqlalchemy.py rename to services/engine/dataherald/db_scanner/sqlalchemy.py diff --git a/dataherald/eval/__init__.py b/services/engine/dataherald/eval/__init__.py similarity index 100% rename from dataherald/eval/__init__.py rename to services/engine/dataherald/eval/__init__.py diff --git a/dataherald/eval/eval_agent.py b/services/engine/dataherald/eval/eval_agent.py similarity index 100% rename from dataherald/eval/eval_agent.py rename to services/engine/dataherald/eval/eval_agent.py diff --git a/dataherald/eval/simple_evaluator.py b/services/engine/dataherald/eval/simple_evaluator.py similarity index 100% rename from dataherald/eval/simple_evaluator.py rename to services/engine/dataherald/eval/simple_evaluator.py diff --git a/dataherald/finetuning/__init__.py b/services/engine/dataherald/finetuning/__init__.py similarity index 100% rename from dataherald/finetuning/__init__.py rename to services/engine/dataherald/finetuning/__init__.py 
diff --git a/dataherald/finetuning/openai_finetuning.py b/services/engine/dataherald/finetuning/openai_finetuning.py similarity index 100% rename from dataherald/finetuning/openai_finetuning.py rename to services/engine/dataherald/finetuning/openai_finetuning.py diff --git a/dataherald/model/__init__.py b/services/engine/dataherald/model/__init__.py similarity index 100% rename from dataherald/model/__init__.py rename to services/engine/dataherald/model/__init__.py diff --git a/dataherald/model/base_model.py b/services/engine/dataherald/model/base_model.py similarity index 100% rename from dataherald/model/base_model.py rename to services/engine/dataherald/model/base_model.py diff --git a/dataherald/model/chat_model.py b/services/engine/dataherald/model/chat_model.py similarity index 100% rename from dataherald/model/chat_model.py rename to services/engine/dataherald/model/chat_model.py diff --git a/dataherald/repositories/__init__.py b/services/engine/dataherald/repositories/__init__.py similarity index 100% rename from dataherald/repositories/__init__.py rename to services/engine/dataherald/repositories/__init__.py diff --git a/dataherald/repositories/database_connections.py b/services/engine/dataherald/repositories/database_connections.py similarity index 100% rename from dataherald/repositories/database_connections.py rename to services/engine/dataherald/repositories/database_connections.py diff --git a/dataherald/repositories/finetunings.py b/services/engine/dataherald/repositories/finetunings.py similarity index 100% rename from dataherald/repositories/finetunings.py rename to services/engine/dataherald/repositories/finetunings.py diff --git a/dataherald/repositories/golden_sqls.py b/services/engine/dataherald/repositories/golden_sqls.py similarity index 100% rename from dataherald/repositories/golden_sqls.py rename to services/engine/dataherald/repositories/golden_sqls.py diff --git a/dataherald/repositories/instructions.py 
b/services/engine/dataherald/repositories/instructions.py similarity index 100% rename from dataherald/repositories/instructions.py rename to services/engine/dataherald/repositories/instructions.py diff --git a/dataherald/repositories/nl_generations.py b/services/engine/dataherald/repositories/nl_generations.py similarity index 100% rename from dataherald/repositories/nl_generations.py rename to services/engine/dataherald/repositories/nl_generations.py diff --git a/dataherald/repositories/prompts.py b/services/engine/dataherald/repositories/prompts.py similarity index 100% rename from dataherald/repositories/prompts.py rename to services/engine/dataherald/repositories/prompts.py diff --git a/dataherald/repositories/sql_generations.py b/services/engine/dataherald/repositories/sql_generations.py similarity index 100% rename from dataherald/repositories/sql_generations.py rename to services/engine/dataherald/repositories/sql_generations.py diff --git a/dataherald/scripts/__init__.py b/services/engine/dataherald/scripts/__init__.py similarity index 100% rename from dataherald/scripts/__init__.py rename to services/engine/dataherald/scripts/__init__.py diff --git a/dataherald/scripts/delete_and_populate_golden_records.py b/services/engine/dataherald/scripts/delete_and_populate_golden_records.py similarity index 100% rename from dataherald/scripts/delete_and_populate_golden_records.py rename to services/engine/dataherald/scripts/delete_and_populate_golden_records.py diff --git a/dataherald/scripts/migrate_v001_to_v002.py b/services/engine/dataherald/scripts/migrate_v001_to_v002.py similarity index 100% rename from dataherald/scripts/migrate_v001_to_v002.py rename to services/engine/dataherald/scripts/migrate_v001_to_v002.py diff --git a/dataherald/scripts/migrate_v002_to_v003.py b/services/engine/dataherald/scripts/migrate_v002_to_v003.py similarity index 100% rename from dataherald/scripts/migrate_v002_to_v003.py rename to 
services/engine/dataherald/scripts/migrate_v002_to_v003.py diff --git a/dataherald/scripts/migrate_v003_to_v004.py b/services/engine/dataherald/scripts/migrate_v003_to_v004.py similarity index 100% rename from dataherald/scripts/migrate_v003_to_v004.py rename to services/engine/dataherald/scripts/migrate_v003_to_v004.py diff --git a/dataherald/scripts/migrate_v004_to_v005.py b/services/engine/dataherald/scripts/migrate_v004_to_v005.py similarity index 100% rename from dataherald/scripts/migrate_v004_to_v005.py rename to services/engine/dataherald/scripts/migrate_v004_to_v005.py diff --git a/dataherald/scripts/migrate_v006_to_v100.py b/services/engine/dataherald/scripts/migrate_v006_to_v100.py similarity index 100% rename from dataherald/scripts/migrate_v006_to_v100.py rename to services/engine/dataherald/scripts/migrate_v006_to_v100.py diff --git a/dataherald/scripts/migrate_v100_to_v101.py b/services/engine/dataherald/scripts/migrate_v100_to_v101.py similarity index 100% rename from dataherald/scripts/migrate_v100_to_v101.py rename to services/engine/dataherald/scripts/migrate_v100_to_v101.py diff --git a/dataherald/scripts/populate_dialect_db_connection.py b/services/engine/dataherald/scripts/populate_dialect_db_connection.py similarity index 100% rename from dataherald/scripts/populate_dialect_db_connection.py rename to services/engine/dataherald/scripts/populate_dialect_db_connection.py diff --git a/dataherald/server/__init__.py b/services/engine/dataherald/server/__init__.py similarity index 100% rename from dataherald/server/__init__.py rename to services/engine/dataherald/server/__init__.py diff --git a/dataherald/server/fastapi/__init__.py b/services/engine/dataherald/server/fastapi/__init__.py similarity index 100% rename from dataherald/server/fastapi/__init__.py rename to services/engine/dataherald/server/fastapi/__init__.py diff --git a/dataherald/services/__init__.py b/services/engine/dataherald/services/__init__.py similarity index 100% rename from 
dataherald/services/__init__.py rename to services/engine/dataherald/services/__init__.py diff --git a/dataherald/services/nl_generations.py b/services/engine/dataherald/services/nl_generations.py similarity index 100% rename from dataherald/services/nl_generations.py rename to services/engine/dataherald/services/nl_generations.py diff --git a/dataherald/services/prompts.py b/services/engine/dataherald/services/prompts.py similarity index 100% rename from dataherald/services/prompts.py rename to services/engine/dataherald/services/prompts.py diff --git a/dataherald/services/sql_generations.py b/services/engine/dataherald/services/sql_generations.py similarity index 100% rename from dataherald/services/sql_generations.py rename to services/engine/dataherald/services/sql_generations.py diff --git a/dataherald/smart_cache/__init__.py b/services/engine/dataherald/smart_cache/__init__.py similarity index 100% rename from dataherald/smart_cache/__init__.py rename to services/engine/dataherald/smart_cache/__init__.py diff --git a/dataherald/sql_database/__init__.py b/services/engine/dataherald/sql_database/__init__.py similarity index 100% rename from dataherald/sql_database/__init__.py rename to services/engine/dataherald/sql_database/__init__.py diff --git a/dataherald/sql_database/base.py b/services/engine/dataherald/sql_database/base.py similarity index 100% rename from dataherald/sql_database/base.py rename to services/engine/dataherald/sql_database/base.py diff --git a/dataherald/sql_database/models/__init__.py b/services/engine/dataherald/sql_database/models/__init__.py similarity index 100% rename from dataherald/sql_database/models/__init__.py rename to services/engine/dataherald/sql_database/models/__init__.py diff --git a/dataherald/sql_database/models/types.py b/services/engine/dataherald/sql_database/models/types.py similarity index 100% rename from dataherald/sql_database/models/types.py rename to services/engine/dataherald/sql_database/models/types.py diff 
--git a/dataherald/sql_database/services/__init__.py b/services/engine/dataherald/sql_database/services/__init__.py similarity index 100% rename from dataherald/sql_database/services/__init__.py rename to services/engine/dataherald/sql_database/services/__init__.py diff --git a/dataherald/sql_database/services/database_connection.py b/services/engine/dataherald/sql_database/services/database_connection.py similarity index 100% rename from dataherald/sql_database/services/database_connection.py rename to services/engine/dataherald/sql_database/services/database_connection.py diff --git a/dataherald/sql_generator/__init__.py b/services/engine/dataherald/sql_generator/__init__.py similarity index 100% rename from dataherald/sql_generator/__init__.py rename to services/engine/dataherald/sql_generator/__init__.py diff --git a/dataherald/sql_generator/adaptive_agent_executor.py b/services/engine/dataherald/sql_generator/adaptive_agent_executor.py similarity index 100% rename from dataherald/sql_generator/adaptive_agent_executor.py rename to services/engine/dataherald/sql_generator/adaptive_agent_executor.py diff --git a/dataherald/sql_generator/create_sql_query_status.py b/services/engine/dataherald/sql_generator/create_sql_query_status.py similarity index 100% rename from dataherald/sql_generator/create_sql_query_status.py rename to services/engine/dataherald/sql_generator/create_sql_query_status.py diff --git a/dataherald/sql_generator/dataherald_finetuning_agent.py b/services/engine/dataherald/sql_generator/dataherald_finetuning_agent.py similarity index 100% rename from dataherald/sql_generator/dataherald_finetuning_agent.py rename to services/engine/dataherald/sql_generator/dataherald_finetuning_agent.py diff --git a/dataherald/sql_generator/dataherald_sqlagent.py b/services/engine/dataherald/sql_generator/dataherald_sqlagent.py similarity index 100% rename from dataherald/sql_generator/dataherald_sqlagent.py rename to 
services/engine/dataherald/sql_generator/dataherald_sqlagent.py diff --git a/dataherald/sql_generator/generates_nl_answer.py b/services/engine/dataherald/sql_generator/generates_nl_answer.py similarity index 100% rename from dataherald/sql_generator/generates_nl_answer.py rename to services/engine/dataherald/sql_generator/generates_nl_answer.py diff --git a/dataherald/tests/__init__.py b/services/engine/dataherald/tests/__init__.py similarity index 100% rename from dataherald/tests/__init__.py rename to services/engine/dataherald/tests/__init__.py diff --git a/dataherald/tests/conftest.py b/services/engine/dataherald/tests/conftest.py similarity index 100% rename from dataherald/tests/conftest.py rename to services/engine/dataherald/tests/conftest.py diff --git a/dataherald/tests/db/__init__.py b/services/engine/dataherald/tests/db/__init__.py similarity index 100% rename from dataherald/tests/db/__init__.py rename to services/engine/dataherald/tests/db/__init__.py diff --git a/dataherald/tests/db/test_db.py b/services/engine/dataherald/tests/db/test_db.py similarity index 100% rename from dataherald/tests/db/test_db.py rename to services/engine/dataherald/tests/db/test_db.py diff --git a/dataherald/tests/db_scanner/__init__.py b/services/engine/dataherald/tests/db_scanner/__init__.py similarity index 100% rename from dataherald/tests/db_scanner/__init__.py rename to services/engine/dataherald/tests/db_scanner/__init__.py diff --git a/dataherald/tests/db_scanner/repository/__init__.py b/services/engine/dataherald/tests/db_scanner/repository/__init__.py similarity index 100% rename from dataherald/tests/db_scanner/repository/__init__.py rename to services/engine/dataherald/tests/db_scanner/repository/__init__.py diff --git a/dataherald/tests/db_scanner/repository/test_base.py b/services/engine/dataherald/tests/db_scanner/repository/test_base.py similarity index 100% rename from dataherald/tests/db_scanner/repository/test_base.py rename to 
services/engine/dataherald/tests/db_scanner/repository/test_base.py diff --git a/dataherald/tests/db_scanner/test_sqlalchemy.py b/services/engine/dataherald/tests/db_scanner/test_sqlalchemy.py similarity index 100% rename from dataherald/tests/db_scanner/test_sqlalchemy.py rename to services/engine/dataherald/tests/db_scanner/test_sqlalchemy.py diff --git a/dataherald/tests/evaluator/__init__.py b/services/engine/dataherald/tests/evaluator/__init__.py similarity index 100% rename from dataherald/tests/evaluator/__init__.py rename to services/engine/dataherald/tests/evaluator/__init__.py diff --git a/dataherald/tests/evaluator/test_eval.py b/services/engine/dataherald/tests/evaluator/test_eval.py similarity index 100% rename from dataherald/tests/evaluator/test_eval.py rename to services/engine/dataherald/tests/evaluator/test_eval.py diff --git a/dataherald/tests/sql_generator/__init__.py b/services/engine/dataherald/tests/sql_generator/__init__.py similarity index 100% rename from dataherald/tests/sql_generator/__init__.py rename to services/engine/dataherald/tests/sql_generator/__init__.py diff --git a/dataherald/tests/sql_generator/test_generator.py b/services/engine/dataherald/tests/sql_generator/test_generator.py similarity index 100% rename from dataherald/tests/sql_generator/test_generator.py rename to services/engine/dataherald/tests/sql_generator/test_generator.py diff --git a/dataherald/tests/test_api.py b/services/engine/dataherald/tests/test_api.py similarity index 100% rename from dataherald/tests/test_api.py rename to services/engine/dataherald/tests/test_api.py diff --git a/dataherald/tests/vector_store/__init__.py b/services/engine/dataherald/tests/vector_store/__init__.py similarity index 100% rename from dataherald/tests/vector_store/__init__.py rename to services/engine/dataherald/tests/vector_store/__init__.py diff --git a/dataherald/tests/vector_store/test_vector_store.py b/services/engine/dataherald/tests/vector_store/test_vector_store.py 
similarity index 100% rename from dataherald/tests/vector_store/test_vector_store.py rename to services/engine/dataherald/tests/vector_store/test_vector_store.py diff --git a/dataherald/types.py b/services/engine/dataherald/types.py similarity index 100% rename from dataherald/types.py rename to services/engine/dataherald/types.py diff --git a/dataherald/utils/__init__.py b/services/engine/dataherald/utils/__init__.py similarity index 100% rename from dataherald/utils/__init__.py rename to services/engine/dataherald/utils/__init__.py diff --git a/dataherald/utils/agent_prompts.py b/services/engine/dataherald/utils/agent_prompts.py similarity index 100% rename from dataherald/utils/agent_prompts.py rename to services/engine/dataherald/utils/agent_prompts.py diff --git a/dataherald/utils/encrypt.py b/services/engine/dataherald/utils/encrypt.py similarity index 100% rename from dataherald/utils/encrypt.py rename to services/engine/dataherald/utils/encrypt.py diff --git a/dataherald/utils/error_codes.py b/services/engine/dataherald/utils/error_codes.py similarity index 100% rename from dataherald/utils/error_codes.py rename to services/engine/dataherald/utils/error_codes.py diff --git a/dataherald/utils/models_context_window.py b/services/engine/dataherald/utils/models_context_window.py similarity index 100% rename from dataherald/utils/models_context_window.py rename to services/engine/dataherald/utils/models_context_window.py diff --git a/dataherald/utils/s3.py b/services/engine/dataherald/utils/s3.py similarity index 100% rename from dataherald/utils/s3.py rename to services/engine/dataherald/utils/s3.py diff --git a/dataherald/utils/sql_utils.py b/services/engine/dataherald/utils/sql_utils.py similarity index 100% rename from dataherald/utils/sql_utils.py rename to services/engine/dataherald/utils/sql_utils.py diff --git a/dataherald/utils/strings.py b/services/engine/dataherald/utils/strings.py similarity index 100% rename from dataherald/utils/strings.py rename 
to services/engine/dataherald/utils/strings.py diff --git a/dataherald/utils/timeout_utils.py b/services/engine/dataherald/utils/timeout_utils.py similarity index 100% rename from dataherald/utils/timeout_utils.py rename to services/engine/dataherald/utils/timeout_utils.py diff --git a/dataherald/vector_store/__init__.py b/services/engine/dataherald/vector_store/__init__.py similarity index 100% rename from dataherald/vector_store/__init__.py rename to services/engine/dataherald/vector_store/__init__.py diff --git a/dataherald/vector_store/astra.py b/services/engine/dataherald/vector_store/astra.py similarity index 100% rename from dataherald/vector_store/astra.py rename to services/engine/dataherald/vector_store/astra.py diff --git a/dataherald/vector_store/chroma.py b/services/engine/dataherald/vector_store/chroma.py similarity index 100% rename from dataherald/vector_store/chroma.py rename to services/engine/dataherald/vector_store/chroma.py diff --git a/dataherald/vector_store/pinecone.py b/services/engine/dataherald/vector_store/pinecone.py similarity index 100% rename from dataherald/vector_store/pinecone.py rename to services/engine/dataherald/vector_store/pinecone.py diff --git a/docker-compose.yml b/services/engine/docker-compose.yml similarity index 100% rename from docker-compose.yml rename to services/engine/docker-compose.yml diff --git a/docs/Makefile b/services/engine/docs/Makefile similarity index 100% rename from docs/Makefile rename to services/engine/docs/Makefile diff --git a/docs/_static/placeholder.txt b/services/engine/docs/_static/placeholder.txt similarity index 100% rename from docs/_static/placeholder.txt rename to services/engine/docs/_static/placeholder.txt diff --git a/docs/api.add_instructions.rst b/services/engine/docs/api.add_instructions.rst similarity index 100% rename from docs/api.add_instructions.rst rename to services/engine/docs/api.add_instructions.rst diff --git a/docs/api.cancel_finetuning.rst 
b/services/engine/docs/api.cancel_finetuning.rst similarity index 100% rename from docs/api.cancel_finetuning.rst rename to services/engine/docs/api.cancel_finetuning.rst diff --git a/docs/api.create_database_connection.rst b/services/engine/docs/api.create_database_connection.rst similarity index 100% rename from docs/api.create_database_connection.rst rename to services/engine/docs/api.create_database_connection.rst diff --git a/docs/api.create_nl_generation.rst b/services/engine/docs/api.create_nl_generation.rst similarity index 100% rename from docs/api.create_nl_generation.rst rename to services/engine/docs/api.create_nl_generation.rst diff --git a/docs/api.create_prompt.rst b/services/engine/docs/api.create_prompt.rst similarity index 100% rename from docs/api.create_prompt.rst rename to services/engine/docs/api.create_prompt.rst diff --git a/docs/api.create_prompt_sql_generation.rst b/services/engine/docs/api.create_prompt_sql_generation.rst similarity index 100% rename from docs/api.create_prompt_sql_generation.rst rename to services/engine/docs/api.create_prompt_sql_generation.rst diff --git a/docs/api.create_prompt_sql_generation_nl_generation.rst b/services/engine/docs/api.create_prompt_sql_generation_nl_generation.rst similarity index 100% rename from docs/api.create_prompt_sql_generation_nl_generation.rst rename to services/engine/docs/api.create_prompt_sql_generation_nl_generation.rst diff --git a/docs/api.create_sql_generation.rst b/services/engine/docs/api.create_sql_generation.rst similarity index 100% rename from docs/api.create_sql_generation.rst rename to services/engine/docs/api.create_sql_generation.rst diff --git a/docs/api.create_sql_generation_nl_generation.rst b/services/engine/docs/api.create_sql_generation_nl_generation.rst similarity index 100% rename from docs/api.create_sql_generation_nl_generation.rst rename to services/engine/docs/api.create_sql_generation_nl_generation.rst diff --git a/docs/api.delete_finetuning.rst 
b/services/engine/docs/api.delete_finetuning.rst similarity index 100% rename from docs/api.delete_finetuning.rst rename to services/engine/docs/api.delete_finetuning.rst diff --git a/docs/api.delete_instructions.rst b/services/engine/docs/api.delete_instructions.rst similarity index 100% rename from docs/api.delete_instructions.rst rename to services/engine/docs/api.delete_instructions.rst diff --git a/docs/api.error_codes.rst b/services/engine/docs/api.error_codes.rst similarity index 100% rename from docs/api.error_codes.rst rename to services/engine/docs/api.error_codes.rst diff --git a/docs/api.execute_sql_generation.rst b/services/engine/docs/api.execute_sql_generation.rst similarity index 100% rename from docs/api.execute_sql_generation.rst rename to services/engine/docs/api.execute_sql_generation.rst diff --git a/docs/api.finetuning.rst b/services/engine/docs/api.finetuning.rst similarity index 100% rename from docs/api.finetuning.rst rename to services/engine/docs/api.finetuning.rst diff --git a/docs/api.get_csv_file.rst b/services/engine/docs/api.get_csv_file.rst similarity index 100% rename from docs/api.get_csv_file.rst rename to services/engine/docs/api.get_csv_file.rst diff --git a/docs/api.get_finetuning.rst b/services/engine/docs/api.get_finetuning.rst similarity index 100% rename from docs/api.get_finetuning.rst rename to services/engine/docs/api.get_finetuning.rst diff --git a/docs/api.get_nl_generation.rst b/services/engine/docs/api.get_nl_generation.rst similarity index 100% rename from docs/api.get_nl_generation.rst rename to services/engine/docs/api.get_nl_generation.rst diff --git a/docs/api.get_prompt.rst b/services/engine/docs/api.get_prompt.rst similarity index 100% rename from docs/api.get_prompt.rst rename to services/engine/docs/api.get_prompt.rst diff --git a/docs/api.get_sql_generation.rst b/services/engine/docs/api.get_sql_generation.rst similarity index 100% rename from docs/api.get_sql_generation.rst rename to 
services/engine/docs/api.get_sql_generation.rst
diff --git a/docs/api.get_table_description.rst b/services/engine/docs/api.get_table_description.rst
similarity index 100%
rename from docs/api.get_table_description.rst
rename to services/engine/docs/api.get_table_description.rst
diff --git a/docs/api.golden_sql.rst b/services/engine/docs/api.golden_sql.rst
similarity index 100%
rename from docs/api.golden_sql.rst
rename to services/engine/docs/api.golden_sql.rst
diff --git a/docs/api.list_database_connections.rst b/services/engine/docs/api.list_database_connections.rst
similarity index 100%
rename from docs/api.list_database_connections.rst
rename to services/engine/docs/api.list_database_connections.rst
diff --git a/docs/api.list_finetunings.rst b/services/engine/docs/api.list_finetunings.rst
similarity index 100%
rename from docs/api.list_finetunings.rst
rename to services/engine/docs/api.list_finetunings.rst
diff --git a/docs/api.list_instructions.rst b/services/engine/docs/api.list_instructions.rst
similarity index 100%
rename from docs/api.list_instructions.rst
rename to services/engine/docs/api.list_instructions.rst
diff --git a/docs/api.list_nl_generations.rst b/services/engine/docs/api.list_nl_generations.rst
similarity index 100%
rename from docs/api.list_nl_generations.rst
rename to services/engine/docs/api.list_nl_generations.rst
diff --git a/docs/api.list_prompts.rst b/services/engine/docs/api.list_prompts.rst
similarity index 100%
rename from docs/api.list_prompts.rst
rename to services/engine/docs/api.list_prompts.rst
diff --git a/docs/api.list_query_history.rst b/services/engine/docs/api.list_query_history.rst
similarity index 100%
rename from docs/api.list_query_history.rst
rename to services/engine/docs/api.list_query_history.rst
diff --git a/docs/api.list_sql_generations.rst b/services/engine/docs/api.list_sql_generations.rst
similarity index 100%
rename from docs/api.list_sql_generations.rst
rename to services/engine/docs/api.list_sql_generations.rst
diff --git a/docs/api.list_table_description.rst b/services/engine/docs/api.list_table_description.rst
similarity index 100%
rename from docs/api.list_table_description.rst
rename to services/engine/docs/api.list_table_description.rst
diff --git a/docs/api.refresh_table_description.rst b/services/engine/docs/api.refresh_table_description.rst
similarity index 100%
rename from docs/api.refresh_table_description.rst
rename to services/engine/docs/api.refresh_table_description.rst
diff --git a/docs/api.rst b/services/engine/docs/api.rst
similarity index 100%
rename from docs/api.rst
rename to services/engine/docs/api.rst
diff --git a/docs/api.scan_table_description.rst b/services/engine/docs/api.scan_table_description.rst
similarity index 100%
rename from docs/api.scan_table_description.rst
rename to services/engine/docs/api.scan_table_description.rst
diff --git a/docs/api.update_database_connection.rst b/services/engine/docs/api.update_database_connection.rst
similarity index 100%
rename from docs/api.update_database_connection.rst
rename to services/engine/docs/api.update_database_connection.rst
diff --git a/docs/api.update_instructions.rst b/services/engine/docs/api.update_instructions.rst
similarity index 100%
rename from docs/api.update_instructions.rst
rename to services/engine/docs/api.update_instructions.rst
diff --git a/docs/api.update_table_descriptions.rst b/services/engine/docs/api.update_table_descriptions.rst
similarity index 100%
rename from docs/api.update_table_descriptions.rst
rename to services/engine/docs/api.update_table_descriptions.rst
diff --git a/docs/api_server.rst b/services/engine/docs/api_server.rst
similarity index 100%
rename from docs/api_server.rst
rename to services/engine/docs/api_server.rst
diff --git a/docs/architecture.png b/services/engine/docs/architecture.png
similarity index 100%
rename from docs/architecture.png
rename to services/engine/docs/architecture.png
diff --git a/docs/conf.py b/services/engine/docs/conf.py
similarity index 100%
rename from docs/conf.py
rename to services/engine/docs/conf.py
diff --git a/docs/context_store.rst b/services/engine/docs/context_store.rst
similarity index 100%
rename from docs/context_store.rst
rename to services/engine/docs/context_store.rst
diff --git a/docs/contributing.projects.rst b/services/engine/docs/contributing.projects.rst
similarity index 100%
rename from docs/contributing.projects.rst
rename to services/engine/docs/contributing.projects.rst
diff --git a/docs/db.rst b/services/engine/docs/db.rst
similarity index 100%
rename from docs/db.rst
rename to services/engine/docs/db.rst
diff --git a/docs/envars.rst b/services/engine/docs/envars.rst
similarity index 100%
rename from docs/envars.rst
rename to services/engine/docs/envars.rst
diff --git a/docs/evaluator.rst b/services/engine/docs/evaluator.rst
similarity index 100%
rename from docs/evaluator.rst
rename to services/engine/docs/evaluator.rst
diff --git a/docs/finetuning.rst b/services/engine/docs/finetuning.rst
similarity index 100%
rename from docs/finetuning.rst
rename to services/engine/docs/finetuning.rst
diff --git a/docs/index.rst b/services/engine/docs/index.rst
similarity index 100%
rename from docs/index.rst
rename to services/engine/docs/index.rst
diff --git a/docs/introduction.rst b/services/engine/docs/introduction.rst
similarity index 100%
rename from docs/introduction.rst
rename to services/engine/docs/introduction.rst
diff --git a/docs/make.bat b/services/engine/docs/make.bat
similarity index 100%
rename from docs/make.bat
rename to services/engine/docs/make.bat
diff --git a/docs/modules.rst b/services/engine/docs/modules.rst
similarity index 100%
rename from docs/modules.rst
rename to services/engine/docs/modules.rst
diff --git a/docs/quickstart.rst b/services/engine/docs/quickstart.rst
similarity index 100%
rename from docs/quickstart.rst
rename to services/engine/docs/quickstart.rst
diff --git a/docs/requirements.txt b/services/engine/docs/requirements.txt
similarity index 100%
rename from docs/requirements.txt
rename to services/engine/docs/requirements.txt
diff --git a/docs/text_to_sql_engine.rst b/services/engine/docs/text_to_sql_engine.rst
similarity index 100%
rename from docs/text_to_sql_engine.rst
rename to services/engine/docs/text_to_sql_engine.rst
diff --git a/docs/tutorial.chatgpt_plugin.rst b/services/engine/docs/tutorial.chatgpt_plugin.rst
similarity index 100%
rename from docs/tutorial.chatgpt_plugin.rst
rename to services/engine/docs/tutorial.chatgpt_plugin.rst
diff --git a/docs/tutorial.finetune_sql_generator.rst b/services/engine/docs/tutorial.finetune_sql_generator.rst
similarity index 100%
rename from docs/tutorial.finetune_sql_generator.rst
rename to services/engine/docs/tutorial.finetune_sql_generator.rst
diff --git a/docs/tutorial.run_scripts.rst b/services/engine/docs/tutorial.run_scripts.rst
similarity index 100%
rename from docs/tutorial.run_scripts.rst
rename to services/engine/docs/tutorial.run_scripts.rst
diff --git a/docs/tutorial.sample_database.rst b/services/engine/docs/tutorial.sample_database.rst
similarity index 100%
rename from docs/tutorial.sample_database.rst
rename to services/engine/docs/tutorial.sample_database.rst
diff --git a/docs/tutorial.streamlit_app.rst b/services/engine/docs/tutorial.streamlit_app.rst
similarity index 100%
rename from docs/tutorial.streamlit_app.rst
rename to services/engine/docs/tutorial.streamlit_app.rst
diff --git a/docs/vector_store.rst b/services/engine/docs/vector_store.rst
similarity index 100%
rename from docs/vector_store.rst
rename to services/engine/docs/vector_store.rst
diff --git a/initdb.d/init-mongo.sh b/services/engine/initdb.d/init-mongo.sh
similarity index 100%
rename from initdb.d/init-mongo.sh
rename to services/engine/initdb.d/init-mongo.sh
diff --git a/log_config.yml b/services/engine/log_config.yml
similarity index 100%
rename from log_config.yml
rename to services/engine/log_config.yml
diff --git a/pyproject.toml b/services/engine/pyproject.toml
similarity index 100%
rename from pyproject.toml
rename to services/engine/pyproject.toml
diff --git a/requirements.txt b/services/engine/requirements.txt
similarity index 100%
rename from requirements.txt
rename to services/engine/requirements.txt
diff --git a/setup.py b/services/engine/setup.py
similarity index 100%
rename from setup.py
rename to services/engine/setup.py
diff --git a/tmp/__init__.py b/services/engine/tmp/__init__.py
similarity index 100%
rename from tmp/__init__.py
rename to services/engine/tmp/__init__.py