DH-4598 ported changes in other branch for docs into the refactor branch
aazo11 committed Sep 14, 2023
1 parent 156b4a9 commit a5700b9
Showing 13 changed files with 221 additions and 33 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -33,6 +33,8 @@ Dataherald is a natural language-to-SQL engine built for enterprise-level questi

This project is undergoing swift development, and as such, the API may be subject to change at any time.

If you would like to learn more, you can join the <a href="https://discord.gg/A59Uxyy2k9" target="_blank">Discord</a> or <a href="https://dataherald.readthedocs.io/" target="_blank">read the docs</a>.

## Overview

### Background
2 changes: 1 addition & 1 deletion docs/api.process_nl_query_response.rst
@@ -1,5 +1,5 @@
Process an NL query response
=======================
=============================

Once you have asked a question, you can send a new SQL query to improve the response; this query is not stored

94 changes: 94 additions & 0 deletions docs/api.rst
@@ -1,6 +1,100 @@
API
=======================

The Dataherald Engine exposes RESTful APIs that can be used to:

* 🔌 Connect to and manage connections to databases
* 🔑 Add context to the engine by scanning the databases, adding descriptions to tables and columns, and adding golden records
* 🙋‍♀️ Ask natural language questions from the relational data

Our APIs have resource-oriented URLs built around standard HTTP response codes and verbs. The core resources are described below.


Database Connections
------------------------------

The ``database-connections`` object allows you to define connections to your relational data stores.

Related endpoints are:

* :doc:`Create database connection <api.create_database_connection>` -- ``POST api/v1/database-connections``
* :doc:`List database connections <api.list_database_connections>` -- ``GET api/v1/database-connections``
* :doc:`Update a database connection <api.update_database_connection>` -- ``PUT api/v1/database-connections/{alias}``


.. code-block:: json

   {
      "alias": "string",
      "use_ssh": false,
      "connection_uri": "string",
      "path_to_credentials_file": "string",
      "ssh_settings": {
         "db_name": "string",
         "host": "string",
         "username": "string",
         "password": "string",
         "remote_host": "string",
         "remote_db_name": "string",
         "remote_db_password": "string",
         "private_key_path": "string",
         "private_key_password": "string",
         "db_driver": "string"
      }
   }

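As a rough illustration, a request body matching this schema can be assembled and posted with a short Python sketch. The base URL, the helper names, and the sample connection values below are assumptions made for the sketch, not part of the engine's API:

```python
import json
import urllib.request


def build_connection_payload(alias, connection_uri, use_ssh=False, ssh_settings=None):
    """Assemble a request body using the database-connections fields shown above."""
    payload = {"alias": alias, "use_ssh": use_ssh, "connection_uri": connection_uri}
    if ssh_settings is not None:
        payload["ssh_settings"] = ssh_settings
    return payload


def create_connection(base_url, payload):
    """POST the payload to api/v1/database-connections (assumes a running engine)."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/database-connections",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical connection values for illustration only.
payload = build_connection_payload("my_db", "postgresql://user:pass@host:5432/db")
print(json.dumps(payload))
```

Calling ``create_connection("http://localhost", payload)`` would then register the connection, assuming the engine is running locally.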
Query Response
------------------
The ``query-response`` object is created when answering natural language questions from the relational data.

The related endpoints are:

* :doc:`process_nl_query_response <api.process_nl_query_response>` -- ``POST api/v1/nl-query-responses``
* :doc:`update_nl_query_response <api.update_nl_query_response>` -- ``PATCH api/v1/nl-query-responses/{query_id}``


.. code-block:: json

   {
      "confidence_score": "string",
      "error_message": "string",
      "exec_time": "float",
      "intermediate_steps": ["string"],
      "nl_question_id": "string",
      "nl_response": "string",
      "sql_generation_status": "string",
      "sql_query": "string",
      "sql_query_result": {},
      "total_cost": "float",
      "total_tokens": "int"
   }

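To sketch how a client might consume this object, the snippet below parses a hypothetical response body and surfaces either the natural-language answer or the error. The field values, including the ``VALID`` status string, are illustrative assumptions rather than documented constants:

```python
import json

# A hypothetical response body following the query-response schema above.
raw = """{
  "confidence_score": "0.85",
  "error_message": "",
  "exec_time": 1.2,
  "intermediate_steps": ["generated SQL"],
  "nl_question_id": "q-123",
  "nl_response": "There were 42 orders last week.",
  "sql_generation_status": "VALID",
  "sql_query": "SELECT COUNT(*) FROM orders",
  "sql_query_result": {},
  "total_cost": 0.03,
  "total_tokens": 512
}"""


def summarize_response(body):
    """Return the NL answer when SQL generation succeeded, else the error message."""
    if body.get("sql_generation_status") == "VALID" and not body.get("error_message"):
        return body["nl_response"]
    return f"Query failed: {body.get('error_message', 'unknown error')}"


response = json.loads(raw)
print(summarize_response(response))  # → There were 42 orders last week.
```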
Table Descriptions
---------------------
The ``table-descriptions`` object is used to add context about the tables and columns in the relational database.
These are then used to help the LLM build valid SQL to answer natural language questions.

Related endpoints are:

* :doc:`Scan table description <api.scan_table_description>` -- ``POST api/v1/table-descriptions/scan``
* :doc:`Add table description <api.add_descriptions>` -- ``PATCH api/v1/table-descriptions/{table_description_id}``
* :doc:`List table description <api.list_table_description>` -- ``GET api/v1/table-descriptions``

.. code-block:: json

   {
      "columns": [{}],
      "db_connection_id": "string",
      "description": "string",
      "examples": [{}],
      "table_name": "string",
      "table_schema": "string"
   }

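As an illustrative sketch, the snippet below builds a ``PATCH`` body from the fields above and sends it to the endpoint. The helper names, sample table identifiers, and column descriptions are assumptions made for the example:

```python
import json
import urllib.request


def build_description_payload(description, columns=None, examples=None):
    """Assemble a PATCH body using fields from the table-descriptions schema above."""
    payload = {"description": description}
    if columns is not None:
        payload["columns"] = columns
    if examples is not None:
        payload["examples"] = examples
    return payload


def update_table_description(base_url, table_description_id, payload):
    """PATCH api/v1/table-descriptions/{id} (assumes a running engine)."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/table-descriptions/{table_description_id}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical context for an orders table, for illustration only.
payload = build_description_payload(
    "Fact table of customer orders",
    columns=[{"name": "order_id", "description": "primary key"}],
)
```

Richer descriptions here give the LLM more context, which generally improves the validity of the generated SQL.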
.. toctree::
   :hidden:

4 changes: 2 additions & 2 deletions docs/api.update_nl_query_response.rst
@@ -1,7 +1,7 @@
Update an NL query response
=======================
============================

Once you made a question, you can give feedback to improve the queries
Once you ask a question, you can give feedback to improve the queries

Request this ``PATCH`` endpoint::

5 changes: 3 additions & 2 deletions docs/conf.py
@@ -11,10 +11,11 @@

sys.path.insert(0, os.path.abspath(".."))

project = "Dataherald"
project = "Dataherald AI"
copyright = "2023, Dataherald"
author = "Dataherald"
release = "0.0.1"
release = "main"
html_title = project

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
11 changes: 11 additions & 0 deletions docs/contributing.projects.rst
@@ -0,0 +1,11 @@
Jumping in
====================

We are beyond thrilled that you are considering joining this project. There are a number of
community projects that are in development, spanning areas such as:

* Connecting to public data sources
* Building integrations with front-end frameworks
* Testing and benchmarking new NL-to-SQL approaches proposed in academic literature

The best way to jump in is to hop into the #projects channel on our `Discord server <https://discord.gg/A59Uxyy2k9>`_.
63 changes: 63 additions & 0 deletions docs/envars.rst
@@ -0,0 +1,63 @@
Environment Variables
=======================
The Dataherald Engine relies on a number of environment variables that must be set for it to work. The following sample,
with default values, is provided in the ``.env.example`` file.


.. code-block:: bash

   OPENAI_API_KEY =
   ORG_ID =
   LLM_MODEL = 'gpt-4-32k'
   GOLDEN_RECORD_COLLECTION = 'my-golden-records'
   PINECONE_API_KEY =
   PINECONE_ENVIRONMENT =
   API_SERVER = "dataherald.api.fastapi.FastAPI"
   SQL_GENERATOR = "dataherald.sql_generator.dataherald_sqlagent.DataheraldSQLAgent"
   EVALUATOR = "dataherald.eval.simple_evaluator.SimpleEvaluator"
   DB = "dataherald.db.mongo.MongoDB"
   VECTOR_STORE = 'dataherald.vector_store.chroma.Chroma'
   CONTEXT_STORE = 'dataherald.context_store.default.DefaultContextStore'
   DB_SCANNER = 'dataherald.db_scanner.sqlalchemy.SqlAlchemyScanner'
   MONGODB_URI = "mongodb://admin:admin@mongodb:27017"
   MONGODB_DB_NAME = 'dataherald'
   MONGODB_DB_USERNAME = 'admin'
   MONGODB_DB_PASSWORD = 'admin'
   ENCRYPT_KEY =
   S3_AWS_ACCESS_KEY_ID =
   S3_AWS_SECRET_ACCESS_KEY =

.. csv-table::
   :header: "Variable Name", "Description", "Default Value", "Required"
   :widths: 15, 55, 25, 5

   "OPENAI_API_KEY", "The OpenAI key used by the Dataherald Engine", "None", "Yes"
   "ORG_ID", "The OpenAI Organization ID used by the Dataherald Engine", "None", "Yes"
   "LLM_MODEL", "The Language Model used by the Dataherald Engine. Supported values include gpt-4-32k, gpt-4, gpt-3.5-turbo, gpt-3.5-turbo-16k", "``gpt-4-32k``", "No"
   "GOLDEN_RECORD_COLLECTION", "The name of the MongoDB collection where golden records will be stored", "``my-golden-records``", "No"
   "PINECONE_API_KEY", "The Pinecone API key used", "None", "Yes if using the Pinecone vector store"
   "PINECONE_ENVIRONMENT", "The Pinecone environment", "None", "Yes if using the Pinecone vector store"
   "API_SERVER", "The implementation of the API Module used by the Dataherald Engine.", "``dataherald.api.fastapi.FastAPI``", "Yes"
   "SQL_GENERATOR", "The implementation of the SQLGenerator Module to be used.", "``dataherald.sql_generator. dataherald_sqlagent. DataheraldSQLAgent``", "Yes"
   "EVALUATOR", "The implementation of the Evaluator Module to be used.", "``dataherald.eval. simple_evaluator.SimpleEvaluator``", "Yes"
   "DB", "The implementation of the DB Module to be used.", "``dataherald.db.mongo.MongoDB``", "Yes"
   "VECTOR_STORE", "The implementation of the Vector Store Module to be used. Chroma and Pinecone modules are currently included.", "``dataherald.vector_store. chroma.Chroma``", "Yes"
   "CONTEXT_STORE", "The implementation of the Context Store Module to be used.", "``dataherald.context_store. default.DefaultContextStore``", "Yes"
   "DB_SCANNER", "The implementation of the DB Scanner Module to be used.", "``dataherald.db_scanner. sqlalchemy.SqlAlchemyScanner``", "Yes"
   "MONGODB_URI", "The URI of the MongoDB that will be used for application storage.", "``mongodb:// admin:admin@mongodb:27017``", "Yes"
   "MONGODB_DB_NAME", "The name of the MongoDB database that will be used.", "``dataherald``", "Yes"
   "MONGODB_DB_USERNAME", "The username of the MongoDB database", "``admin``", "Yes"
   "MONGODB_DB_PASSWORD", "The password of the MongoDB database", "``admin``", "Yes"
   "ENCRYPT_KEY", "The key that will be used to encrypt data at rest before storing", "None", "Yes"
   "S3_AWS_ACCESS_KEY_ID", "The AWS access key ID used to access credential files if saved to S3", "None", "No"
   "S3_AWS_SECRET_ACCESS_KEY", "The AWS secret access key used to access credential files if saved to S3", "None", "No"
19 changes: 0 additions & 19 deletions docs/getting_started.rst

This file was deleted.

32 changes: 28 additions & 4 deletions docs/index.rst
@@ -7,14 +7,38 @@ Dataherald AI
========================================
Welcome to the official documentation page of the Dataherald AI engine. This documentation is intended for developers who want to:

* Use the Dataherald AI engine to set up Natural Language interfaces from structured data in their own projects.
* Contribute to the Dataherald AI engine.
* 🖥️ Use the Dataherald AI engine to set up Natural Language interfaces from structured data in their own projects.
* 🏍️ Contribute to the Dataherald AI engine.

These documents cover how to get started, how to set up an API from your database that can answer questions in plain English, and how to extend the core engine's functionality.

.. toctree::
   :maxdepth: 1
   :caption: Getting Started
   :hidden:

   introduction
   quickstart

.. toctree::
   :caption: References
   :hidden:

   api
   envars
   modules

.. toctree::
   :caption: Tutorials
   :hidden:

   tutorial.sample_database
   tutorial.finetune_sql_generator
   tutorial.chatgpt_plugin

.. toctree::
   :caption: Contributing
   :hidden:

   contributing.projects
10 changes: 5 additions & 5 deletions docs/introduction.rst
@@ -12,8 +12,8 @@ You can use Dataherald to:

Dataherald is built to:

* Be modular, allowing different implementations of core modules to be plugged-in
* Come batteries included: Have best-in-class implementations for modules like text to SQL, evaluation
* Be easy to set-up and use with major data warehouses
* Allow for Active Learning, allowing you to improve the performance with usage
* Be fast
* 🔌 Be modular, allowing different implementations of core modules to be plugged-in
* 🔋 Come batteries included: Have best-in-class implementations for modules like text to SQL, evaluation
* 📀 Be easy to set-up and use with major data warehouses
* 👨‍🏫 Allow for Active Learning, allowing you to improve the performance with usage
* 🏎️ Be fast
4 changes: 4 additions & 0 deletions docs/tutorial.chatgpt_plugin.rst
@@ -0,0 +1,4 @@
Create a ChatGPT plug-in from your structured data
=====================================================

Coming soon ...
4 changes: 4 additions & 0 deletions docs/tutorial.finetune_sql_generator.rst
@@ -0,0 +1,4 @@
Using a Custom Text to SQL Engine
==================================

Coming soon ...
4 changes: 4 additions & 0 deletions docs/tutorial.sample_database.rst
@@ -0,0 +1,4 @@
Setting up a sample Database for accurate NL-to-SQL
====================================================

Coming soon ...

0 comments on commit a5700b9
