Merge remote-tracking branch 'origin/Agent_Hub_Dev' into Agent_Hub_Dev
yhjun1026 committed Oct 18, 2023
2 parents e350b12 + 96ae500 commit b06dbe3
Showing 28 changed files with 2,041 additions and 418 deletions.
20 changes: 20 additions & 0 deletions README.md
@@ -58,6 +58,11 @@ Run on an RTX 4090 GPU.

![demo_en](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/d40118e4-8e76-45b6-b4a6-30e5ff170f42)


![7f1bd042-7165-4b9f-a88e-ccce35f9d9aa](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/0f6e0944-24e5-4481-87f5-7c168a63c5ea)



#### Chat with data, and generate analysis charts.

![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/4113ac15-83c2-4350-86c0-5fc795677abd)
@@ -225,6 +230,13 @@ The core capabilities mainly consist of the following parts:
- [ ] Elasticsearch
- [ ] ClickHouse
- [ ] Faiss

- [ ] Testing and Evaluation Capability Building
- [ ] Knowledge QA datasets
- [ ] Question collection [easy, medium, hard]
- [ ] Scoring mechanism
- [ ] Testing and evaluation using Excel + DB datasets

### Multi Datasource Support

- Multi Datasource Support
@@ -251,11 +263,19 @@ The core capabilities mainly consist of the following parts:
- [x] [Cluster Deployment](https://db-gpt.readthedocs.io/en/latest/getting_started/install/cluster/vms/index.html)
- [x] [Fastchat Support](https://github.com/lm-sys/FastChat)
- [x] [vLLM Support](https://db-gpt.readthedocs.io/en/latest/getting_started/install/llm/vllm/vllm.html)
- [ ] Cloud-native environment and support for Ray environment
- [ ] Service Registry (e.g., Nacos)
- [ ] Compatibility with OpenAI's interfaces
- [ ] Expansion and optimization of embedding models

### Agents Market and Plugins
- [x] Multi-agents framework
- [x] Custom plugin development
- [ ] Plugin market
- [ ] Integration with CoT
- [ ] Enrich plugin sample library
- [ ] Support for AutoGPT protocol
- [ ] Integration of multi-agents and visualization capabilities, defining LLM+Vis new standards

### Cost and Observability
- [x] [debugging](https://db-gpt.readthedocs.io/en/latest/getting_started/observability.html)
17 changes: 17 additions & 0 deletions README.zh.md
@@ -59,6 +59,8 @@ DB-GPT is an open-source experimental GPT project built on databases, using local

![demo_zh](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/94a40a1b-fb54-4a3b-b0a6-30575bd2796c)

![7f1bd042-7165-4b9f-a88e-ccce35f9d9aa](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/0f6e0944-24e5-4481-87f5-7c168a63c5ea)

#### Generate analysis charts from natural language conversations

![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/4113ac15-83c2-4350-86c0-5fc795677abd)
Expand Down Expand Up @@ -298,11 +300,26 @@ The MIT License (MIT)
- [x] [FastChat support](https://github.com/lm-sys/FastChat)
- [x] [vLLM support](https://db-gpt.readthedocs.io/en/latest/getting_started/install/llm/vllm/vllm.html)
- [ ] Cloud-native environment and Ray environment support
- [ ] Service registry via Nacos
- [ ] OpenAI-compatible upper-layer interfaces
- [ ] Embedding model expansion and optimization

### Agents and Plugin Market
- [x] Multi-agents framework
- [x] Custom agents
- [ ] Plugin market
- [ ] CoT integration
- [ ] Enrich plugin sample library
- [ ] Support for AutoGPT protocol
- [ ] Integration of multi-agents and visualization capabilities, defining a new LLM+Vis standard


### Testing and Evaluation Capability Building
- [ ] Knowledge base text datasets
- [ ] Question sets [easy, medium, hard]
- [ ] Scoring mechanism
- [ ] Testing and evaluation with Excel + DB tables

### Cost and Observability
- [x] [debugging](https://db-gpt.readthedocs.io/en/latest/getting_started/observability.html)
Binary file modified docs/_static/img/muti-model-cluster-overview.png
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -14,7 +14,7 @@
copyright = "2023, csunny"
author = "csunny"

version = "👏👏 0.3.9"
version = "👏👏 0.4.0"
html_title = project + " " + version

# -- General configuration ---------------------------------------------------
3 changes: 2 additions & 1 deletion docs/getting_started/application.rst
@@ -19,4 +19,5 @@ The DB-GPT product is a web application where you can chat with your database, chat with your knowledge base,
./application/chatdb/chatdb.md
./application/kbqa/kbqa.md
./application/dashboard/dashboard.md
./application/chatexcel/chatexcel.md
./application/model/model.md
61 changes: 61 additions & 0 deletions docs/getting_started/application/model/model.md
@@ -0,0 +1,61 @@
Model Management
==================================
![model](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/4b160ee7-2e2a-4502-bd54-d7daa14b23e5)
DB-GPT provides LLM model management in its web interface, including creating, starting, and stopping LLM models.
DB-GPT currently supports these LLMs:
```{admonition} Supported LLMs
* Multi-LLM support: the following large language models are currently supported:
* [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
* [baichuan2-7b/baichuan2-13b](https://huggingface.co/baichuan-inc)
* [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
* [Qwen/Qwen-7B-Chat/Qwen-14B-Chat](https://huggingface.co/Qwen/)
* [Vicuna](https://huggingface.co/Tribbiani/vicuna-13b)
* [BlinkDL/RWKV-4-Raven](https://huggingface.co/BlinkDL/rwkv-4-raven)
* [camel-ai/CAMEL-13B-Combined-Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data)
* [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)
* [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b)
* [h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b)
* [lcw99/polyglot-ko-12.8b-chang-instruct-chat](https://huggingface.co/lcw99/polyglot-ko-12.8b-chang-instruct-chat)
* [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5)
* [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
* [Neutralzz/BiLLa-7B-SFT](https://huggingface.co/Neutralzz/BiLLa-7B-SFT)
* [nomic-ai/gpt4all-13b-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy)
* [NousResearch/Nous-Hermes-13b](https://huggingface.co/NousResearch/Nous-Hermes-13b)
* [openaccess-ai-collective/manticore-13b-chat-pyg](https://huggingface.co/openaccess-ai-collective/manticore-13b-chat-pyg)
* [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
* [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
* [Salesforce/codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
* [StabilityAI/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
* [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
* [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
* [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
* [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
* [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
* [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
* [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
* [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
* [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
* [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat)
* [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
* [all models of OpenOrca](https://huggingface.co/Open-Orca)
* [Spicyboros](https://huggingface.co/jondurbin/spicyboros-7b-2.2?not-for-all-audiences=true) + [airoboros 2.2](https://huggingface.co/jondurbin/airoboros-l2-13b-2.2)
* [VMware's OpenLLaMa OpenInstruct](https://huggingface.co/VMware/open-llama-7b-open-instruct)
* Support API Proxy LLMs
* [ChatGPT](https://api.openai.com/)
* [Tongyi](https://www.aliyun.com/product/dashscope)
* [Wenxin](https://cloud.baidu.com/product/wenxinworkshop?track=dingbutonglan)
* [ChatGLM](http://open.bigmodel.cn/)
```
### Create and Start an LLM Model
```{note}
Make sure your LLM model file is downloaded, or that your LLM proxy API service is ready.
```
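For a locally deployed model, downloading usually means cloning the weights into the project's models directory with git-lfs first. A sketch, assuming your models live under `DB-GPT/models` as in the deployment guide, with Vicuna as an example:

```commandline
cd DB-GPT/models
git lfs install
git clone https://huggingface.co/lmsys/vicuna-13b-v1.5
```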
![model-start](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/dacabcb9-92c6-43eb-95ed-8cabaa2d18e6)
When creation succeeds, you will see:
![image](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1b69bff6-8b37-493d-b6be-38f7b6e8ae2d)
Then you can choose an LLM model service to chat with, and switch between services.
![image](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/2d20eb6b-8976-4731-b433-373ac3383602)
### Stop an LLM Model
![image](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/a21278d9-7bef-487b-bef1-460ce516b2f5)

5 changes: 5 additions & 0 deletions docs/getting_started/faq/deploy/deploy_faq.md
@@ -92,4 +92,9 @@ pip install chromadb==0.4.10

```commandline
pip install langchain>=0.0.286
```

##### Q9: On CentOS, "No matching distribution found for setuptools_scm"

```commandline
pip install --use-pep517 fschat
```
64 changes: 62 additions & 2 deletions docs/getting_started/install/cluster/cluster.rst
@@ -1,19 +1,79 @@
Cluster deployment
LLM Deployment
==================================
In the exploration and implementation of AI model applications, it can be challenging to directly integrate with model services. Currently, there is no established standard for deploying large models, and new models and inference methods are constantly being released. As a result, a significant amount of time is spent adapting to the ever-changing underlying model environment. This, to some extent, hinders the exploration and implementation of AI model applications.

We divide the deployment of large models into two layers: the model inference layer and the model deployment layer. The model inference layer corresponds to model inference frameworks such as vLLM, TGI, and TensorRT. The model deployment layer interfaces with the inference layer below and provides model serving capabilities above. We refer to this layer's framework as the model deployment framework. Positioned above the inference frameworks, the model deployment framework offers capabilities such as multiple model instances, multiple inference frameworks, multiple service protocols, multi-cloud support, automatic scaling, and observability.

In order to deploy DB-GPT to multiple nodes, you can deploy a cluster. The cluster architecture diagram is as follows:

.. raw:: html

<img src="../../../_static/img/muti-model-cluster-overview.png" />

Design of DB-GPT:
-----------------

DB-GPT is designed as an LLM deployment framework, taking the above design objectives into account.

- Support for multiple LLMs and inference frameworks: DB-GPT supports the simultaneous deployment of multiple LLMs and is compatible with multiple inference frameworks such as vLLM, TGI, and TensorRT.

- Scalability and stability: DB-GPT has good scalability, allowing easy addition of new models and inference frameworks. It utilizes a distributed architecture and automatic scaling capabilities to handle high concurrency and large-scale requests, ensuring system stability.

- Performance optimization: DB-GPT undergoes performance optimization to provide fast and efficient model inference capabilities, preventing it from becoming a performance bottleneck during inference.

- Management and observability capabilities: DB-GPT offers management and monitoring functionalities, including model deployment and configuration management, performance monitoring, and logging. It can generate reports on model performance and service status to promptly identify and resolve issues.

- Lightweight: DB-GPT is designed as a lightweight framework to improve deployment efficiency and save resources. It employs efficient algorithms and optimization strategies to minimize resource consumption while maintaining sufficient functionality and performance.

1. Support for multiple models and inference frameworks
-----------------
The field of large models is evolving rapidly, with new models being released and new methods being proposed for model training and inference. We believe that this situation will continue for some time.

For most users exploring and implementing AI applications, this situation has its pros and cons. The benefits are apparent, as it brings new opportunities and advancements. However, one drawback is that users may feel compelled to constantly try and explore new models and inference frameworks.

In DB-GPT, seamless support is provided for FastChat, vLLM, and llama.cpp. In theory, any model supported by these frameworks is also supported by DB-GPT. If you have requirements for faster inference speed and concurrency, you can directly use vLLM. If you want good inference performance on CPU or Apple's M1/M2 chips, you can use llama.cpp. Additionally, DB-GPT also supports various proxy models from OpenAI, Azure OpenAI, Google BARD, Wenxin Yiyan, Tongyi Qianwen, and Zhipu AI, among others.

2. Good scalability and stability
-----------------
A comprehensive model deployment framework consists of several components: the Model Worker, which directly interfaces with the underlying inference frameworks; the Model Controller, which manages and maintains multiple model components; and the Model API, which provides external model serving capabilities.

The Model Worker plays a crucial role and needs to be highly extensible. It can be specialized for deploying large language models, embedding models, or other types of models. The choice of Model Worker depends on the deployment environment, such as a regular physical server environment, a Kubernetes environment, or specific cloud environments provided by various cloud service providers.

Having different Model Worker options allows users to select the most suitable one based on their specific requirements and infrastructure. This flexibility enables efficient deployment and utilization of models across different environments.
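The following sketch illustrates what such an extensible worker contract might look like; the class and method names are hypothetical, not DB-GPT's actual API:

.. code-block:: python

    from abc import ABC, abstractmethod
    from typing import Iterator, List


    class BaseModelWorker(ABC):
        """Common contract every worker implements, regardless of the
        inference framework or environment it wraps."""

        @abstractmethod
        def generate(self, prompt: str) -> Iterator[str]:
            """Stream output chunks for a prompt."""

        @abstractmethod
        def embed(self, texts: List[str]) -> List[List[float]]:
            """Return embedding vectors for a batch of texts."""


    class LocalLLMWorker(BaseModelWorker):
        """A worker wrapping a locally loaded model (e.g. via vLLM or llama.cpp)."""

        def generate(self, prompt: str) -> Iterator[str]:
            yield from ("hello", " world")  # placeholder token stream

        def embed(self, texts: List[str]) -> List[List[float]]:
            return [[0.0] * 768 for _ in texts]  # placeholder vectors

A Kubernetes-specific or cloud-specific worker would implement the same contract, which is what keeps the deployment layer extensible.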

The Model Controller, responsible for managing model metadata, also needs to be scalable. Different deployment environments and model management requirements may call for different choices of Model Controllers.

Furthermore, I believe that model serving shares many similarities with traditional microservices. In microservices, a service can have multiple instances, and all instances are registered in a central registry. Service consumers retrieve the list of instances based on the service name from the registry and select a specific instance for invocation using a load balancing strategy.

Similarly, in model deployment, a model can have multiple instances, and all instances can be registered in a model registry. Model service consumers retrieve the list of instances based on the model name from the registry and select a specific instance for invocation using a model-specific load balancing strategy.

Introducing a model registry, responsible for storing model instance metadata, enables such an architecture. The model registry can leverage existing service registries used in microservices (such as Nacos, Eureka, etcd, Consul, etc.) as implementations. This ensures high availability of the entire deployment system.
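A minimal Python sketch of this registry pattern; the class and method names are illustrative, not DB-GPT's actual classes:

.. code-block:: python

    import random
    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class ModelInstance:
        model_name: str
        host: str
        port: int
        healthy: bool = True


    class ModelRegistry:
        """Stores model instance metadata, like a microservice registry."""

        def __init__(self) -> None:
            self._instances: Dict[str, List[ModelInstance]] = {}

        def register(self, instance: ModelInstance) -> None:
            self._instances.setdefault(instance.model_name, []).append(instance)

        def select(self, model_name: str) -> ModelInstance:
            # Load-balancing strategy: random choice among healthy instances.
            candidates = [i for i in self._instances.get(model_name, []) if i.healthy]
            if not candidates:
                raise LookupError(f"no healthy instance for model {model_name!r}")
            return random.choice(candidates)


    registry = ModelRegistry()
    registry.register(ModelInstance("vicuna-13b-v1.5", "10.0.0.1", 8001))
    registry.register(ModelInstance("vicuna-13b-v1.5", "10.0.0.2", 8001))
    instance = registry.select("vicuna-13b-v1.5")  # consumer picks by model name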

3. High-performance framework design
------------------
Performance analysis and optimization are complex tasks, and inappropriate framework designs can increase this complexity. In our view, to ensure that the deployment framework does not lag behind in terms of performance, there are two main areas of focus:

Avoid excessive encapsulation: The more encapsulation and longer the chain, the more challenging it becomes to identify performance issues.

High-performance communication design: High-performance communication involves various aspects that cannot be elaborated in detail here. However, considering that Python occupies a prominent position in current AIGC applications, asynchronous interfaces are crucial for service performance in Python. Therefore, the model serving layer should only provide asynchronous interfaces and be compatible with the layers that interface with the model inference framework. If the model inference framework offers asynchronous interfaces, direct integration should be implemented. Otherwise, synchronous-to-asynchronous task conversion should be used to provide support.
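As a concrete illustration, synchronous-to-asynchronous task conversion can be done by offloading the blocking call to a thread pool. This is a generic Python sketch, not DB-GPT's actual serving code:

.. code-block:: python

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    _executor = ThreadPoolExecutor(max_workers=8)


    def sync_generate(prompt: str) -> str:
        """Stand-in for a blocking call into an inference framework."""
        return f"echo: {prompt}"


    async def async_generate(prompt: str) -> str:
        # Run the blocking call in a thread pool so the event loop stays
        # free to serve other requests concurrently.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(_executor, sync_generate, prompt)


    async def main() -> None:
        answers = await asyncio.gather(*(async_generate(f"q{i}") for i in range(4)))
        print(answers)


    asyncio.run(main())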

4. Management and monitoring capabilities
------------------
In the exploration or production implementation of AIGC (AI-generated content) applications, it is important for the model deployment system to have certain management capabilities. This involves controlling the deployed model instances through APIs or command-line interfaces, such as for online/offline management, restarting, and debugging.

Observability is a crucial capability in production systems, and I believe it is equally, if not more, important in AIGC applications. This is because user experiences and interactions with the system are more complex. In addition to traditional observability metrics, we are also interested in user input information and corresponding contextual information, which specific model instance and parameters were invoked, the content and response time of model outputs, user feedback, and more.

By analyzing this information, we can identify performance bottlenecks in model services and gather user experience data (e.g., response latency, problem resolution, and user satisfaction extracted from user content). These insights serve as important foundations for further optimizing the entire application.
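Concretely, each invocation might be captured as a structured trace record along these lines; the field names are illustrative, not an existing DB-GPT schema:

.. code-block:: python

    from dataclasses import dataclass
    from typing import Any, Dict, Optional


    @dataclass
    class InferenceTrace:
        """One model invocation, captured for later observability analysis."""

        model_name: str         # which model instance served the call
        params: Dict[str, Any]  # temperature, max_tokens, ...
        user_input: str         # prompt plus conversation context
        output: str             # content of the model response
        latency_ms: float       # response time of the call
        user_feedback: Optional[int] = None  # e.g. thumbs up/down, if given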

* On :ref:`Deploying on standalone mode <standalone-index>`. Standalone Deployment.
* On :ref:`Deploying on cluster mode <local-cluster-index>`. Cluster Deployment.

* On :ref:`Deploying on local machine <local-cluster-index>`. Local cluster deployment.

.. toctree::
:maxdepth: 2
:caption: Cluster deployment
:name: cluster_deploy
:hidden:

./vms/standalone.md
./vms/index.md
2 changes: 1 addition & 1 deletion docs/getting_started/install/cluster/vms/index.md
@@ -1,4 +1,4 @@
Local cluster deployment
Cluster Deployment
==================================
(local-cluster-index)=
## Model cluster deployment
19 changes: 15 additions & 4 deletions docs/getting_started/install/deploy/deploy.md
@@ -6,8 +6,19 @@ This tutorial gives you a quick walkthrough of using DB-GPT in your environment

To get started, install DB-GPT with the following steps.

### 1. Hardware Requirements
DB-GPT can be deployed either with low hardware requirements or with high hardware requirements.

##### Low hardware requirements
The low-hardware-requirements mode is suitable for integrating with third-party LLM service APIs, such as OpenAI, Tongyi, Wenxin, or a Llama.cpp endpoint.

DB-GPT provides proxy API settings to connect to such LLM APIs.
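For example, a proxy deployment is typically configured through variables in your `.env` file. The variable names below follow the project's `.env.template` at the time of writing; treat this as an illustrative sketch and double-check against your release:

```commandline
LLM_MODEL=chatgpt_proxyllm
PROXY_API_KEY={your-openai-api-key}
PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions
```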

##### High hardware requirements
The high-hardware-requirements mode is suitable for independently deploying LLM services such as the Llama series, Baichuan, ChatGLM, Vicuna, and other private LLMs. As our project can achieve over 85% of ChatGPT's performance, there are certain hardware requirements. Overall, however, the project can be deployed and used on consumer-grade graphics cards. The specific hardware requirements for deployment are as follows:

| GPU | VRAM Size | Performance |
|----------|-----------| ------------------------------------------- |
@@ -16,7 +27,7 @@
| V100 | 16 GB | Conversation inference possible, noticeable stutter |
| T4 | 16 GB | Conversation inference possible, noticeable stutter |

If your VRAM size is not enough, DB-GPT supports 8-bit and 4-bit quantization.
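Quantization is usually switched on through `.env` as well. This is an assumed example based on the project's `.env.template`; verify the exact variable name for your version:

```commandline
# Enable 8-bit quantization to reduce VRAM usage.
QUANTIZE_8bit=True
```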

Here is the approximate VRAM usage of the models we tested in some common scenarios.

@@ -64,7 +75,7 @@ Notice: make sure you have installed git-lfs
centos:yum install git-lfs
ubuntu:apt-get install git-lfs
macos:brew install git-lfs
```
