Agent notebook example with human feedback; Support shell command and multiple code blocks; Improve the system message for assistant agent; Improve utility functions for config lists; reuse docker image (#1056)

* add agent notebook and documentation

* fix bug

* set flush to True when printing msg in agent

* add a math problem in agent notebook

* remove

* header

* improve notebook doc

* notebook update

* improve notebook example

* improve doc

* agent notebook example with user feedback

* log

* log

* improve notebook doc

* improve print

* doc

* human_input_mode

* human_input_mode str

* indent

* indent

* Update flaml/autogen/agent/user_proxy_agent.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* shell command and multiple code blocks

* Update notebook/autogen_agent.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update notebook/autogen_agent.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update notebook/autogen_agent.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* coding agent

* math notebook

* renaming and doc format

* typo

* infer lang

* sh

* docker

* docker

* reset consecutive autoreply counter

* fix explanation

* paper talk

* human feedback

* web info

* rename test

* config list explanation

* link to blogpost

* installation

* homepage features

* features

* features

* rename agent

* remove notebook

* notebook test

* docker command

* notebook update

* lang -> cmd

* notebook

* make it work for gpt-3.5

* return full log

* quote

* docker

* docker

* docker

* docker

* docker

* docker image list

* notebook

* notebook

* use_docker

* use_docker

* use_docker

* doc

* agent

* doc

* abs path

* pandas

* docker

* reuse docker image

* context window

* news

* print format

* pyspark version in py3.8

* pyspark in py3.8

* pyspark and ray

* quote

* pyspark

* pyspark

* pyspark

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
sonichi and qingyun-wu committed Jun 9, 2023
1 parent d36b2af commit 5387a0a
Showing 28 changed files with 2,199 additions and 1,283 deletions.
15 changes: 7 additions & 8 deletions .github/workflows/python-package.yml
@@ -49,22 +49,21 @@ jobs:
export CFLAGS="$CFLAGS -I/usr/local/opt/libomp/include"
export CXXFLAGS="$CXXFLAGS -I/usr/local/opt/libomp/include"
export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/opt/libomp/lib -L/usr/local/opt/libomp/lib -lomp"
- name: On Linux + python 3.8, install pyspark 3.2.3
if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.8'
run: |
python -m pip install --upgrade pip wheel
pip install pyspark==3.2.3
- name: Install packages and dependencies
run: |
python -m pip install --upgrade pip wheel
pip install -e .
python -c "import flaml"
pip install -e .[test]
- name: On Ubuntu python 3.8, install pyspark 3.2.3
if: matrix.python-version == '3.8' && matrix.os == 'ubuntu-latest'
run: |
pip install pyspark==3.2.3
pip list | grep "pyspark"
- name: If linux, install ray 2
if: matrix.os == 'ubuntu-latest'
run: |
pip install ray[tune]
pip install "ray[tune]<2.5.0"
- name: If mac, install ray
if: matrix.os == 'macOS-latest'
run: |
@@ -77,8 +76,8 @@ jobs:
if: matrix.python-version != '3.10'
run: |
pip install -e .[vw]
- name: Uninstall pyspark on python 3.9
if: matrix.python-version == '3.9'
- name: Uninstall pyspark on python 3.8 or 3.9 for windows
if: (matrix.python-version == '3.8' || matrix.python-version == '3.9') && matrix.os == 'windows-2019'
run: |
# Uninstall pyspark to test env without pyspark
pip uninstall -y pyspark
64 changes: 36 additions & 28 deletions README.md
@@ -14,18 +14,19 @@
<br>
</p>

:fire: v1.2.0 is released with support for ChatGPT and GPT-4.
:fire: FLAML is highlighted in OpenAI's [cookbook](https://github.com/openai/openai-cookbook#related-resources-from-around-the-web)
:fire: [autogen](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) is released with support for ChatGPT and GPT-4, based on [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673).
:fire: FLAML supports AutoML and Hyperparameter Tuning features in [Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview) private preview. Sign up for these features at: https://aka.ms/fabric/data-science/sign-up.


## What is FLAML
FLAML is a lightweight Python library for efficient automation of machine
learning and AI operations, including selection of
models, hyperparameters, and other tunable choices of an application (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations).

* For foundation models like the GPT series and AI agents based on them, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost.
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code).
* It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
* For foundation models like the GPT models, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost. FLAML enables users to build and use adaptive AI agents with minimal effort.
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., search space and metric), or full customization (arbitrary training/inference/evaluation code).
* It supports fast and economical automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
hyperparameter optimization](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm)
and model selection method invented by Microsoft Research, and many followup [research studies](https://microsoft.github.io/FLAML/docs/Research).

@@ -42,13 +43,14 @@ FLAML requires **Python version >= 3.7**. It can be installed from pip:
pip install flaml
```

To run the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/notebook),
install flaml with the [notebook] option:

Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. For example, use the following to install the dependencies needed by the [`autogen`](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) package.
```bash
pip install flaml[notebook]
pip install "flaml[autogen]"
```

Find more options in [Installation](Installation).
Each of the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/notebook) may require a specific option to be installed.

### .NET

Use the following guides to get started with FLAML in .NET:
@@ -59,25 +61,31 @@ Use the following guides to get started with FLAML in .NET:

## Quickstart

* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.

```python
from flaml import oai

config, analysis = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)
```

The automated experimentation and optimization can help you maximize the utility out of these expensive models.
A suite of utilities are offered to accelerate the experimentation and application development, such as low-level inference API with caching, templating, filtering, and higher-level components like LLM-based coding and interactive agents.

* (New) The [autogen](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) package can help you maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4, including:
- A drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalities like tuning, caching, templating, filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets.
```python
from flaml import oai

# perform tuning
config, analysis = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)

# perform inference for a test instance
response = oai.Completion.create(context=test_instance, **config)
```
- LLM-driven intelligent agents which can perform tasks autonomously or with human feedback, including tasks that require using tools via code.
```python
assistant = AssistantAgent("assistant")
user = UserProxyAgent("user", human_input_mode="TERMINATE")
assistant.receive("Draw a rocket and save to a file named 'rocket.svg'")
```
* With three lines of code, you can start using this economical and fast
AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML).
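
A minimal sketch of that scikit-learn style usage, assuming `X_train` and `y_train` hold your own training data:

```python
from flaml import AutoML

# three lines: import, construct, fit
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
```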

5 changes: 5 additions & 0 deletions flaml/autogen/agent/__init__.py
@@ -0,0 +1,5 @@
from .agent import Agent
from .assistant_agent import AssistantAgent
from .user_proxy_agent import UserProxyAgent

__all__ = ["Agent", "AssistantAgent", "UserProxyAgent"]
3 changes: 2 additions & 1 deletion flaml/autogen/agent/agent.py
@@ -37,7 +37,8 @@ def _send(self, message, recipient):

def _receive(self, message, sender):
"""Receive a message from another agent."""
print("\n****", self.name, "received message from", sender.name, "****\n", flush=True)
print("\n", "-" * 80, "\n", flush=True)
print(sender.name, "(to", f"{self.name}):", flush=True)
print(message, flush=True)
self._conversations[sender.name].append({"content": message, "role": "user"})
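
With this change, each incoming message is printed under a horizontal rule and a "sender (to recipient):" header. A small sketch mirroring the two print calls above, for a sender named `user` and a recipient named `assistant` (illustrative values, not a captured run):

```python
# reproduces the new header format from _receive
print("\n", "-" * 80, "\n", flush=True)          # horizontal rule between turns
print("user", "(to", "assistant):", flush=True)  # prints: user (to assistant):
print("<message content>", flush=True)
```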

47 changes: 47 additions & 0 deletions flaml/autogen/agent/assistant_agent.py
@@ -0,0 +1,47 @@
from .agent import Agent
from flaml.autogen.code_utils import DEFAULT_MODEL
from flaml import oai


class AssistantAgent(Agent):
"""(Experimental) Assistant agent, able to suggest code blocks."""

DEFAULT_SYSTEM_MESSAGE = """You are a helpful AI assistant.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute. You must indicate the script type in the code block.
1. When you need to ask the user for some info, use the code to output the info you need, for example, browse or search the web, download/read a file.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly. Solve the task step by step if you need to.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes.
Reply "TERMINATE" in the end when everything is done.
"""

DEFAULT_CONFIG = {
"model": DEFAULT_MODEL,
}

def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, **config):
"""
Args:
name (str): agent name.
system_message (str): system message to be sent to the agent.
**config (dict): other configurations allowed in
[oai.Completion.create](../oai/Completion#create).
These configurations will be used when invoking LLM.
"""
super().__init__(name, system_message)
self._config = self.DEFAULT_CONFIG.copy()
self._config.update(config)
self._sender_dict = {}

def receive(self, message, sender):
if sender.name not in self._sender_dict:
self._sender_dict[sender.name] = sender
self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}]
super().receive(message, sender)
responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)
response = oai.ChatCompletion.extract_text(responses)[0]
self._send(response, sender)

def reset(self):
self._sender_dict.clear()
self._conversations.clear()
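
A minimal usage sketch for this class, assuming the two-argument `receive` signature defined above and the `UserProxyAgent` constructor arguments shown later in this commit (agent names and the task string are illustrative):

```python
from flaml.autogen.agent import AssistantAgent, UserProxyAgent

# the assistant suggests code blocks; the user proxy executes them and
# feeds the results back until the assistant replies "TERMINATE"
assistant = AssistantAgent("assistant")
user = UserProxyAgent("user", human_input_mode="NEVER", work_dir="coding")
assistant.receive("Write python code that prints the first 10 Fibonacci numbers.", user)
```
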
41 changes: 0 additions & 41 deletions flaml/autogen/agent/coding_agent.py

This file was deleted.

65 changes: 41 additions & 24 deletions flaml/autogen/agent/user_proxy_agent.py
@@ -1,5 +1,5 @@
from .agent import Agent
from flaml.autogen.code_utils import extract_code, execute_code
from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang
from collections import defaultdict


@@ -54,36 +54,51 @@ def __init__(
self._consecutive_auto_reply_counter = defaultdict(int)
self._use_docker = use_docker

def _execute_code(self, code, lang):
def _execute_code(self, code_blocks):
"""Execute the code and return the result."""
if lang in ["bash", "shell"]:
if not code.startswith("python "):
return 1, f"please do not suggest bash or shell commands like {code}"
file_name = code[len("python ") :]
exitcode, logs = execute_code(filename=file_name, work_dir=self._work_dir, use_docker=self._use_docker)
logs = logs.decode("utf-8")
elif lang == "python":
if code.startswith("# filename: "):
filename = code[11 : code.find("\n")].strip()
logs_all = ""
for code_block in code_blocks:
lang, code = code_block
if not lang:
lang = infer_lang(code)
if lang in ["bash", "shell", "sh"]:
# if code.startswith("python "):
# # return 1, f"please do not suggest bash or shell commands like {code}"
# file_name = code[len("python ") :]
# exitcode, logs = execute_code(filename=file_name, work_dir=self._work_dir, use_docker=self._use_docker)
# else:
exitcode, logs, image = execute_code(
code, work_dir=self._work_dir, use_docker=self._use_docker, lang=lang
)
logs = logs.decode("utf-8")
elif lang == "python":
if code.startswith("# filename: "):
filename = code[11 : code.find("\n")].strip()
else:
filename = None
exitcode, logs, image = execute_code(
code, work_dir=self._work_dir, filename=filename, use_docker=self._use_docker
)
logs = logs.decode("utf-8")
else:
filename = None
exitcode, logs = execute_code(code, work_dir=self._work_dir, filename=filename, use_docker=self._use_docker)
logs = logs.decode("utf-8")
else:
# TODO: could this happen?
exitcode, logs = 1, f"unknown language {lang}"
# raise NotImplementedError
return exitcode, logs
# TODO: could this happen?
exitcode, logs, image = 1, f"unknown language {lang}"
# raise NotImplementedError
self._use_docker = image
logs_all += "\n" + logs
if exitcode != 0:
return exitcode, logs_all
return exitcode, logs_all

def auto_reply(self, message, sender, default_reply=""):
"""Generate an auto reply."""
code, lang = extract_code(message)
if lang == "unknown":
# no code block is found, lang should be "unknown"
code_blocks = extract_code(message)
if len(code_blocks) == 1 and code_blocks[0][0] == UNKNOWN:
# no code block is found, lang should be `UNKNOWN`
self._send(default_reply, sender)
else:
# try to execute the code
exitcode, logs = self._execute_code(code, lang)
exitcode, logs = self._execute_code(code_blocks)
exitcode2str = "execution succeeded" if exitcode == 0 else "execution failed"
self._send(f"exitcode: {exitcode} ({exitcode2str})\nCode output: {logs}", sender)

Expand Down Expand Up @@ -111,8 +126,10 @@ def receive(self, message, sender):
# this corresponds to the case when self._human_input_mode == "NEVER"
reply = "exit"
if reply == "exit" or (self._is_termination_msg(message) and not reply):
# reset the consecutive_auto_reply_counter
self._consecutive_auto_reply_counter[sender.name] = 0
return
elif reply:
if reply:
# reset the consecutive_auto_reply_counter
self._consecutive_auto_reply_counter[sender.name] = 0
self._send(reply, sender)