Merge pull request #220 from SylphAI-Inc/main
[V0.2.3] AdalComponent API improvement + RAG pipeline
liyin2015 authored Sep 20, 2024
2 parents 9b2df1e + 092400b commit 363f535
Showing 43 changed files with 8,674 additions and 241 deletions.
32 changes: 17 additions & 15 deletions README.md
@@ -77,7 +77,7 @@ For AI researchers, product teams, and software engineers who want to learn the
# Why AdalFlow

1. Embracing a design pattern similar to PyTorch, AdalFlow is powerful, light, modular, and robust.
AdalFlow provides `Model-agnostic` building blocks to build LLM task pipeline, ranging from RAG, Agents to classical NLP tasks like text classification and named entity recognition. It is easy to get high performance even with just basic manual promting.
AdalFlow provides `Model-agnostic` building blocks to build LLM task pipelines, ranging from RAG, Agents to classical NLP tasks like text classification and named entity recognition. It is easy to get high performance only using manual prompting.
2. AdalFlow provides a unified auto-differentiative framework for both zero-shot prompt optimization and few-shot optimization. It advances existing auto-optimization research, including ``Text-Grad`` and ``DsPy``.
Through our research, ``Text-Grad 2.0`` and ``Learn-to-Reason Few-shot In Context Learning``, AdalFlow ``Trainer`` achieves the highest accuracy while being the most token-efficient.

@@ -86,7 +86,7 @@ Through our research, ``Text-Grad 2.0`` and ``Learn-to-Reason Few-shot In Contex
<!-- AdalFlow not only helps developers build model-agnostic LLM task pipelines with full control over prompts and output processing, but it also auto-optimizes these pipelines to achieve SOTA accuracy. -->
<!-- Embracing a design pattern similar to PyTorch, AdalFlow is powerful, light, modular, and robust. -->

Here is our optimization demonstration on a text classification task:
Here is an optimization demonstration on a text classification task:
<!-- <p align="center">
<img src="docs/source/_static/images/classification_training_map.png" alt="AdalFlow Auto-optimization" style="width: 80%;">
</p>
@@ -104,16 +104,16 @@ Here is our optimization demonstration on a text classification task:
</p>


Among all libraries, we achieved the highest accuracy with manual prompting (starting at 82%) and the highest accuracy after optimization.
Among all libraries, AdalFlow achieved the highest accuracy with manual prompting (starting at 82%) and the highest accuracy after optimization.

Further reading: [Optimize Classification](https://adalflow.sylph.ai/use_cases/classification.html)

## Light, Modular, and Model-agnositc Task Pipeline
## Light, Modular, and Model-Agnostic Task Pipeline

LLMs are like water; AdalFlow help developers quickly shape them into any applications, from GenAI applications such as chatbots, translation, summarization, code generation, RAG, and autonomous agents to classical NLP tasks like text classification and named entity recognition.
LLMs are like water; AdalFlow helps you quickly shape them into any application, from GenAI applications such as chatbots, translation, summarization, code generation, RAG, and autonomous agents to classical NLP tasks like text classification and named entity recognition.

Only two fundamental but powerful base classes: `Component` for the pipeline and `DataClass` for data interaction with LLMs.
The result is a library with bare minimum abstraction, providing developers with *maximum customizability*.
AdalFlow has two fundamental, but powerful, base classes: `Component` for the pipeline and `DataClass` for data interaction with LLMs.
The result is a library with minimal abstraction, providing developers with *maximum customizability*.

You have full control over the prompt template, the model you use, and the output parsing for your task pipeline.

@@ -162,17 +162,17 @@ class Net(nn.Module):
## Unified Framework for Auto-Optimization

AdalFlow provides token-efficient and high-performing prompt optimization within a unified framework.
To optimize your pipeline, simply define a ``Parameter`` and pass it to our ``Generator``.
Whether you need to optimize task instructions or few-shot demonstrations,
our unified framework offers an easy way to **diagnose**, **visualize**, **debug**, and **train** your pipeline.
To optimize your pipeline, simply define a ``Parameter`` and pass it to AdalFlow's ``Generator``.
Whether you need to optimize task instructions or run some few-shot demonstrations,
AdalFlow's unified framework offers an easy way to **diagnose**, **visualize**, **debug**, and **train** your pipeline.

This [Dynamic Computation Graph](https://adalflow.sylph.ai/tutorials/trace_graph.html) demonstrates how our auto-differentiation and the dynamic computation graph work.

No need to manually define nodes and edges; AdalFlow will automatically trace the computation graph for you.

### **Trainable Task Pipeline**

Just define it as a ``Parameter`` and pass it to our ``Generator``.
Just define it as a ``Parameter`` and pass it to AdalFlow's ``Generator``.

<p align="center">
<img src="https://raw.githubusercontent.com/SylphAI-Inc/LightRAG/main/docs/source/_static/images/Trainable_task_pipeline.png" alt="AdalFlow Trainable Task Pipeline">
@@ -215,7 +215,9 @@ AdalFlow full documentation available at [adalflow.sylph.ai](https://adalflow.sy

# AdalFlow: A Tribute to Ada Lovelace

AdalFlow is named in honor of [Ada Lovelace](https://en.wikipedia.org/wiki/Ada_Lovelace), the pioneering female mathematician who first recognized that machines could do more than just calculations. As a team led by female founder, we aim to inspire more women to enter the AI field.

AdalFlow is named in honor of [Ada Lovelace](https://en.wikipedia.org/wiki/Ada_Lovelace), the pioneering female mathematician who first recognized that machines could do more than just perform calculations. As a female-led team, we aim to inspire more women to enter the AI field.


# Contributors

@@ -229,7 +231,7 @@ Many existing works greatly inspired AdalFlow library! Here is a non-exhaustive
- 📚 [Micrograd](https://github.com/karpathy/micrograd): A tiny autograd engine for our auto-differentiative architecture.
- 📚 [Text-Grad](https://github.com/zou-group/textgrad) for the ``Textual Gradient Descent`` text optimizer.
- 📚 [DSPy](https://github.com/stanfordnlp/dspy) for inspiring the ``__{input/output}__fields`` in our ``DataClass`` and the bootstrap few-shot optimizer.
- 📚 [OPRO](https://github.com/google-deepmind/opro) for adding past text instruction along with its accuracy in the text optimizer.
- 📚 [OPRO](https://github.com/google-deepmind/opro) for adding past text instructions along with their accuracy in the text optimizer.
- 📚 [PyTorch Lightning](https://github.com/Lightning-AI/pytorch-lightning) for the ``AdalComponent`` and ``Trainer``.

# Citation
@@ -245,8 +247,8 @@ Many existing works greatly inspired AdalFlow library! Here is a non-exhaustive
}
```

# Star History
<!-- # Star History
[![Star History Chart](https://api.star-history.com/svg?repos=SylphAI-Inc/AdalFlow&type=Date)](https://star-history.com/#SylphAI-Inc/AdalFlow&Date)
[![Star History Chart](https://api.star-history.com/svg?repos=SylphAI-Inc/AdalFlow&type=Date)](https://star-history.com/#SylphAI-Inc/AdalFlow&Date) -->
<!--
<a href="https://trendshift.io/repositories/11559" target="_blank"><img src="https://trendshift.io/api/badge/repositories/11559" alt="SylphAI-Inc%2FAdalFlow | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> -->
8 changes: 8 additions & 0 deletions adalflow/CHANGELOG.md
@@ -1,3 +1,11 @@
## [0.2.3] - 2024-09-20

### Rename
- three methods within `AdalComponent` are renamed:
- `handle_one_loss_sample` to `prepare_loss`
- `handle_one_task_sample` to `prepare_task`
- `evaluate_one_sample` to `prepare_eval`
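
For code written against the old names, a thin compatibility layer can ease the migration. The sketch below is hypothetical: `LegacyNamesMixin` and `MyAdalComponent` are illustrative names that do not exist in AdalFlow, and it assumes the new methods are spelled `prepare_loss`, `prepare_task`, and `prepare_eval`.

```python
import warnings

# Old name -> new name, following the rename list above.
# Assumption: the new methods are spelled `prepare_*`.
_RENAMED = {
    "handle_one_loss_sample": "prepare_loss",
    "handle_one_task_sample": "prepare_task",
    "evaluate_one_sample": "prepare_eval",
}


class LegacyNamesMixin:
    """Hypothetical helper (not part of AdalFlow): forwards old method names."""

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails,
        # so calls to the new names are never intercepted.
        new_name = _RENAMED.get(name)
        if new_name is None:
            raise AttributeError(name)
        warnings.warn(
            f"{name} is deprecated; use {new_name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self, new_name)


class MyAdalComponent(LegacyNamesMixin):
    """Stand-in for a user subclass; the method body is a placeholder."""

    def prepare_task(self, sample):
        return ("task", sample)
```

With this in place, `MyAdalComponent().handle_one_task_sample(x)` would forward to `prepare_task(x)` and emit a `DeprecationWarning`, while an old name whose replacement is not defined still raises `AttributeError`.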

## [0.2.3.beta.1] - 2024-09-17
### Removed
- Removed /reasoning as COT is just too simple to be a separate module.
2 changes: 1 addition & 1 deletion adalflow/adalflow/__init__.py
@@ -1,4 +1,4 @@
__version__ = "0.2.3.beta.1"
__version__ = "0.2.3"

from adalflow.core.component import Component, fun_to_component
from adalflow.core.container import Sequential
48 changes: 35 additions & 13 deletions adalflow/adalflow/datasets/big_bench_hard.py
@@ -28,14 +28,14 @@ class BigBenchHard(Dataset):
- test: 100 examples
Args:
task_name (str): The name of the task. "BHH_{task_name}" is the task name in the dataset.
task_name (str): The name of the task. "{task_name}" is the task name in the dataset.
root (str, optional): Root directory of the dataset to save the data. Defaults to ~/.adalflow/cache_datasets/task_name.
split (str, optional): The dataset split, supports ``"train"`` (default), ``"val"`` and ``"test"``.
"""

def __init__(
self,
task_name: Literal["BBH_object_counting"] = "BBH_object_counting",
task_name: Literal["object_counting"] = "object_counting",
root: str = None,
split: Literal["train", "val", "test"] = "train",
*args,
@@ -51,7 +51,7 @@ def __init__(
self.root = root
self.split = split

self.task_name = "_".join(task_name.split("_")[1:])
self.task_name = task_name
data_path = prepare_dataset_path(self.root, self.task_name)
self._check_or_download_dataset(data_path, split)

@@ -74,15 +74,37 @@ def _check_or_download_dataset(self, data_path: str = None, split: str = "train"
return

print(f"Downloading dataset to {json_path}")

subprocess.call(
[
"wget",
f"https://raw.githubusercontent.com/suzgunmirac/BIG-Bench-Hard/main/bbh/{self.task_name}.json",
"-O",
json_path,
]
)
try:
# Use subprocess and capture the return code
result = subprocess.call(
[
"wget",
f"https://raw.githubusercontent.com/suzgunmirac/BIG-Bench-Hard/main/bbh/{self.task_name}.json",
"-O",
json_path,
]
)

# Check if wget failed (non-zero exit code)
if result != 0:
raise ValueError(
f"Failed to download dataset for task '{self.task_name}'.\n"
"Please verify the task name (the JSON file name) by checking the following link:\n"
"https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/bbh"
)

# Check if the file is non-empty
if not os.path.exists(json_path) or os.path.getsize(json_path) == 0:
raise ValueError(
f"Downloaded file is empty. Please check the task name '{self.task_name}' or network issues."
)

except Exception as e:
raise ValueError(
f"Either network issues or an incorrect task name: '{self.task_name}'.\n"
"Please verify the task name (the JSON file name) by checking the following link:\n"
"https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/bbh"
) from e

with open(json_path) as json_file:
data = json.load(json_file)
@@ -128,7 +150,7 @@ def get_default_task_instruction():
if __name__ == "__main__":
from adalflow.datasets.big_bench_hard import BigBenchHard

dataset = BigBenchHard("BBH_word_sorting", split="train")
dataset = BigBenchHard(task_name="word_sorting", split="train")
print(dataset[0:10])
print(len(dataset))
print(dataset.get_default_task_instruction())
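
In the new `_check_or_download_dataset` code, the two `ValueError`s raised inside the `try` block are caught by the surrounding `except Exception` and re-wrapped in a third, more generic one. The sketch below shows one way to keep the specific message by validating outside the `try`. It is not the commit's implementation: it swaps `wget` for the standard library's `urllib.request`, and `download_json` is a hypothetical helper name.

```python
import json
import urllib.error
import urllib.request
from typing import Any

# URL prefix copied from the diff above.
BBH_BASE_URL = "https://raw.githubusercontent.com/suzgunmirac/BIG-Bench-Hard/main/bbh"


def download_json(url: str, dest: str) -> Any:
    """Hypothetical helper: fetch `url`, save it to `dest`, return parsed JSON."""
    try:
        with urllib.request.urlopen(url) as response:
            payload = response.read()
    except (urllib.error.URLError, OSError) as e:
        # A network failure and a wrong task name (HTTP 404) both land here.
        raise ValueError(f"Failed to download {url}") from e

    # Validation happens outside the try block,
    # so this error is not caught and re-wrapped.
    if not payload:
        raise ValueError(f"Downloaded file is empty: {url}")

    with open(dest, "wb") as f:
        f.write(payload)
    return json.loads(payload)
```

A call for the task used in the `__main__` block above might look like `download_json(f"{BBH_BASE_URL}/word_sorting.json", "word_sorting.json")`.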
4 changes: 3 additions & 1 deletion adalflow/adalflow/optim/few_shot/bootstrap_optimizer.py
@@ -66,7 +66,9 @@ def __init__(

def add_scores(self, ids: List[str], scores: List[float], is_teacher: bool = True):
if len(ids) != len(scores):
raise ValueError("ids and scores must have the same length")
raise ValueError(
f"ids and scores must have the same length, got ids: {ids}, scores: {scores}"
)

for score in scores:
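
The new error message in `add_scores` embeds both full lists, which helps with small batches but can grow very long for large ones. As an alternative sketch (not what the commit does), the message could report only the lengths. `check_same_length` is a hypothetical standalone function used for illustration.

```python
from typing import List


def check_same_length(ids: List[str], scores: List[float]) -> None:
    """Hypothetical standalone version of the length check in `add_scores`."""
    if len(ids) != len(scores):
        # Report lengths only, so the message stays short for large batches.
        raise ValueError(
            "ids and scores must have the same length, "
            f"got {len(ids)} ids and {len(scores)} scores"
        )
```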

8 changes: 3 additions & 5 deletions adalflow/adalflow/optim/parameter.py
@@ -120,7 +120,7 @@ def __init__(
data: T = None, # for generator output, the data will be set up as raw_response
requires_opt: bool = True,
role_desc: str = "",
param_type: ParameterType = ParameterType.PROMPT,
param_type: ParameterType = ParameterType.NONE,
name: str = None, # name is used to refer to the parameter in the prompt, easier to read for humans
gradient_prompt: str = None,
raw_response: str = None, # use this to track the raw response of generator instead of the data (can be parsed)
@@ -495,7 +495,7 @@ def draw_graph(
# # Set up TensorBoard logging
# writer = SummaryWriter(log_dir)

filename = f"trace_graph_{self.name}"
filename = f"trace_graph_{self.name}_id_{self.id}"
filepath = (
os.path.join(filepath, filename)
if filepath
@@ -577,9 +577,7 @@ def wrap_and_escape(text, width=40):
log.info(f"Node: {n.name}, {n.to_dict()}")
# track gradients
for g in n.gradients:
# writer.add_text(g.name, str(g.to_dict()))
# if g.gradient_prompt:
# writer.add_text(f"{g.name}_prompt", g.gradient_prompt)

log.info(f"Gradient: {g.name}, {g.to_dict()}")
log.info(f"Gradient prompt: {g.gradient_prompt}")
for n1, n2 in edges:
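
The filename change in `draw_graph` matters when several parameters share a name. With the old pattern `trace_graph_{self.name}`, their graphs would presumably be written to the same path. A minimal sketch of the new pattern follows. `trace_graph_filename` is a hypothetical helper, and the ids are random UUID strings chosen for illustration, since this diff does not show how `self.id` is generated.

```python
import uuid


def trace_graph_filename(name: str, param_id: str) -> str:
    """Mirror the new pattern from the diff: parameter name plus id."""
    return f"trace_graph_{name}_id_{param_id}"


# Two parameters that share a name but have distinct ids
# (random UUIDs here, an assumption made for illustration).
first = trace_graph_filename("sum", str(uuid.uuid4()))
second = trace_graph_filename("sum", str(uuid.uuid4()))
```

Both filenames start with `trace_graph_sum_id_` but differ in the id suffix, so neither graph overwrites the other.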
1 change: 1 addition & 0 deletions adalflow/adalflow/optim/text_grad/ops.py
@@ -64,6 +64,7 @@ def forward(self, params: List[Parameter]) -> Parameter:
requires_opt=any([p.requires_opt for p in params]),
name="sum",
score=sum([p._score for p in params]), # total has a score
param_type=ParameterType.SUM_OUTPUT,
)
total.set_predecessors(params)

Original file line number Diff line number Diff line change
@@ -162,6 +162,7 @@ def forward(
requires_opt=True,
role_desc=response_desc,
score=score,
param_type=ParameterType.LOSS_OUTPUT,
)
eval_param.set_predecessors(predesessors)
