Commit

refactor: reorganize script location
chenweize1998 committed Oct 10, 2023
1 parent ccf4319 commit 6bdce1d
Showing 20 changed files with 22 additions and 21 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -38,4 +38,4 @@ jobs:
run: |
python setup.py develop
python agentverse_command/benchmark.py --task tasksolving/mgsm/gpt-3.5 --dataset_path data/mgsm/test_sample.jsonl --overwrite --output_path ci_smoke_test_output --tasks_dir ./agentverse/tasks
-python evaluate_math.py --path ci_smoke_test_output/results.jsonl --ci_smoke_test
+python scripts/evaluate_math.py --path ci_smoke_test_output/results.jsonl --ci_smoke_test
3 changes: 2 additions & 1 deletion .gitignore
@@ -172,4 +172,5 @@ raw/
results
tmp/
data/toolbench
-logs/
+logs/
+ci_smoke_test_output/
@@ -29,16 +29,18 @@ def step(
         *args,
         **kwargs,
     ) -> Any:
-        from evaluate_commongen import scoring
+        from scripts.evaluate_commongen import scoring

-        coverage, missing_tokens = scoring([s.content for s in solution], [task_description])
+        coverage, missing_tokens = scoring(
+            [s.content for s in solution], [task_description]
+        )
         if len(missing_tokens[0]) == 0:
             missing_tokens = "No missing tokens."
         else:
             missing_tokens = ", ".join(missing_tokens[0])
         result = f"Coverage: {coverage*100:.2f}%\nMissing Tokens: {missing_tokens}"
         return [ExecutorMessage(content=result)]

     async def astep(
         self,
         agent: ExecutorAgent,
@@ -47,9 +49,11 @@ async def astep(
         *args,
         **kwargs,
     ) -> Any:
-        from evaluate_commongen import scoring
+        from scripts.evaluate_commongen import scoring

-        coverage, missing_tokens = scoring([s.content for s in solution], [task_description])
+        coverage, missing_tokens = scoring(
+            [s.content for s in solution], [task_description]
+        )
         if len(missing_tokens[0]) == 0:
             missing_tokens = "No missing tokens."
         else:
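
Note on the hunks above: `step` and `astep` share the same result-formatting logic, which can be sketched in isolation. The `scoring` function below is a hypothetical stand-in (plain substring matching over comma-separated concepts) and is not the implementation in `scripts/evaluate_commongen.py`. `format_result` is likewise an illustrative name. Only the shape of the return value and the formatting of the message follow the diff.

```python
from typing import List, Tuple


def scoring(solutions: List[str], concept_sets: List[str]) -> Tuple[float, List[List[str]]]:
    """Hypothetical stand-in: check which comma-separated concepts appear in each solution."""
    total, covered, missing = 0, 0, []
    for text, concepts in zip(solutions, concept_sets):
        tokens = [c.strip() for c in concepts.split(",") if c.strip()]
        absent = [t for t in tokens if t.lower() not in text.lower()]
        total += len(tokens)
        covered += len(tokens) - len(absent)
        missing.append(absent)
    coverage = covered / total if total else 0.0
    return coverage, missing


def format_result(solutions: List[str], task_description: str) -> str:
    """Build the message text the same way the executor in the diff does."""
    coverage, missing_tokens = scoring(solutions, [task_description])
    if len(missing_tokens[0]) == 0:
        missing = "No missing tokens."
    else:
        missing = ", ".join(missing_tokens[0])
    return f"Coverage: {coverage*100:.2f}%\nMissing Tokens: {missing}"


print(format_result(["The dog runs in the park"], "dog, run, cat"))
```

Because the formatting is identical in both methods, factoring it into one helper like this would keep the sync and async paths from drifting apart.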
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 3
cnt_tool_agents: &cnt_tool_agents 2
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: Recently, it has become popular in the AI field to verify the mathematical reasoning abilities of large language models by observing if they can solve the "24-Point Game." What is this game? Does it have a code-based solution? If it does, provide a Python code along with test cases and test its functionality. What are some other similar games that can be used to test the models' mathematical reasoning abilities?
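
Note on the `tool_config` change, which recurs in every `tool_using` config in this commit: the new value reads as a path relative to the repository root rather than a bare file name. This diff does not show how AgentVerse resolves `tool_config`, so the sketch below only assumes that a relative path is joined to some base directory. `resolve_tool_config` and the `/repo` checkout location are hypothetical names used for illustration.

```python
from pathlib import PurePosixPath


def resolve_tool_config(tool_config: str, base_dir: PurePosixPath) -> PurePosixPath:
    """Hypothetical helper: join a relative tool_config to base_dir, keep absolute paths."""
    path = PurePosixPath(tool_config)
    return path if path.is_absolute() else base_dir / path


repo_root = PurePosixPath("/repo")  # assumed checkout location, for illustration only

old = resolve_tool_config("tools_simplified.json", repo_root)
new = resolve_tool_config(
    "agentverse/tasks/tasksolving/tool_using/tools_simplified.json", repo_root
)

print(old.as_posix())
print(new.as_posix())
```

Under that assumption, the old value only resolves when the base directory happens to contain `tools_simplified.json`, while the new value points at the same file from any task as long as the base is the repository root.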

2 changes: 1 addition & 1 deletion agentverse/tasks/tasksolving/tool_using/bmi/config.yaml
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 3
cnt_tool_agents: &cnt_tool_agents 2
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I want to lose 5kg in the next 2 months. I weigh 70kg, am 170cm tall, and my age is 25. Calculate my BMI and based on that, suggest a workout routine and daily calorie intake to help me achieve my goal.

@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 3
cnt_tool_agents: &cnt_tool_agents 2
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I want to kick off a book club with my friends. Can you tell me the top 5 bestselling books this month, gather the content summary for each, and find online platforms where we can buy or borrow them?

2 changes: 1 addition & 1 deletion agentverse/tasks/tasksolving/tool_using/car/config.yaml
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 4
cnt_tool_agents: &cnt_tool_agents 3
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I am planning to buy a new car. Could you help me compare the features and prices of the latest models of Tesla, Ford, and Toyota? Include details about range, charging time, safety features, and after-sales service. Also, provide a brief analysis of the pros and cons of each car.

2 changes: 1 addition & 1 deletion agentverse/tasks/tasksolving/tool_using/date/config.yaml
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 4
cnt_tool_agents: &cnt_tool_agents 3
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I am planning a date with my girlfriend this week, please search for a good movie theater and a restaurant near Tsinghua University in Beijing and recommend a good movie to watch. Please search the web.

2 changes: 1 addition & 1 deletion agentverse/tasks/tasksolving/tool_using/diy/config.yaml
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 4
cnt_tool_agents: &cnt_tool_agents 3
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I've recently taken an interest in DIY home projects. Search for beginner-friendly DIY projects that can be completed over the weekend. Also, provide a list of materials required and a step-by-step guide for each project.

2 changes: 1 addition & 1 deletion agentverse/tasks/tasksolving/tool_using/party/config.yaml
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 4
cnt_tool_agents: &cnt_tool_agents 3
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I want to hold a party at somewhere around Tsinghua University tomorrow. I need you to look for some best places for holding a party nearby, and tell me whether the weather is good for holding a party tomorrow. Also, I want to know what activities can be considered in my party. Help me search the web.

2 changes: 1 addition & 1 deletion agentverse/tasks/tasksolving/tool_using/sudoku/config.yaml
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 3
cnt_tool_agents: &cnt_tool_agents 2
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I've just heard an interesting game called 'sudoku'. Can you search for the rules of this game and the solution to this game? Finally, write a python script to automatically solve this game if possible.

File renamed without changes.
@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 4
cnt_tool_agents: &cnt_tool_agents 3
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I'm currently analyzing what is popular on the website. Can you help me find the recent trending stuff. It could be anything, like trending news, products, books, movies, music, etc. Give a summarization for me.

@@ -2,7 +2,7 @@ cnt_agents: &cnt_agents 4
cnt_tool_agents: &cnt_tool_agents 3
max_rounds: &max_rounds 5
max_criticizing_rounds: 3
-tool_config: &tool_config tools_simplified.json
+tool_config: &tool_config agentverse/tasks/tasksolving/tool_using/tools_simplified.json

task_description: I'm planning a two-week vacation to Japan next month. Help me plan my itinerary. I want to visit Tokyo, Kyoto, and Osaka. Look for the top tourist attractions in each city, and also suggest the best mode of travel between these cities. Additionally, find out the weather forecast for the month I'll be visiting.

Empty file added scripts/__init__.py
Empty file.
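
Note on the added `scripts/__init__.py`: the empty file makes `scripts` a regular Python package, which matches the new `from scripts.evaluate_commongen import scoring` import in this commit. The sketch below rebuilds that layout in a temporary directory to show the import resolving. The module body written here is a hypothetical placeholder, not the real `evaluate_commongen.py`, and the temporary directory stands in for the repository root being on `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "scripts"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")  # empty marker file, as added in this commit
    # Hypothetical placeholder body; the real module is not shown in this diff.
    (pkg / "evaluate_commongen.py").write_text(
        "def scoring(preds, concepts):\n    return 1.0, [[]]\n"
    )

    sys.path.insert(0, root)  # stands in for running from the repository root
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("scripts.evaluate_commongen")
        result = module.scoring(["a dog"], ["dog"])
    finally:
        # Undo the path and module-cache changes so the sketch has no side effects.
        sys.path.remove(root)
        sys.modules.pop("scripts.evaluate_commongen", None)
        sys.modules.pop("scripts", None)

print(result)
```

Python 3.3+ can also import a directory without `__init__.py` as a namespace package, so the marker file is not strictly required for the import to work. Keeping it makes `scripts` an explicit, regular package.
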
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 0 additions & 4 deletions test_pokemon_env.py

This file was deleted.
