The MiniWoB++ benchmark trains machine learning models (agents) to perform tasks in a browser that are specified in natural language. It contains a collection of over 100 web interaction environments, along with JavaScript and Python interfaces for programmatically interacting with them. The Gymnasium interface allows an agent to initialize and interact with a MiniWoB++ environment as follows:
import gymnasium

env = gymnasium.make('miniwob/click-test-2-v1', render_mode='human')
try:
    observation, info = env.reset(seed=42)
    for _ in range(1000):
        action = policy(observation)  # User-defined policy function
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated:
            observation, info = env.reset()
finally:
    env.close()
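The `policy` function above is user-defined. As a minimal sketch (not part of MiniWoB++ itself, and the exact observation/action schema here is an assumption for illustration): a trivial policy could scan the observed DOM elements and click the first one whose text appears in the task utterance.

```python
# Hypothetical policy sketch. Assumes the observation is a dict with an
# "utterance" string (the task instruction) and a "dom_elements" list of
# dicts carrying "ref" (element id) and "text" fields.
def policy(observation):
    for elem in observation["dom_elements"]:
        # Click the first element whose visible text occurs in the instruction.
        if elem.get("text") and elem["text"] in observation["utterance"]:
            return {"action_type": "click_element", "ref": elem["ref"]}
    # Fallback: click the first element.
    return {"action_type": "click_element", "ref": observation["dom_elements"][0]["ref"]}
```

A learned agent would replace this heuristic with model-driven action selection, but the same observation-in, action-out shape applies.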
We evaluate AutoGen's capabilities for making online decisions on the MiniWoB++ benchmark. Our implementation uses two agents. One is the original AssistantAgent of AutoGen, without any modification; it is responsible for proposing and revising the plan to complete the given task and for making a decision at each step based on the state of the environment. The other is MiniwobUserProxyAgent, which is responsible for interacting with the MiniWoB++ benchmark: it executes the actions sent by the AssistantAgent and returns the results to the AssistantAgent.
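The division of labor between the two agents can be sketched as a simple message loop. The stub classes and method names below are illustrative assumptions, not AutoGen's actual API: in the real implementation, the AssistantAgent's proposal comes from an LLM call and the proxy's execution goes through `env.step`.

```python
# Illustrative stand-ins (assumptions) for the two-agent interaction.
class AssistantStub:
    """Plays the role of AutoGen's AssistantAgent: proposes the next action."""
    def propose(self, feedback: str) -> str:
        # A real AssistantAgent would query an LLM with the plan and feedback.
        return "click button 1" if "start" in feedback else "click button 2"

class MiniwobProxyStub:
    """Plays the role of MiniwobUserProxyAgent: executes actions, reports results."""
    def execute(self, action: str) -> str:
        # A real MiniwobUserProxyAgent would call env.step(...) here.
        return f"executed: {action}"

assistant, proxy = AssistantStub(), MiniwobProxyStub()
feedback = "start"
for _ in range(2):
    action = assistant.propose(feedback)   # assistant decides the step
    feedback = proxy.execute(action)       # proxy runs it and reports back
```

The loop terminates in practice when the environment signals task completion or the step budget is exhausted.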
- Install packages
conda install --yes --file requirements.txt
cd computergym
pip install -e .
- Set up your OpenAI API key in main.py and config.json.
python main.py --problem click-button-sequence
Available problems are in available_tasks.txt.
We compare with the state-of-the-art method RCI [1].
- We include all clicking tasks, from easy to hard.
- We remove examples that would exceed the token limit during agent interactions.
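The token-limit filter in the second point can be sketched as follows. This is a hypothetical helper, not code from the repository, and it uses a crude four-characters-per-token estimate rather than a real tokenizer.

```python
# Hypothetical filter: drop examples whose prompt would overflow the
# model's context window, leaving headroom for the reply.
def fits_context(prompt: str, max_tokens: int = 4096, reserved: int = 512) -> bool:
    est_tokens = len(prompt) // 4  # rough heuristic, not an exact tokenizer
    return est_tokens + reserved <= max_tokens
```

In the actual pipeline, the model's own tokenizer should be used instead of the heuristic so the count matches what the API enforces.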
[1] Kim, G., Baldi, P., & McAleer, S. (2023). Language models can solve computer tasks. arXiv preprint arXiv:2303.17491.