feat: In safe-mode, asked the user whether they want to continue or cancel before any possible changes are made #122

lifeizhou-ap · 2024-10-07T03:16:00Z

Why
At the moment, when the plan is kicked off, the tasks are executed automatically. Users do not have much control to review potential changes that the task will make. It is hard for user to make the decision whether they want to proceed, skip the task or cancel the plan except using CTRL-C

What
Enable a safe-mode (can be renamed) to provide some control for the user to decide whether to continue before running the tools of

shell
patch_file
write_file

Make the diff more readable

Shell
Before the shell script is executed, the user is shown with the explanation of the command. The ai will also determine whether the command will do any changes on the users' computer(via prompt). If it does, the user is will be prompted whether they want to continue to execute the command. If yes, command will be executed. If no, the task will be cancelled and all the subsequence task will be cancelled. (We can also provide skip option so that that specific task can be skipped, and polish the experiences)

patch_file and write_file
Before changing the file, user are show the diff of the changes in the similar way of git diff. The user will be prompted whether they want to change the file. If yes, file will be changed. If no, file won't be changed and will move to the next task. (We can add yes to all options later on, so goose can remember the user decision in the plan)

shell_command.mov

patch_file.mov

write_file.mov

Note
This PR is a draft PR and the changes is a bit unorganised at the moment. The main purpose is to get your feedback about whether the safe-mode is useful for our users. I can polish and refactor the code afterwards.

…e running shell command

codefromthecrypt · 2024-10-07T04:02:27Z

related: it would be nice for toolkits to have a means to declare the system packages they depend on. In this way, we could test for the binary, like gh, vs having goose attempt to auto-install it.

lifeizhou-ap · 2024-10-07T05:01:28Z

related: it would be nice for toolkits to have a means to declare the system packages they depend on. In this way, we could test for the binary, like gh, vs having goose attempt to auto-install it.

Hey @codefromthecrypt Thanks for the feedback.

Just to confirm whether I understand your comments correctly. Do you mean we can have something smart in toolkit that they can know the packages on my local machine. For example, if I have already have packages on my local machine to achieve the task, the plan won't install another tool to achieve the same task. Is my understanding correct?

codefromthecrypt · 2024-10-07T05:04:46Z

Do you mean we can have something smart in toolkit that they can know the packages on my local machine.

yeah right now, you can run a command and it might try to apt-install a system package. If the toolkits expose any binaries, or "exit code zero" commands they need, goose could obviate attempts to load that toolkit. OTOH, this won't prevent normal prompts from also attempting the same thing in a toolkit.

Here's pseudocode of what it might look like

--- a/src/goose/toolkit/github.py
+++ b/src/goose/toolkit/github.py
@@ -9,3 +9,7 @@ class Github(Toolkit):
     def system(self) -> str:
         """Retrieve detailed configuration and procedural guidelines for GitHub operations"""
         return Message.load("prompts/github.jinja").text
+
+    def system_prereqs(self) -> Tuple(str):
+        """List of commands that must complete success for this toolkit to operate"""
+        return ("gh", "gh status")

codefromthecrypt · 2024-10-07T22:58:38Z

src/goose/cli/main.py

@@ -114,15 +114,16 @@ def get_session_files() -> dict[str, Path]:
 @click.option("--profile")
 @click.option("--plan", type=click.Path(exists=True))
 @click.option("--log-level", type=click.Choice(["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]), default="INFO")
-def session_start(name: Optional[str], profile: str, log_level: str, plan: Optional[str] = None) -> None:
+@click.option("--safe-mode", is_flag=True, help="Prompt before executing potentially unsafe commands", default=False)


wonder if we can borrow from other tools. I think sometimes, if prompt is default, -Y is overriding it, like apt

-y, --yes, --assume-yes Automatic yes to prompts; assume "yes" as answer to all prompts and run non-interactively. If an undesirable situation, such as changing a held package, trying to install a unauthenticated package or removing an essential package occurs then apt-get will abort. Configuration Item: APT::Get::Assume-Yes. --assume-no Automatic "no" to all prompts. Configuration Item: APT::Get::Assume-No.

Usually, when choosing to name things, I try to borrow from another tool's name that's well used, whether it is vocab or a flag, then you can blame it on them :)

haha, naming is always the hardest part to me. Thank you so much for your advice!

lamchau · 2024-10-09T21:23:34Z

packages/exchange/src/exchange/exchange.py

-            elif isinstance(tool_use.parameters, list):
-                output = json.dumps(tool.function(*tool_use.parameters))
+            if isinstance(tool_use.parameters, dict) or isinstance(tool_use.parameters, list):
+                function_return_value = tool.function(**tool_use.parameters)


is this intentional? i think trying to unpack a list (*) will throw an error since it's positional arguments not keyword arguments

lamchau · 2024-10-09T21:29:12Z

packages/exchange/src/exchange/exchange.py

+            if isinstance(tool_use.parameters, dict) or isinstance(tool_use.parameters, list):
+                function_return_value = tool.function(**tool_use.parameters)
+                if isinstance(function_return_value, dict):
+                    user_decision = function_return_value['user_decision'] if 'user_decision' in function_return_value else None


i feel this is easier to follow if we flatten out the conditionals

if isinstance(parameters, dict): function_return_value = tool.function(**parameters) elif isinstance(parameters, list): function_return_value = tool.function(*parameters) else: raise ValueError( f"The provided tool parameters '{parameters}' " "could not be interpreted as arguments." ) user_decision = None if isinstance(function_return_value, dict): user_decision = function_return_value.get("user_decision")

call_function also still has a lot of responsibility, which makes it hard to reason/test. i'd take it one step further by separating out each chunk

_execute_tool_function(tool, parameters)

_extract_user_decision(function_return_value)

so then the final resulting ToolResult reads like this

function_return_value = _execute_tool_function(tool, tool_use.parameters) user_decision = _extract_user_decision(function_return_value) output = json.dumps(function_return_value)

lamchau · 2024-10-09T21:37:02Z

src/goose/toolkit/developer.py

 RULESTYLE = "bold"
 RULEPREFIX = f"[{RULESTYLE}]───[/] "

+TASKS_WITH_EMOJI = {"planned": "⏳", "complete": "✅", "failed": "❌", "in-progress": "🕑", "cancelled": "🚫", "skipped": "⏩"}


:nit: could we add a trailing comma so this dict prints out each line so it's easier to read?

lamchau · 2024-10-09T21:39:16Z

src/goose/toolkit/developer.py

+            _path.write_text(content)
+            return "Successfully replaced before with after."
+        else:
+            return {"result": "User chooses not to make the change on this file. skip the change", "task_status": "skipped"}


:nit: i like active voice for this User rejected suggested changes to {file}. Skipping edits.

lamchau · 2024-10-09T21:45:59Z

src/goose/utils/safe_mode.py

+    return None
+
+def is_in_safe_mode() -> bool:
+    return os.getenv(GOOSE_SAFE_MODE_ENV, "false") == "true"


just an opinion i think adding "true" "false" clutters the environment. imo we should should only check if it exists or not (e.g. unset GOOSE_SAFE_MODE would disable safe mode).

this minimizes user error

lifeizhou-ap · 2024-10-10T03:48:43Z

Hi @lamchau,

Thanks for the review! The code is a bit unorganised now. At the moment with this draft PR I would like to get the feedback about this feature first, like whether it will be useful or it is worthwhile for experiment. If we think it would be useful, I will continue this PR to polish the code.

lamchau · 2024-10-10T09:51:33Z

@lifeizhou-ap it's worth experimenting with! i currently do this with a .goosehints-dry-run which i need to swap so i can bench/compare results

michaelneale · 2024-10-16T21:25:53Z

I think some people would appreciate this feature, as it can be a surprise how "biased to action" goose is out of the box (I like it myself so likely wouldn't use this mode, but if it isn't adding a lot of complexity to it it is an interesting feature to add in as an option)

in safe-mode, asked the user whether want to continue or cancel befor…

83d044d

…e running shell command

safe mode for patch file and readable output

6fbccd6

lifeizhou-ap changed the title ~~in safe-mode, asked the user whether want to continue or cancel befor…~~ in safe-mode, asked the user whether want to continue or cancel before running shell command Oct 7, 2024

lifeizhou-ap changed the title ~~in safe-mode, asked the user whether want to continue or cancel before running shell command~~ in safe-mode, asked the user whether they want to continue or cancel before any possible changes are made Oct 7, 2024

lifeizhou-ap changed the title ~~in safe-mode, asked the user whether they want to continue or cancel before any possible changes are made~~ feat: In safe-mode, asked the user whether they want to continue or cancel before any possible changes are made Oct 7, 2024

lifeizhou-ap added 5 commits October 7, 2024 17:24

safe mode for write file and diff

5f38b06

safe mode for patch file and readable output

f6f2db3

safe mode for write file and diff

1c8d861

removed the strip in the confirm

ebe07b1

removed unnecessary file

442a6ba

codefromthecrypt reviewed Oct 7, 2024

View reviewed changes

lamchau reviewed Oct 9, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: In safe-mode, asked the user whether they want to continue or cancel before any possible changes are made #122

feat: In safe-mode, asked the user whether they want to continue or cancel before any possible changes are made #122

lifeizhou-ap commented Oct 7, 2024 •

edited

Loading

codefromthecrypt commented Oct 7, 2024

lifeizhou-ap commented Oct 7, 2024 •

edited

Loading

codefromthecrypt commented Oct 7, 2024

codefromthecrypt Oct 7, 2024

lifeizhou-ap Oct 8, 2024 •

edited

Loading

lamchau Oct 9, 2024

lamchau Oct 9, 2024

lamchau Oct 9, 2024

lamchau Oct 9, 2024

lamchau Oct 9, 2024

lamchau Oct 9, 2024

lifeizhou-ap commented Oct 10, 2024 •

edited

Loading

lamchau commented Oct 10, 2024

michaelneale commented Oct 16, 2024

feat: In safe-mode, asked the user whether they want to continue or cancel before any possible changes are made #122

Are you sure you want to change the base?

feat: In safe-mode, asked the user whether they want to continue or cancel before any possible changes are made #122

Conversation

lifeizhou-ap commented Oct 7, 2024 • edited Loading

codefromthecrypt commented Oct 7, 2024

lifeizhou-ap commented Oct 7, 2024 • edited Loading

codefromthecrypt commented Oct 7, 2024

codefromthecrypt Oct 7, 2024

Choose a reason for hiding this comment

lifeizhou-ap Oct 8, 2024 • edited Loading

Choose a reason for hiding this comment

lamchau Oct 9, 2024

Choose a reason for hiding this comment

lamchau Oct 9, 2024

Choose a reason for hiding this comment

lamchau Oct 9, 2024

Choose a reason for hiding this comment

lamchau Oct 9, 2024

Choose a reason for hiding this comment

lamchau Oct 9, 2024

Choose a reason for hiding this comment

lamchau Oct 9, 2024

Choose a reason for hiding this comment

lifeizhou-ap commented Oct 10, 2024 • edited Loading

lamchau commented Oct 10, 2024

michaelneale commented Oct 16, 2024

lifeizhou-ap commented Oct 7, 2024 •

edited

Loading

lifeizhou-ap commented Oct 7, 2024 •

edited

Loading

lifeizhou-ap Oct 8, 2024 •

edited

Loading

lifeizhou-ap commented Oct 10, 2024 •

edited

Loading