chore(autofix): Switch to claude for root cause step #1384

Merged (2 commits) on Nov 6, 2024

@roaga (Member) commented Nov 5, 2024

Switching the model from GPT-4o to Claude 3.5 Haiku gives better latency, better coding, and better root cause results.

(see row 2 below):

[Image: Screenshot 2024-11-05 at 6 46 30 PM]

@roaga roaga requested a review from a team as a code owner November 5, 2024 23:30
@roaga roaga requested a review from jennmueng November 5, 2024 23:30
@trillville (Contributor) commented

how much higher is the error rate? hard to tell from these metrics

@roaga (Member, Author) commented Nov 5, 2024

> how much higher is the error rate? hard to tell from these metrics

Good runs: 98
Errored runs: 5
Error rate: 0.04
Errored in root cause: 0
Errored in plan: 3
Error rate in root cause: 0.00
Error rate in plan: 0.02
Error rate in something after plan: 0.02
Runs with unapplied changes: 24
Missing change rate: 0.19

These are the correct numbers; the eval script is bugged, @trillville.

So the error rate is actually unaffected.
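
For anyone checking the arithmetic, here is a minimal sketch of how rates like the ones listed above could be derived from raw counts. It is not the eval notebook's code, which is not shown in this thread. The function name, parameter names, and dictionary keys are all hypothetical. The thread also does not state the denominator, so the total of 126 runs is an assumption, chosen only because it reproduces the rounded figures. The "after plan" count is likewise assumed to be the errors left over once root cause and plan errors are subtracted.

```python
def summarize_runs(total_runs, errored, errored_root_cause, errored_plan, unapplied):
    """Compute report-style rates, each rounded to two decimals.

    All names here are hypothetical; the real eval notebook is not shown
    in this thread.
    """
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")

    # Assumption: errors after the plan step are whatever remains once
    # root cause and plan errors are subtracted from the total error count.
    errored_after_plan = errored - errored_root_cause - errored_plan

    def rate(count):
        return round(count / total_runs, 2)

    return {
        "error_rate": rate(errored),
        "root_cause_error_rate": rate(errored_root_cause),
        "plan_error_rate": rate(errored_plan),
        "after_plan_error_rate": rate(errored_after_plan),
        "missing_change_rate": rate(unapplied),
    }


# Illustrative call: 126 is an assumed total, not a figure from the thread.
report = summarize_runs(
    total_runs=126,
    errored=5,
    errored_root_cause=0,
    errored_plan=3,
    unapplied=24,
)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Keeping the denominator as an explicit argument makes it easy to spot when two scripts disagree only because they divide by different totals.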

@jennmueng (Member) left a comment

Bug in our eval, but the notebook error rate is correct

@roaga (Member, Author) commented Nov 6, 2024

Actually going with Haiku now instead of Sonnet; the evals are better.

@roaga roaga merged commit 1e0d432 into main Nov 6, 2024
5 checks passed
@roaga roaga deleted the autofix/switch-to-claude branch November 6, 2024 03:04

3 participants