Commit

results: improve explanation of figure 9
miltondp committed May 23, 2024
1 parent bd23701 commit 161f225
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion content/04.results.md
@@ -221,11 +221,12 @@ Although these are important future directions, neither statement accurately des

![
**Automated assessment of preference over revised paragraphs.**
- A revision score ($y$-axis) close to 1 indicates that the LLM acting as a judge preferred the revised paragraph over the original one, while a score of -1 indicates the opposite; a zero value indicates either a tie or position bias.
+ A revision score ($y$-axis) close to 1 indicates that the LLM acting as a judge preferred the revised paragraph over the original one, while a score of -1 indicates the opposite; a score close to zero indicates either a tie or position bias.
Each point represents the average score of paragraphs from a section in one of the four manuscripts: CCC, PhenoPLIER, BioChatter and Epistasis.
](images/llm_judge.svg "Automated assessments using LLM-as-a-Judge"){#fig:llm_judge width="90%"}

The automatic assessment of paragraphs from different sections across four manuscripts is depicted in Figure @fig:llm_judge.
+ A revision score above zero indicates that the LLM acting as a judge preferred the revised paragraph over the original one on average, while a score below zero indicates the opposite.
It can be seen that the two models used as judges, GPT-3.5 Turbo and GPT-4 Turbo, generally agreed and favored the revised paragraphs over the original ones (revision score above zero) in most cases.
The only section where the original paragraphs were clearly preferred was the Abstract of the PhenoPLIER and Epistasis manuscripts.
GPT-3.5 Turbo showed a preference for the original abstract of PhenoPLIER in most cases, and the model rationale (Supplementary File 5) aligns with our human assessment: the original abstract provides a more "detailed explanation" of the approaches and a "comprehensive overview of the research."
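
The caption describes a score that runs from -1 to 1 and lands near zero for either a tie or position bias. The sketch below shows one way such a score could behave; it is an assumption, not the manuscript's actual procedure. It assumes each paragraph pair is judged twice with the presentation order swapped, and that verdicts are already normalized to the labels `"revised"`, `"original"`, or `"tie"`. The function names `pair_score` and `section_score` are hypothetical.

```python
from statistics import mean

# Assumed mapping from a judge verdict to points (not taken from the manuscript).
POINTS = {"revised": 1.0, "original": -1.0, "tie": 0.0}


def pair_score(verdict_order_a: str, verdict_order_b: str) -> float:
    """Average the verdicts from both presentation orders of one paragraph pair.

    If the judge always picks whichever paragraph is shown first (position
    bias), the two verdicts disagree and the score cancels to 0.
    """
    return (POINTS[verdict_order_a] + POINTS[verdict_order_b]) / 2


def section_score(verdicts: list[tuple[str, str]]) -> float:
    """Average the per-paragraph scores of one manuscript section."""
    return mean([pair_score(a, b) for a, b in verdicts])


# Illustrative, made-up verdicts for three paragraphs in one section.
example = [
    ("revised", "revised"),   # consistent preference for the revision: 1.0
    ("revised", "original"),  # verdict flips with order (position bias): 0.0
    ("tie", "revised"),       # weak preference for the revision: 0.5
]
print(section_score(example))
```

Under these assumptions, a section average above zero corresponds to the judge preferring the revised paragraphs on average, which matches the reading added in this commit.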
