update
frankweijue committed Oct 31, 2023
1 parent 0d08596 commit ba6f266
Showing 1 changed file with 0 additions and 14 deletions.
14 changes: 0 additions & 14 deletions README.md
@@ -22,13 +22,6 @@
- [BotChat Arena](#botchat-arena)
- [Compared to the "Ground Truth"](#compared-to-the-ground-truth)
- [Qualitative Analysis](#qualitative-analysis)
- [AI Self-Identification](#ai-self-identification)
- [Contextual Confusion](#contextual-confusion)
- [Excessive Length](#excessive-length)
- [Formal Tone](#formal-tone)
- [Repetitive Phrasing](#repetitive-phrasing)
- [Good case](#good-case)
- [More Examples](#more-examples)
- [Citation](#citation)
- [OpenCompass Projects](#opencompass-projects)

@@ -222,11 +215,6 @@ In the table below, we demonstrate the ELO score (`init=1000, scale=400, K=32`)
<details>
<summary><b>Compared to the "Ground Truth"</b></summary>

</details>

<details>
<summary><b>Compared to the "Ground Truth"</b></summary>

### Compared to the "Ground Truth"

We further compare the generated conversations with the "Ground Truth" conversations in **MuTual-Test**. We follow the same protocol as BotChat Arena and select a subset with 222 conversations (with at least 4 chats) for this comparison. We list the specific #round distribution of the conversations in the table below. Since the Ground Truth conversations may have various lengths (ranging from 4 to 15), to deliver a fair comparison, we trim all generated conversations to the same length as the reference ground-truth conversation. The meta prompt adopted is basically the same as the one used in BotChat Arena. One difference is that this time we state that only one of the two conversations includes AI-generated utterances.
@@ -249,8 +237,6 @@ We further try to calculate the Uni-Eval pass rate for each conversation at the

</details>

</details>

### Qualitative Analysis
In this section, we will conduct qualitative analysis on the results, categorizing bad cases into five distinct types. The specific analysis is as follows:
