update

open-compass · Oct 31, 2023 · ba6f266
1 parent 0d08596
Showing 1 changed file with 0 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -22,13 +22,6 @@
     - [BotChat Arena](#botchat-arena)
     - [Compared to the "Ground Truth"](#compared-to-the-ground-truth)
     - [Qualitative Analysis](#qualitative-analysis)
-      - [AI Self-Identification](#ai-self-identification)
-      - [Contextual Confusion](#contextual-confusion)
-      - [Excessive Length](#excessive-length)
-      - [Formal Tone](#formal-tone)
-      - [Repetitive Phrasing](#repetitive-phrasing)
-      - [Good case](#good-case)
-      - [More Examples](#more-examples)
   - [Citation](#citation)
   - [OpenCompass Projects](#opencompass-projects)
 
@@ -222,11 +215,6 @@ In the table below, we demonstrate the ELO score (`init=1000, scale=400, K=32`)
 <details>
 <summary><b>Compared to the "Ground Truth"</b></summary>
 
-</details>
-
-<details>
-<summary><b>Compared to the "Ground Truth"</b></summary>
-
 ### Compared to the "Ground Truth"
 
 We further compare the generated conversation with the "Ground Truth" conversations in **MuTual-Test**. We follow the same protocol as BotChat Arena and select a subset with 222 conversations (with at least 4 chats) for this comparison. We list the specific #round distribution of the conversations in the table below. Since the Ground Truth conversations may have various lengths (ranging from 4 to 15), to deliver a fair comparison, we trim all generated conversations to the same length as the reference ground-truth conversation. The meta prompt adopted is basically the same as the one used in BotChat Arena. One difference is that in this time we state that only one of two conversations includes AI-generated utterances.
@@ -249,8 +237,6 @@ We further try to calculate the Uni-Eval pass rate for each conversation at the
 
 </details>
 
-</details>
-
 ### Qualitative Analysis
 In this section, we will conduct qualitative analysis on the results, categorizing bad cases into five distinct types. The specific analysis is as follows: