An experiment on how well LLMs can score in the Lithuanian quiz show "Auksinis Protas".
The LLMs competed in the shows aired on 2023-09-22, 2023-05-29, and 2023-10-28.
The LLMs were scored a bit differently from human players:
- Round 1: 1 point for each correct answer
- Round 2: 4 points for a guess with 1 hint, 3 points for a guess with 2 hints, 2 points for a guess with 3 hints, 1 point for a guess with 4 hints
- Round 3: 1 point for each correct answer; points are not burned after an incorrect guess
- Round 4: 1 point for each correct answer
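The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not the code in `play.py`; the names `ROUND_2_POINTS` and `round_score` are hypothetical.

```python
# Hypothetical helper illustrating the LLM scoring rules listed above.
# Points for a correct Round 2 guess, keyed by the number of hints used.
ROUND_2_POINTS = {1: 4, 2: 3, 3: 2, 4: 1}


def round_score(round_number, correct_answers=0, hints_used=()):
    """Return the score for one round.

    correct_answers: number of correct answers (Rounds 1, 3 and 4).
    hints_used: for Round 2, the number of hints used for each correct guess.
    """
    if round_number == 2:
        return sum(ROUND_2_POINTS[hints] for hints in hints_used)
    if round_number in (1, 3, 4):
        # 1 point per correct answer; incorrect guesses cost nothing.
        return correct_answers
    raise ValueError(f"unknown round: {round_number}")
```

For example, `round_score(2, hints_used=[1, 3, 4])` adds 4, 2 and 1 points for three correct guesses made after one, three and four hints.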
Model | Show | Round 1 | Round 2 | Round 3 | Round 4 | Total |
---|---|---|---|---|---|---|
GPT 4 | 2023-09-22 | 6 | 11 | 53 | 1 | 71 |
GPT 3.5 | 2023-09-22 | 3 | 9 | 44 | 1 | 57 |
PaLM 2 | 2023-09-22 | 6 | 11 | 51 | 0 | 68 |
GPT 4 | 2023-05-29 | 6 | 5 | 41 | 2 | 54 |
GPT 3.5 | 2023-05-29 | 3 | 4 | 30 | 0 | 37 |
PaLM 2 | 2023-05-29 | 5 | 8 | 37 | 1 | 51 |
GPT 4 | 2023-10-28 | 8 | 12 | 39 | 2 | 61 |
GPT 3.5 | 2023-10-28 | 4 | 5 | 36 | 1 | 46 |
PaLM 2 | 2023-10-28 | 5 | 24 | 36 | 2 | 67 |
Total
Model | Score |
---|---|
GPT 4 | 186 |
PaLM 2 | 186 |
GPT 3.5 | 140 |
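The overall totals can be recomputed from the per-show results. The sketch below copies the per-round scores from the results table; `RESULTS` and `totals_by_model` are hypothetical names used only for this check.

```python
from collections import defaultdict

# (model, show, round 1, round 2, round 3, round 4), copied from the results table.
RESULTS = [
    ("GPT 4", "2023-09-22", 6, 11, 53, 1),
    ("GPT 3.5", "2023-09-22", 3, 9, 44, 1),
    ("PaLM 2", "2023-09-22", 6, 11, 51, 0),
    ("GPT 4", "2023-05-29", 6, 5, 41, 2),
    ("GPT 3.5", "2023-05-29", 3, 4, 30, 0),
    ("PaLM 2", "2023-05-29", 5, 8, 37, 1),
    ("GPT 4", "2023-10-28", 8, 12, 39, 2),
    ("GPT 3.5", "2023-10-28", 4, 5, 36, 1),
    ("PaLM 2", "2023-10-28", 5, 24, 36, 2),
]


def totals_by_model(results):
    """Sum the per-round scores of every show for each model."""
    totals = defaultdict(int)
    for model, _show, *rounds in results:
        totals[model] += sum(rounds)
    return dict(totals)


TOTALS = totals_by_model(RESULTS)
print(TOTALS)
```

The sums match the table: GPT 4 and PaLM 2 tie at 186 points, and GPT 3.5 scores 140.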
cd game
poetry install
poetry run python play.py -e ../shows/2023-10-28.txt -r 1 -l palm
NOTE:
- To play with GPT-3.5 or GPT-4, an OpenAI API key needs to be exported before playing:
export OPENAI_API_KEY="..."
- To play with PaLM 2 via Vertex AI, authentication to Google Cloud is required. For example:
gcloud auth application-default login
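Before starting a game with GPT-3.5 or GPT-4, it can help to confirm that the key is actually visible to Python. A minimal sketch, assuming only that the key is read from the `OPENAI_API_KEY` environment variable as in the note above; `openai_key_is_set` is a hypothetical helper and not part of `play.py`.

```python
import os


def openai_key_is_set(env=None):
    """Return True if OPENAI_API_KEY is present and non-empty.

    env defaults to the current process environment; a plain dict can be
    passed instead, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_is_set():
    print("OPENAI_API_KEY is not set; export it before playing with GPT models.")
```

This only checks that the variable exists; it does not verify that the key is valid.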
Detailed instructions on how to download and transcribe a show can be found in Download & Transcribe.