
LLMEvaluation

We found that evaluating Vicuna and Llama 2 on an A100 versus a V100 produces different benchmark results, while other models such as Falcon do not show this discrepancy. The results are shown in the figure below.

(Figure: benchmark scores comparing A100 and V100 runs.)
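The repository does not identify a cause, but one variable that differs between these two GPUs is TF32: Ampere cards such as the A100 run float32 matmuls in TF32 by default, while the V100 (Volta) computes in full FP32. A minimal sketch, assuming PyTorch, of how one might rule this variable out when reproducing the comparison; this is a hypothesis to test, not this repository's confirmed explanation:

```python
import torch

# Disable TF32 so float32 matmuls and convolutions on the A100 use full FP32,
# matching the V100's numerics. Hypothesis to test, not a confirmed cause.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```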

Experiment

We run the experiments on Google Colab Pro+.
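Colab does not guarantee which accelerator a session receives, so when reproducing the A100-vs-V100 comparison it helps to record the assigned GPU. A minimal check, assuming PyTorch is available in the runtime:

```python
import torch

# Record which accelerator Colab assigned, since Pro+ sessions may receive
# different GPU types depending on availability.
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla V100-SXM2-16GB" or "NVIDIA A100-SXM4-40GB"
    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability: {major}.{minor}")  # 7.0 = Volta (V100), 8.0 = Ampere (A100)
else:
    print("No CUDA device available")
```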

Model Evaluation

We evaluate each model with four LLM benchmarks; a sketch of how such a run might be launched follows the list.

  1. HellaSwag (hellaswag): acc
  2. TruthfulQA (truthfulqa_mc): mc1, mc2
  3. ARC-Challenge (arc_challenge): acc
  4. MMLU (hendrycksTest-*): average acc across all subject tests
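A minimal sketch of one way to run these benchmarks, assuming EleutherAI's lm-evaluation-harness (an older release whose task names match the list above). The exact tool, version, and model checkpoints this repository used are not stated, so treat the names and arguments below as assumptions:

```python
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                             # HuggingFace causal-LM backend
    model_args="pretrained=lmsys/vicuna-7b-v1.3",  # illustrative checkpoint, not necessarily the one used here
    tasks=["hellaswag", "truthfulqa_mc", "arc_challenge"],
    device="cuda:0",
)

# MMLU corresponds to the hendrycksTest-* task family in this harness;
# its reported score is the average acc over all subject tasks.
for task, metrics in results["results"].items():
    print(task, metrics)  # e.g. {"acc": ...} or {"mc1": ..., "mc2": ...}
```

Running the same script on an A100 and a V100 session and diffing the printed metrics reproduces the comparison described above.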
