Updates to accommodate OpenLLM leaderboard v2 tasks and change Meta Llama 3.1 to Llama 3.1 #639
Conversation
As Meta Llama models gain popularity, evaluating these models has become increasingly important. We have released all the evaluation details for Meta-Llama 3.1 models as datasets in the [3.1 evals Hugging Face collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f). This recipe demonstrates how to closely reproduce the Llama 3.1 reported benchmark numbers using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) library and our prompts from the 3.1 evals datasets on selected tasks.
As Meta Llama models gain popularity, evaluating these models has become increasingly important. We have released all the evaluation details for Meta-Llama 3.1 models as datasets in the [3.1 evals Hugging Face collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f). This recipe demonstrates how to calculate the Llama 3.1 reported benchmark numbers using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) library and our prompts from the 3.1 evals datasets on selected tasks.
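For anyone wanting to browse the released 3.1 evals datasets referenced above, here is a minimal, hedged sketch using the `datasets` library. The repo id is an assumption based on the collection's naming, and these datasets are gated, so you must accept the license on Hugging Face and be logged in before loading them.

```python
# Hedged sketch: browsing one of the released 3.1 evals datasets with the
# `datasets` library. The repo id below is assumed from the Hugging Face
# collection and may differ for other model sizes or names.
from datasets import get_dataset_config_names, load_dataset

repo_id = "meta-llama/Meta-Llama-3.1-8B-Instruct-evals"  # assumed repo id

# Each benchmark's evaluation details are published as a separate config.
configs = get_dataset_config_names(repo_id)
print(configs[:5])

# Load one config and inspect the stored prompts and outputs.
ds = load_dataset(repo_id, configs[0])
print(ds)
```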
Let's clarify here that we are providing this because our internal repo isn't public, but that it will not reproduce the evals exactly. It is just meant to be an aid for someone wanting to use the dataset we released.
Sure, this line has been added.
## Disclaimer
1. **This recipe is not the official implementation** of Meta Llama evaluation. It is based on public third-party libraries; since this implementation does not mirror the Meta Llama evaluation, there may be minor differences in the reproduced numbers.
1. **This recipe is not the official implementation** of Meta Llama evaluation. It is based on public third-party libraries; since this implementation does not mirror the Meta Llama evaluation, there may be minor differences in the produced numbers.
See comment above. I think this point addresses my comment partially.
Previously, the OpenLLM leaderboard V1 was hard to reproduce because it did not include an easy command to run, so we created a customized eval.py to load a folder of selected task templates and run the OpenLLM leaderboard V1 evaluation. OpenLLM leaderboard V2 has not only updated the selected tasks but also added an easy-to-use command for running them. Therefore, this PR deletes the previous work for OpenLLM leaderboard V1 and updates the README to accommodate the OpenLLM leaderboard V2 tasks.
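For context, this is roughly what the easier V2 invocation looks like through the harness's Python API. This is a hedged sketch only, not the exact command from this PR: the model id and the `leaderboard` task-group name are assumptions and may vary across lm-evaluation-harness versions.

```python
# Hedged sketch: running the Open LLM Leaderboard v2 task group via
# lm-evaluation-harness's Python API. Model id and group name are assumed.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct,dtype=bfloat16",
    tasks=["leaderboard"],  # assumed v2 group name; verify against your installed task list
    batch_size=8,
)

# Per-task metrics live under the "results" key of the returned dict.
print(results["results"])
```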