
Updates to accommodate OpenLLM leaderboard v2 tasks and change Meta Llama 3.1 to Llama 3.1 #639

Merged: 11 commits merged into main on Oct 1, 2024

Conversation

@wukaixingxp (Contributor) commented Aug 21, 2024

Previously, the OpenLLM leaderboard v1 was hard to reproduce because it did not include an easy command to run, so we created a customized eval.py that loads a folder of selected task templates to run the OpenLLM leaderboard v1 evaluation. OpenLLM leaderboard v2 not only updates the selected tasks but also provides an easy-to-use command to run them. Therefore, this PR deletes the previous OpenLLM leaderboard v1 work and updates the README to accommodate the OpenLLM leaderboard v2 tasks.
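
For reference, here is a minimal sketch of what the v2 run looks like through the harness's Python API. The checkpoint name, the `leaderboard` task-group name, and the argument values are assumptions based on lm-evaluation-harness's public documentation, not taken from this PR:

```python
# Minimal sketch, not from this PR: running the OpenLLM leaderboard v2 task
# group via lm-evaluation-harness. The checkpoint name and the "leaderboard"
# task group are assumptions based on the harness's public docs.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct",
    tasks=["leaderboard"],  # v2 task group (IFEval, BBH, GPQA, MuSR, MMLU-Pro, ...)
    batch_size="auto",
)
print(results["results"])  # per-task metrics keyed by task name
```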

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@wukaixingxp wukaixingxp self-assigned this Aug 22, 2024

- As Meta Llama models gain popularity, evaluating these models has become increasingly important. We have released all the evaluation details for Meta-Llama 3.1 models as datasets in the [3.1 evals Hugging Face collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f). This recipe demonstrates how to closely reproduce the Llama 3.1 reported benchmark numbers using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) library and our prompts from the 3.1 evals datasets on selected tasks.
+ As Meta Llama models gain popularity, evaluating these models has become increasingly important. We have released all the evaluation details for Meta-Llama 3.1 models as datasets in the [3.1 evals Hugging Face collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f). This recipe demonstrates how to calculate the Llama 3.1 reported benchmark numbers using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) library and our prompts from the 3.1 evals datasets on selected tasks.
Member:

Let's clarify here that we are providing this because our internal repo isn't public, and that it will not exactly reproduce the evals. It is just meant to be an aid for someone wanting to use the dataset we released.

Contributor Author:

Sure, this line has been added.
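
As an aside, a hedged sketch of pulling one of the released 3.1 evals datasets from the Hugging Face collection linked above; the exact repo, config, and split names below are assumptions and may differ from the published datasets:

```python
# Hedged sketch: loading one of the 3.1 evals datasets released in the
# collection linked above. Repo, config, and split names are assumptions;
# check the collection page for the real identifiers.
from datasets import load_dataset

ds = load_dataset(
    "meta-llama/Llama-3.1-8B-Instruct-evals",
    name="Llama-3.1-8B-Instruct-evals__mmlu__details",
    split="latest",
)
print(ds[0])  # one evaluation row: prompt, model response, parsed answer, etc.
```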


## Disclaimer


- 1. **This recipe is not the official implementation** of Meta Llama evaluation. It is based on public third-party libraries, as this implementation is not mirroring Meta Llama evaluation, this may lead to minor differences in the reproduced numbers.
+ 1. **This recipe is not the official implementation** of Meta Llama evaluation. It is based on public third-party libraries, as this implementation is not mirroring Meta Llama evaluation, this may lead to minor differences in the produced numbers.
Member:

See comment above. I think this point partially addresses my comment.

@wukaixingxp wukaixingxp changed the title Updated the readme in the llm_eval_harness to accommodate OpenLLM leaderboard v2 tasks Updates to accommodate OpenLLM leaderboard v2 tasks and change Meta Llama 3.1 to Llama 3.1 Oct 1, 2024
@wukaixingxp wukaixingxp merged commit 2501f51 into main Oct 1, 2024
3 checks passed