Skip to content

Marker-Inc-Korea/Logickor-Gemma2-Eval

Repository files navigation

Logickor-Gemma2-Eval

This repo was created internally to utilize the 🌟logickor🌟 evaluation for self-evaluation.
Maybe, our code is same manner as logickor v2.

Our code provides zero-shot code only. (New update; 09. 07) We add 1-shot and cot-1-shot.

Gukbap-Mistral-7B🍚 (6.06): Hugging Face
Gukbap-Qwen2-7B🍚 (6.70): Hugging Face
Gukbap-Gemma2-9B🍚 (8.77): Hugging Face

Dependency (important)

There are many issues with evaluating Gemma2 in vllm.
Therefore, you should follow the installation below.

  1. Download vllm 0.5.1 version.
pip install vllm==0.5.1
  1. Add FLASHINFER backend in your script file.
export VLLM_ATTENTION_BACKEND=FLASHINFER
  1. And then, download flashinfer package through this link.
  • If there are some error, then try: solution2.

Evaluation (zero-shot)

Please check the script file.

# Example
export VLLM_ATTENTION_BACKEND=FLASHINFER 

python mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \ 
    --base_model Gukbap-Gemma2-9B \ 
    --max_token 4096

If you want to test other models (mistral, qwen, ...), then you need to remove export VLLM_ATTENTION_BACKEND=FLASHINFER. If you test the Gemma2 models, you need to set max_token < 8192. Cuurently, vllm cannot apply 8192 token with Gemma2.

Evaluation (1-shot)

Please check the script file.

export VLLM_ATTENTION_BACKEND=FLASHINFER

python 1_shot_mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \
    --base_model Gukbap-Gemma2-9B \
    --max_token 4096 \
    --prompt cot-1-shot # You select [cot-1-shot or 1-shot]

Gemma2 do not support system prompt.

Example

BibTex

@article{HumanF-MarkrAI,
  title={Gukbap-Series-LLM},
  author={MarkrAI},
  year={2024},
  url={https://huggingface.co/HumanF-MarkrAI}
}

About

Logickor self-evaluation code (with gemma2)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published