
Evaluation

EasyLLM supports interactive dialogue and dataset evaluation, including custom datasets and the following publicly available datasets:

  • Examination: C-Eval/CMMLU
  • Reasoning: HumanEval

HuggingFace

  • Interactive conversation
cd scripts
bash hf_infer.sh [partition] [n_gpus] config.yaml interactive

To control the format of the input prompt, set the 'infer_tokenization' field in the configuration file, as illustrated below (the supported prompt formats are described in the following section):

...

infer_tokenization: 
  type: sense_tokenization
  kwargs:
    max_seq_length: *train_seq_length 
    parser_type: base
    parser_kwargs:
      prompt_type: empty

...
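
With parser_type set to base and prompt_type set to empty, the input text is presumably passed to the model without any surrounding template. The sketch below is illustrative only and is not EasyLLM's parser implementation; the function names and the tokenizer argument are placeholders.

def build_prompt(user_text, prompt_type="empty"):
    # With the "empty" prompt type, no template is wrapped around the input text.
    if prompt_type == "empty":
        return user_text
    raise NotImplementedError(f"unsupported prompt_type: {prompt_type}")

def tokenize_for_infer(tokenizer, user_text, max_seq_length):
    # Tokenize the untemplated prompt and truncate it to max_seq_length,
    # mirroring the max_seq_length field in infer_tokenization above.
    prompt = build_prompt(user_text)
    return tokenizer(prompt, truncation=True, max_length=max_seq_length)["input_ids"]
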
  • Dataset Evaluation
cd scripts
bash hf_infer.sh [partition] [n_gpus] config.yaml eval  

Dataset evaluation is driven by the 'infer_cfg' and 'infer_tokenization' settings in the configuration file. An example of the 'infer_cfg' configuration is provided below:

  • eval_task: supports four types of dataset evaluation

    • base: custom dataset
    • ceval|cmmlu|human_eval: public datasets
  • question_file: path to the evaluation dataset, with four cases depending on eval_task.

    • base: specify the path down to the .jsonl file; refer to the dataset format section for the format details.
    • ceval: OpenCompassData/data/ceval/formal_ceval (see the OpenCompassData download link)
    • cmmlu: OpenCompassData/data/cmmlu
    • human_eval: no path needs to be specified.
infer_cfg:
  eval_task: [base|ceval|cmmlu|human_eval]
  question_file: questions.jsonl
  result_file: results.jsonl  # evaluation results file path
  generation_cfg:  # inference config
    temperature: 0.2
    top_k: 40
    top_p: 0.9
    do_sample: True
    num_beams: 1
    repetition_penalty: 1.3
    max_new_tokens: 512
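
The generation_cfg field names mirror standard Hugging Face text-generation arguments. As a point of reference only (this is not EasyLLM code), the same values could be passed directly to transformers' generate(); the model path and input text below are placeholders.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/your/model"   # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

inputs = tokenizer("Question: 1 + 1 = ?", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    temperature=0.2,                # same values as generation_cfg above
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.3,
    max_new_tokens=512,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
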
  • The format of the evaluation results:
{
   "0"[integer, task_id]: {
      "input": "xxx",                   # input text
      "raw_output": "xxx",              # raw model output
      "output": "xxx",                  # output after post process
      "answer": "xxx",                  # answer labels
      "infos": {                        # infos
          "count": 33,                  # decode count
          "accept_length": 33,          # output token num
          "ave_accept_length": 1        # token per decode
      }
   },
   "1": {
      "input": "xxx",
      "raw_output": "xxx",
      "output": "xxx",
      "answer": "xxx",
      "infos": {
         ...
      }
   },
   ...
}
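
A quick way to inspect such a result file is to load it and compare output against answer. The snippet below is a rough sketch: it assumes the result file holds a single JSON object keyed by task id, exactly as in the example above, and that exact string matching is an adequate correctness check for your task.

import json

with open("results.jsonl", "r", encoding="utf-8") as f:
    results = json.load(f)

correct = 0
generated_tokens = 0
for task_id, item in results.items():
    # "output" is the post-processed prediction, "answer" is the reference label.
    if item["output"].strip() == item["answer"].strip():
        correct += 1
    generated_tokens += item["infos"].get("accept_length", 0)

print(f"exact-match accuracy: {correct / len(results):.3f}")
print(f"total generated tokens: {generated_tokens}")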

Megatron

Refer to the HuggingFace section above for the configuration file settings.

  • Interactive conversation
cd scripts
bash mg_infer.sh [partition] [n_gpus] config.yaml interactive
  • Dataset Evaluation
cd scripts
bash mg_infer.sh [partition] [n_gpus] config.yaml eval