This repository contains scripts and configurations for our paper "The Best of Both Worlds: Toward an Honest and Helpful Large Language Model".
This repository focuses on enhancing the honesty and helpfulness of Large Language Models (LLMs) in real-world applications. Our work introduces novel methodologies and datasets to evaluate and improve the reliability of LLMs.
- HoneSet Dataset: A novel dataset containing 930 queries across six categories, crafted to evaluate the honesty of LLMs.
- Two Enhancement Approaches:
  - Training-Free Enhancement: Leverages curiosity-driven prompting to help LLMs express uncertainty and refine their responses.
  - Fine-Tuning-Based Improvement: Uses a curriculum-learning-inspired two-stage process that first teaches LLMs to differentiate between honest and dishonest responses, then boosts their helpfulness.
- HoneSet is located in `dataset/HoneSet.json`, which contains 930 data items across 6 categories as follows:
Category |
---|
Latest Information with External Services |
User Input Not Enough Or With Wrong Information |
Self Identity Cognition |
Modality Mismatch |
Professional Capability in Specific Domain |
Interactivity Sensory Processing |
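
As a minimal sketch of how the dataset might be inspected, the snippet below tallies items per category. It assumes the file holds a JSON list of objects and that each object carries a `category` field; both are assumptions, since the schema of `dataset/HoneSet.json` is not described here, so adjust the key to match the actual file.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path):
    """Read a JSON file that is assumed to contain a list of objects."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def count_by_category(items, key="category"):
    """Count items per category; `key` is an assumed field name."""
    return Counter(item.get(key, "<missing>") for item in items)


if __name__ == "__main__":
    dataset_path = Path("dataset/HoneSet.json")
    if dataset_path.exists():
        counts = count_by_category(load_items(dataset_path))
        for name, n in counts.most_common():
            print(f"{n:4d}  {name}")
    else:
        print(f"{dataset_path} not found; run this from the repository root.")
```

If the assumed field name is right, the six counts should sum to 930.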
- Python 3.x
- Libraries: openai, replicate, requests, tenacity, concurrent.futures, anthropic, torch, yaml, argparse, dotenv
- API keys and model mappings for Azure, Replicate, DeepInfra, and other services.
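
Before running the scripts, it can help to confirm that the third-party libraries are importable. The sketch below is a convenience check and not part of the repository. It leaves out `concurrent.futures` and `argparse` because they ship with the Python standard library, and it uses the import names `yaml` and `dotenv`, which are distributed on PyPI as PyYAML and python-dotenv.

```python
import importlib.util

# Import names taken from the library list above (third-party only).
REQUIRED_MODULES = [
    "openai", "replicate", "requests", "tenacity",
    "anthropic", "torch", "yaml", "dotenv",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for a missing parent package or a module without a spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(absent) if absent else "none")
```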
- Edit Configuration:
  - Navigate to the `training_free/config.yaml` file.
  - Replace the placeholder API key with your own and update any other necessary settings in this file.
- Script Location:
  - Ensure that you are in the directory containing the `training_free.sh` script.
- Set Model Parameters:
  - `model_type` can be `online` or `local`.
  - `model_name` can be one of the following:

    Model_name input | Model |
    ---|---|
    gpt-4 | GPT-4 |
    chatgpt | ChatGPT |
    claude | Claude3-Opus |
    llama3-70b | Llama3-70b |
    llama3-8b | Llama3-8b |
    mixtral-8x7b | Mixtral-8x7b |
    llama2-7b | Llama2-7b |
    llama2-13b | Llama2-13b |
    llama2-70b | Llama2-70b |
    mistral-7b | Mistral-7b |
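
The accepted values listed above can be checked before launching a run. The following is a small sketch under the assumption that a typo in either argument should be caught early. Whether `training_free.sh` performs a similar check itself is not stated here, and `validate_model_args` is a hypothetical helper name.

```python
# Accepted `model_name` inputs and the models they refer to, copied from the list above.
MODEL_NAMES = {
    "gpt-4": "GPT-4",
    "chatgpt": "ChatGPT",
    "claude": "Claude3-Opus",
    "llama3-70b": "Llama3-70b",
    "llama3-8b": "Llama3-8b",
    "mixtral-8x7b": "Mixtral-8x7b",
    "llama2-7b": "Llama2-7b",
    "llama2-13b": "Llama2-13b",
    "llama2-70b": "Llama2-70b",
    "mistral-7b": "Mistral-7b",
}
MODEL_TYPES = ("online", "local")


def validate_model_args(model_type, model_name):
    """Return the model's display name, or raise ValueError for unknown values."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}, got {model_type!r}")
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model_name {model_name!r}; expected one of {sorted(MODEL_NAMES)}")
    return MODEL_NAMES[model_name]


print(validate_model_args("online", "claude"))  # prints "Claude3-Opus"
```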
- Online Mode
  When running the script in `online` mode, use the following parameters:
  `./training_free.sh online [model_name]`
- Local Mode
  When running the script in `local` mode, you can specify additional parameters:
  - `--temperature` (default = 0): Controls the randomness of the response generation. Higher values produce more varied outputs.
  - `--repetition_penalty` (default = 1.0): Penalizes repetition to encourage more diverse responses.
  - `--num_gpus` (default = 1): Specifies the number of GPUs to use.
  - `--max_length` (default = 2048): Limits the number of tokens in the response.
  - `--debug` (default = false): Enables debug mode for more verbose output.
  - `--model_path` (default = ''): The path to the model files, necessary in local mode.
  - `--filename` (default = ''): Specifies the output filename.
  - `--test_type` (default = 'plugin'): Sets the type of testing or processing.
  - `--online` (default = 'False'): Indicates whether to run the model in online mode.

  `./training_free.sh local [model_name] --temperature [value] --repetition_penalty [value] --num_gpus [value] --max_length [value] --debug --model_path [path] --filename [filename] --test_type [type]`
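
When launching local runs from Python, the command line above can be assembled programmatically. This is a minimal sketch and not part of the repository: `build_local_command` is a hypothetical helper, the model path in the example is a placeholder, and `--debug` is treated as a bare switch because the example command passes it without a value.

```python
import shlex

# Flags that take a value, in the order they appear in the example command.
VALUE_FLAGS = (
    "temperature", "repetition_penalty", "num_gpus", "max_length",
    "model_path", "filename", "test_type",
)


def build_local_command(model_name, debug=False, **options):
    """Build the argument list for a local-mode run of training_free.sh."""
    unknown = set(options) - set(VALUE_FLAGS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    cmd = ["./training_free.sh", "local", model_name]
    for name in VALUE_FLAGS:
        if name in options:
            cmd += [f"--{name}", str(options[name])]
    if debug:
        cmd.append("--debug")
    return cmd


if __name__ == "__main__":
    # "/path/to/llama3-8b" is a placeholder, not a path from the repository.
    cmd = build_local_command("llama3-8b", temperature=0, model_path="/path/to/llama3-8b")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `training_free.sh`.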
This repository contains scripts and configurations for fine-tuning, merging, and running inference with Llama models using LLaMA-Factory.
- LLaMA-Factory installed
- Install LLaMA-Factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
To fine-tune the model, use the following command:
llamafactory-cli train train_config.yaml
Replace `train_config.yaml` with one of the configuration files in `finetuning/*.yaml`.
After fine-tuning, merge the stage 1 model using:
llamafactory-cli export merge_lora_dpo.yaml
Make sure `merge_lora_dpo.yaml` is configured with the appropriate merging parameters.
To run model inference, execute:
llamafactory-cli api model_inference.yaml
Ensure `model_inference.yaml` contains the correct inference settings.
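
The three commands above can be chained from a small driver script. The sketch below only mirrors the order in which the commands are listed and defaults to a dry run. It is not part of the repository, `train_config.yaml` stands in for whichever file under `finetuning/` is chosen, and the two-stage process may require running the `train` step more than once with different configurations.

```python
import subprocess

# (subcommand, config file) pairs in the order described above.
PIPELINE = [
    ("train", "train_config.yaml"),     # placeholder for a file in finetuning/
    ("export", "merge_lora_dpo.yaml"),
    ("api", "model_inference.yaml"),    # starts a server and keeps running
]


def pipeline_commands(steps=PIPELINE):
    """Return one llamafactory-cli argument list per step."""
    return [["llamafactory-cli", subcommand, config] for subcommand, config in steps]


def run_pipeline(steps=PIPELINE, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in pipeline_commands(steps):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```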
@misc{gao2024best,
title={The Best of Both Worlds: Toward an Honest and Helpful Large Language Model},
author={Chujie Gao and Qihui Zhang and Dongping Chen and Yue Huang and Siyuan Wu and Zhengyan Fu and Yao Wan and Xiangliang Zhang and Lichao Sun},
year={2024},
eprint={2406.00380},
archivePrefix={arXiv},
primaryClass={cs.CL}
}