Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
#trl setup: https://huggingface.co/docs/trl/example_overview
python == 3.10.9
torch == 2.1.2+cu121
transformers == 4.37.0.dev0
trl == 0.7.9
conda env create -f topicllm.yml
#to use Huggingface TRL, you'll need to generate an Accelerate config file
#also see: https://huggingface.co/docs/trl/example_overview
accelerate config
# code from Huggingface TRL
CUDA_VISIBLE_DEVICES=[your_device] accelerate launch finetune_Mistral7b.py \
  --model_name_or_path="mistralai/Mistral-7B-Instruct-v0.1" \
  --output_dir="mistral_new_Adapter"
# code from Huggingface TRL
CUDA_VISIBLE_DEVICES=[your_device] python merge_peft_adapter.py \
  --base_model_name="mistralai/Mistral-7B-Instruct-v0.1" \
  --adapter_model_name="mistral_new_Adapter" \
  --output_dir="mistral_new_checkpoint"
# Please note that more than one GPU graphics card may be required to run LLaMA 13B models!
CUDA_VISIBLE_DEVICES="Your_GPU" python3 Seed_Topic_Dynamic.py
python3 topic_evaluation.py
@article{mu2024addressing,
title={Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling},
author={Mu, Yida and Bai, Peizhen and Bontcheva, Kalina and Song, Xingyi},
journal={arXiv preprint arXiv:2405.00611},
year={2024}
}