🎭 PersonaChatGen

This is the official GitHub repository for PERSONACHATGEN: Generating Personalized Dialogues using GPT-3.

TL;DR: Many prior works have trained agents to generate more personalized and engaging responses using PersonaChat. However, because that dataset has been frozen since 2018, dialogue agents trained on it would not know how to interact with a human who loves “Wandavision.” One way to alleviate this problem is to create a large-scale dataset. In this work, we introduce a pipeline for creating PersonaChatGen, which consists of three main stages: creating (1) ProfileGen, (2) the Persona Set, and (3) PersonaChatGen. To encourage GPT-3’s generation ability, we also define a taxonomy of hierarchical persona categories derived from a social profiling taxonomy. To create speaker-consistent persona sets, we propose a simple contradiction-based iterative sentence replacement algorithm named CoNL. Moreover, to prevent GPT-3 from generating harmful content, we present two filtering pipelines, one each for ProfileGen and PersonaChatGen. Through an analysis of PersonaChatGen, we show that GPT-3 can generate personalized dialogues containing diverse personas. Furthermore, we show that a state-of-the-art Blender 90M model trained on our dataset achieves higher performance.

🏆 PersonaChatGen won the Best Paper Award at CCGPK@COLING 2022!

Reference

Use the following to cite our paper:

@inproceedings{lee2022personachatgen,
  title={PERSONACHATGEN: Generating Personalized Dialogues using GPT-3},
  author={Lee, Young-Jun and Lim, Chae-Gyun and Choi, Yunsu and Lm, Ji-Hui and Choi, Ho-Jin},
  booktitle={Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge},
  pages={29--48},
  year={2022}
}

🔎 ProfileGen

You can now download the ProfileGen dataset from Google Drive. We provide individual JSON files, one per persona category (see Tables 15, 16, and 17 in our paper). Each file contains a list of profile-related sentences generated by GPT-3, where each element in the list consists of sentence, attr, value, and nli_score. See the sample data in dataset/profile_sample.json.
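As a rough illustration of working with these files, the sketch below loads a ProfileGen-style JSON list and keeps entries above a score threshold. It assumes each element is an object with the four keys listed above and that nli_score is numeric with higher values being better; neither assumption is confirmed here, so check dataset/profile_sample.json first. The function names and the attr values in the usage note are hypothetical.

```python
import json


def load_profiles(path):
    """Load one ProfileGen category file (assumed to be a JSON list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def filter_by_nli(entries, min_score):
    """Keep entries whose nli_score is at least min_score.

    Assumption: nli_score is numeric and higher means a better entry.
    """
    return [e for e in entries if float(e["nli_score"]) >= min_score]
```

For example, `filter_by_nli(load_profiles("dataset/profile_sample.json"), 0.5)` would return only the entries scoring 0.5 or above, provided the sample file matches the assumed layout.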

🎭 PersonaChatGen

You can now download the PersonaChatGen dataset from Google Drive. We provide the train and validation sets in the format of the original PersonaChat dataset, as provided by the ParlAI framework. See the sample data in dataset/chat_sample.txt.
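For readers who want to inspect the data without ParlAI, here is a minimal parsing sketch. It assumes the usual ParlAI PersonaChat text layout: every line starts with a turn index that resets to 1 for a new episode, persona lines begin with "your persona: " or "partner's persona: ", and dialogue lines hold tab-separated fields with the utterance first and the response second. The function name and the dictionary keys are hypothetical; verify the layout against dataset/chat_sample.txt.

```python
def parse_personachat(lines):
    """Group ParlAI-style PersonaChat lines into episodes (assumed layout)."""
    episodes = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        # Split the leading turn index from the rest of the line.
        idx, _, rest = line.partition(" ")
        if idx == "1" or current is None:
            # An index of 1 marks the start of a new episode.
            current = {"your_persona": [], "partner_persona": [], "turns": []}
            episodes.append(current)
        if rest.startswith("your persona: "):
            current["your_persona"].append(rest[len("your persona: "):])
        elif rest.startswith("partner's persona: "):
            current["partner_persona"].append(rest[len("partner's persona: "):])
        else:
            # Dialogue line: utterance, response, then optional extra fields.
            fields = rest.split("\t")
            text = fields[0]
            label = fields[1] if len(fields) > 1 else ""
            current["turns"].append((text, label))
    return episodes
```

Calling `parse_personachat(open("dataset/chat_sample.txt", encoding="utf-8"))` would yield one dictionary per episode if the file follows this layout.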

🤖 How to make PersonaChatGen using GPT-3?

To construct the PersonaChatGen dataset using GPT-3, we propose a pipeline consisting of three stages: (1) ProfileGen Creation, (2) Persona Set Creation, and (3) PersonaChatGen Creation. Detailed information is in our paper. Please follow the instructions below step by step.

Preparation

Installation

Install the required set of libraries as follows:

pip install -r requirements.txt

Set up OpenAI API Key

Set your OpenAI API key and organization ID inside the get_response() function in prompt_generator.py as follows:

openai.api_key = "<API_KEY>"
openai.organization = "<ORG_ID>"
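Hard-coding a key in a source file makes it easy to commit by accident. As one possible alternative, the sketch below reads the credentials from environment variables. The helper name read_openai_credentials and the variable names OPENAI_API_KEY and OPENAI_ORGANIZATION are choices made for this example, not something the repository defines.

```python
import os


def read_openai_credentials(env=None):
    """Return the API key and organization ID from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"api_key": api_key, "organization": env.get("OPENAI_ORGANIZATION")}


# Possible use inside get_response() in prompt_generator.py:
#   creds = read_openai_credentials()
#   openai.api_key = creds["api_key"]
#   openai.organization = creds["organization"]
```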

ProfileGen Creation

Generation

Run the command below to generate various profile-related sentences using GPT-3.

python profile_main.py

Filtering

Run the command below to filter out low-quality sentences using regex-based filtering, exact matching of persona entities, persona category preservation, and duplicate filtering.

python profile_filtering.py
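To give a feel for what this kind of filtering involves, here is a small illustrative sketch covering three of the four steps: a regex check, an exact match of the persona entity, and duplicate removal. It is not the logic of profile_filtering.py. The regex pattern, the interpretation of "exact matching" as "the value string appears in the sentence", and the function name are all assumptions made for this example.

```python
import re

# Hypothetical pattern: keep first-person sentences that end with punctuation.
FIRST_PERSON = re.compile(r"^(I|My)\b.*[.!?]$")


def filter_profiles(entries):
    """Apply regex, entity-match, and duplicate filters to profile entries."""
    seen = set()
    kept = []
    for entry in entries:
        sentence = entry["sentence"].strip()
        value = entry["value"].strip()
        # Regex-based filtering (pattern is an assumption).
        if not FIRST_PERSON.match(sentence):
            continue
        # Entity matching: the value must appear in the sentence.
        if value.lower() not in sentence.lower():
            continue
        # Duplicate filtering on a case- and whitespace-normalized key.
        key = " ".join(sentence.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return kept
```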

Persona Set Creation

Run the command below to create persona sets using our proposed simple algorithm, namely CoNL (Contradiction-based Iterative Sentence Replacement).

🚨 Please note that this algorithm and the accompanying implementation can take a significant amount of time to create numerous persona sets. We encourage other contributors to improve it for greater efficiency.

python conl_main.py
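The general idea of contradiction-based iterative replacement can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation in conl_main.py: it samples a candidate set, looks for a contradicting pair, and swaps one of the two sentences for a fresh one until no contradiction remains or an iteration limit is hit. The contradicts callable is a placeholder for whatever contradiction detector is used (presumably an NLI model), and all names and parameters here are hypothetical.

```python
import random
from itertools import combinations


def find_conflict(persona, contradicts):
    """Return the first contradicting pair in the set, or None."""
    for a, b in combinations(persona, 2):
        if contradicts(a, b):
            return a, b
    return None


def build_persona_set(pool, contradicts, size=5, max_iters=100, rng=None):
    """Sample a persona set and iteratively replace contradicting sentences."""
    if rng is None:
        rng = random.Random(0)
    persona = rng.sample(pool, size)
    for _ in range(max_iters):
        conflict = find_conflict(persona, contradicts)
        if conflict is None:
            return persona
        candidates = [s for s in pool if s not in persona]
        if not candidates:
            break  # Nothing left to swap in.
        # Replace the second sentence of the contradicting pair.
        persona[persona.index(conflict[1])] = rng.choice(candidates)
    # Final check covers a replacement made on the last iteration.
    return persona if find_conflict(persona, contradicts) is None else None
```

Checking every pair on every iteration is quadratic in the set size, which is one reason a procedure like this can become slow when the contradiction check is a model call.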

PersonaChatGen Creation

Generation

Run the command below to generate the PersonaChatGen dataset using GPT-3.

python chat_main.py

Filtering

Run the command below to filter out low-quality dialogues using copy-paste, persona consistency, and toxicity filtering.

python chat_filtering.py
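As an illustration of the copy-paste step only, the sketch below drops a dialogue when any utterance contains a persona sentence verbatim after lowercasing and punctuation removal. The exact criterion used by chat_filtering.py is not described here, so the matching rule, the function names, and the "persona" and "utterances" keys are assumptions made for this example.

```python
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def is_copy_paste(utterance, persona_sentences):
    """True if the utterance contains any persona sentence verbatim."""
    normalized_utterance = normalize(utterance)
    for sentence in persona_sentences:
        normalized_sentence = normalize(sentence)
        if normalized_sentence and normalized_sentence in normalized_utterance:
            return True
    return False


def filter_dialogues(dialogues):
    """Keep dialogues in which no utterance copies a persona sentence."""
    return [
        d for d in dialogues
        if not any(is_copy_paste(u, d["persona"]) for u in d["utterances"])
    ]
```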

Acknowledgements

This work was supported by the KT Corporation. We thank all KT researchers for helpful discussions.

Have any questions?

Please contact Young-Jun Lee at yj2961@kaist.ac.kr or passing2961@gmail.com.

License

This repository is MIT licensed. See the LICENSE file for details.
