chitchat-dataset

Open-domain conversational dataset from the BYU Perception, Control & Cognition lab's Chit-Chat Challenge.

install

pip3 install chitchat_dataset

or simply download the raw dataset:

curl -LO https://raw.githubusercontent.com/BYU-PCCL/chitchat-dataset/master/chitchat_dataset/dataset.json
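If you download dataset.json with curl instead of installing the package, Python's standard json module is enough to read it. This is a minimal sketch that assumes the file keeps the name dataset.json and sits in the working directory. parse_conversations and load_conversations are hypothetical helpers, not part of chitchat_dataset.

```python
import json
from pathlib import Path
from typing import Any, Dict


def parse_conversations(text: str) -> Dict[str, Any]:
    """Parse raw dataset JSON: a mapping from conversation UUID to conversation."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by conversation UUID")
    return data


def load_conversations(path: str = "dataset.json") -> Dict[str, Any]:
    """Read a downloaded dataset.json (hypothetical helper, not part of the package)."""
    return parse_conversations(Path(path).read_text(encoding="utf-8"))
```

Calling conversations = load_conversations() returns a plain dict keyed by conversation UUID.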

usage

More formal documentation is planned; for now, see chitchat_dataset/__init__.py for the available options.

import chitchat_dataset as ccc

dataset = ccc.Dataset()

# Dataset is a dict subclass mapping conversation UUID -> conversation
for convo_id, convo in dataset.items():
    print(convo_id, convo)

Or get the messages in a flat list:

messages = list(ccc.MessageDataset())
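Without the package, you can build a similar flat list from the raw JSON by walking each conversation's turns in order. This is a sketch only. It is not guaranteed to match what MessageDataset actually yields (see chitchat_dataset/__init__.py for that), and flatten_messages is a hypothetical name.

```python
from typing import Any, Dict, Iterator


def flatten_messages(conversations: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every message dict, conversation by conversation, turn by turn."""
    for convo in conversations.values():
        for turn in convo["messages"]:
            # each turn is a list of one or more consecutive messages
            yield from turn
```

Because Dataset is a dict subclass, list(flatten_messages(ccc.Dataset())) should also work, provided its values follow the format described below.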

See examples/ for other languages.

stats

7,168 conversations
258,145 utterances
1,315 unique participants
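Counts like these can in principle be recomputed from the raw JSON. The sketch below assumes that "utterances" means individual message objects and "unique participants" means distinct sender IDs. Both definitions are guesses, so its output may not match the figures listed above. dataset_stats is a hypothetical helper.

```python
from typing import Any, Dict


def dataset_stats(conversations: Dict[str, Any]) -> Dict[str, int]:
    """Count conversations, individual messages, and distinct sender IDs."""
    n_messages = 0
    senders = set()
    for convo in conversations.values():
        for turn in convo["messages"]:
            for message in turn:
                n_messages += 1
                senders.add(message["sender"])
    return {
        "conversations": len(conversations),
        "utterances": n_messages,
        "participants": len(senders),
    }
```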

format

The dataset is a mapping from conversation UUID to a conversation:

{
  "prompt": "What's the most interesting thing you've learned recently?",
  "ratings": { "witty": "1", "int": 5, "upbeat": 5 },
  "start": "2018-04-20T01:57:41",
  "messages": [
    [
      {
        "text": "Hello",
        "timestamp": "2018-04-19T19:57:51",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      }
    ],
    [
      {
        "text": "I learned that the Queen of England's last corgi died",
        "timestamp": "2018-04-19T19:58:14",
        "sender": "bebad07e-15df-48c3-a04f-67db828503e3"
      }
    ],
    [
      {
        "text": "Wow that sounds so sad",
        "timestamp": "2018-04-19T19:58:18",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      },
      {
        "text": "was it a cardigan welsh corgi",
        "timestamp": "2018-04-19T19:58:22",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      },
      {
        "text": "?",
        "timestamp": "2018-04-19T19:58:24",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      }
    ]
  ]
}

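Each entry in "messages" is one conversational turn: a list of one or more consecutive messages from the same sender. This nesting keeps multi-message turns (like the last turn above) together and preserves the structure and flow of the conversation.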
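Many dialog models expect one utterance per turn, so a common preprocessing step is to join each turn's messages. The sketch below does that and also coerces ratings to integers, because the sample above mixes a string ("1") with integers. Joining with a single space is an arbitrary choice, and it assumes every message in a turn shares one sender. merge_turns and ratings_as_ints are hypothetical helpers, not part of the package.

```python
from typing import Any, Dict, List, Tuple


def merge_turns(convo: Dict[str, Any], sep: str = " ") -> List[Tuple[str, str]]:
    """Collapse each turn (a list of messages) into one (sender, text) pair."""
    merged: List[Tuple[str, str]] = []
    for turn in convo["messages"]:
        if not turn:
            continue  # defensively skip empty turns
        # assumption: all messages in a turn come from the same sender
        sender = turn[0]["sender"]
        merged.append((sender, sep.join(m["text"] for m in turn)))
    return merged


def ratings_as_ints(convo: Dict[str, Any]) -> Dict[str, int]:
    """Coerce rating values to int, since values may be strings or integers."""
    return {name: int(value) for name, value in convo.get("ratings", {}).items()}
```

Applied to the sample conversation, the last turn becomes "Wow that sounds so sad was it a cardigan welsh corgi ?". Pass sep="\n" if you want to keep the original message boundaries visible.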

how to cite

If you extend or use this work, please cite the paper where it was introduced:

@article{myers2020conversational,
  title={Conversational Scaffolding: An Analogy-Based Approach to Response Prioritization in Open-Domain Dialogs},
  author={Myers, Will and Etchart, Tyler and Fulda, Nancy},
  year={2020}
}
