CoVaxxy: A collection of English-language Twitter posts about COVID-19 vaccines

This project, conducted by Indiana University’s Observatory on Social Media (OSoMe) in collaboration with colleagues from Politecnico di Milano, aims to track and investigate how online information affects COVID-19 vaccine uptake and health outcomes. We offer public access to a large collection of vaccine-related tweets that are gathered in real time and updated daily (see our data collection paper for more details). We combine this with vaccine uptake and survey data to create the CoVaxxy dashboard, a web page that allows anyone to visualize descriptive statistics and preliminary results.

Twitter Data

An ongoing collection of English-language Twitter posts about COVID-19 vaccines is available here.

To create as complete a set of Twitter posts related to COVID-19 vaccines as possible, we carefully selected a list of keywords through a snowball sampling technique. We started with the two most relevant keywords, i.e., covid and vaccine, as our initial seeds. Next, we gathered tweets using the filtered stream endpoint of the Twitter API for three hours. From these tweets, we then identified potential keywords that frequently co-occur with the seeds, adding them to our seed list only after manually ensuring they are closely related to our topic. We repeated this process six times between Dec. 15, 2020 and Jan. 2, 2021, with each iteration's data collection taking place at a different time of day to capture tweets from different geographic areas and demographics. The resulting seed list serves as our initial keyword list. We further refined the keyword list by manually combining certain keywords into composites (e.g., covid19 pfizer), so that the dataset is broad enough to include most relevant (English) conversations while excluding tweets that are not related to the vaccine discussion.
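One iteration of the co-occurrence step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name candidate_keywords, the tokenization rule, and the sample tweets are all assumptions, and the manual relevance check still has to be done by a person on the returned candidates.

```python
import re
from collections import Counter


def candidate_keywords(tweets, seeds, top_n=10):
    """Rank terms that co-occur with any seed keyword (hypothetical helper)."""
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for text in tweets:
        # Simple tokenization (an assumption): lowercase alphanumeric runs.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        # Keep the tweet only if some token contains a seed as a substring.
        if any(seed in tok for tok in tokens for seed in seeds):
            # Count every token that is not itself a seed match.
            counts.update(
                tok for tok in tokens
                if not any(seed in tok for seed in seeds)
            )
    return counts.most_common(top_n)


# Made-up sample tweets, for illustration only.
tweets = [
    "covid vaccine from pfizer",
    "pfizer vaccine news",
    "unrelated cat picture",
]
print(candidate_keywords(tweets, ["covid", "vaccine"], top_n=3))
```

Candidates returned this way would include common words as well as relevant terms, which is one reason a manual review before extending the seed list matters.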

Some notes on the query syntax of Twitter's filtered stream API:

Queries that include keywords also match hashtags, URLs, and substrings. For example, covid matches cnn.com/covid and #covid19.
Using covid19 pfizer as a composite matching phrase will capture tweets that contain covid19 and pfizer. On the other hand, including covid19, pfizer as separate keywords will capture tweets that contain covid19 or pfizer.
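The two rules above can be emulated locally to check how a keyword list would behave before using it. The sketch below is only an approximation of the behavior described in this section, not Twitter's matching implementation; the function name matches and the comma-separated string format are assumptions made for illustration.

```python
def matches(track, text):
    """Approximate the matching rules described above (hypothetical helper).

    Comma-separated phrases are alternatives (OR); the space-separated
    terms inside one phrase must all be present (AND). Each term is
    matched as a case-insensitive substring, so it also hits hashtags
    and URLs.
    """
    text = text.lower()
    for phrase in track.split(","):
        terms = phrase.split()
        if terms and all(term.lower() in text for term in terms):
            return True
    return False


print(matches("covid", "Read cnn.com/covid now"))                 # substring inside a URL
print(matches("covid19 pfizer", "pfizer ships covid19 doses"))    # both terms present
print(matches("covid19 pfizer", "pfizer earnings"))               # composite needs both terms
print(matches("covid19, pfizer", "pfizer earnings"))              # separate keywords need only one
```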

Iffy+

To categorize tweets as low credibility, we use the Iffy+ Misinfo/Disinfo list created by Iffy.news. As stated on the Iffy+ page, "Iffy+ merges lists of sites that regularly publish mis/disinformation, as identified by major fact-checking and journalism organizations, into a single dataset." Please check out the description of the list for more information.
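One plausible way to apply such a list is to compare the domain of each URL shared in a tweet against the listed domains. The sketch below assumes that approach; the helper names, the www-stripping rule, and the placeholder domain are hypothetical and are not taken from the CoVaxxy pipeline or from the Iffy+ list.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_low_credibility(urls, low_cred_domains):
    """True if any URL in the tweet points to a listed domain."""
    return any(domain_of(u) in low_cred_domains for u in urls)


# Placeholder domain for illustration; a real run would load the Iffy+ list.
low_cred_domains = {"example-lowcred.test"}

print(is_low_credibility(
    ["https://www.example-lowcred.test/story?id=1"], low_cred_domains))
print(is_low_credibility(["https://example.org/a"], low_cred_domains))
```

Exact domain matching like this would miss links hidden behind URL shorteners, so a real pipeline would probably need to expand those first.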

Paper

More details on the data collection can be found in our paper describing the collection of English-language Twitter posts about COVID-19 vaccines:

Matthew R. DeVerna, Francesco Pierri, Bao Truong, John Bollenbacher, David Axelrod, Niklas Loynes, Cristopher Torres-Lugo, Kai-Cheng Yang, Fil Menczer, and John Bryden (2021) "CoVaxxy: A collection of English Twitter posts about COVID-19 vaccines." in Proceedings of the 15th International Conference on Web and Social Media. (link to paper)

If you use this data, please cite this reference paper.

Dashboard

We have developed a live dashboard to allow people to visualize descriptive statistics and preliminary results. It is available here: https://osome.iu.edu/tools/covaxxy

Complementary data sources used by the CoVaxxy dashboard:

Vaccination data from the Centers for Disease Control and Prevention (found here), as compiled by Our World in Data (here).
Vaccine acceptance and refusal data from Carnegie Mellon University's Delphi Epidata API survey data, created by the Delphi Research Group.

VaccinItaly

A member of the CoVaxxy team, Francesco Pierri, has also developed the VaccinItaly dashboard, which is similar to CoVaxxy but specifically monitors Italian conversations around vaccines on Facebook and Twitter.

Team

Matthew R. DeVerna, Bao Truong, John Bollenbacher, David Axelrod, Cristopher Torres-Lugo, Kai-Cheng Yang, Fil Menczer, John Bryden (Observatory on Social Media, Indiana University)
Francesco Pierri (Department of Electronics, Information and Bioengineering, Politecnico di Milano)
Niklas Loynes (School of Social Sciences, University of Manchester)

Acknowledgments

This project is supported in part by the Knight Foundation and Craig Newmark Philanthropies. We used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562.

