Repository for the paper "Understanding the Vegetable Oil Debate and Its Implications for Sustainability through Social Media". https://arxiv.org/abs/2308.07108
The global production and consumption of vegetable oils have sparked several discussions on sustainable development. This study analyzes over 20 million tweets related to vegetable oils to explore the key factors shaping public opinion. We found that coconut, olive, and palm oils dominate social media discourse despite their lower contribution to overall global vegetable oil production. The discussion about olive and palm oils correlates remarkably well with Twitter's growth, while the discussion about coconut oil grows more through bursts of activity. Discussions around coconut and olive oils primarily focus on health, beauty, and food, while palm oil draws attention to pressing environmental concerns. Overall, virality is related to environmental issues and negative connotations. In the context of the Sustainable Development Goals, this study highlights the multifaceted nature of the vegetable oil debate and its disconnection from scientific discussions. Our research sheds light on the power of social media in shaping public perception, providing insights into sustainable development strategies.
Clone this repository with the command
git clone https://github.com/elenacandellone/vegetable-oils.git
Install the required packages
conda env create -f environment.yml
data_id
contains the IDs of all the tweets used in this paper. Twitter's Developer Agreement Policy does not allow users with Academic Research access to deposit the data in any public repository, but it does allow them to share the IDs of the tweets and the users analyzed.
plots
contains the plots generated by each script.
processed
contains the data used to create each plot.
src
contains all the functions used in the scripts.
The main analyses were carried out in Python, but the figures were created using R. The analysis scripts are:
data-processing.py
takes the raw data (not provided here) and saves the information about users, dates, geolocation, user IDs, metrics, and text separately. Only the tweet IDs are provided, in the folder data_id.
LSA.ipynb
performs the Latent Semantic Analysis of the tweets.
sentiment.py
performs the Sentiment Analysis using the model "cardiffnlp/twitter-roberta-base-sentiment".
virality.ipynb
performs the Cascade Size (CS) and Inter-Event Time (IET) analysis.
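As an illustration of the sentiment step listed above, the following is a minimal sketch and not the code in sentiment.py. It assumes the Hugging Face transformers package is installed; the helper names (preprocess, classify, sentiment_shares) are hypothetical. The masking of mentions and links and the label order (0 = negative, 1 = neutral, 2 = positive) follow the model card of "cardiffnlp/twitter-roberta-base-sentiment".

```python
from collections import Counter

MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment"

# Label order documented on the model card: 0 = negative, 1 = neutral, 2 = positive.
LABELS = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def preprocess(text):
    """Mask user mentions and links, as suggested on the model card."""
    tokens = []
    for token in text.split(" "):
        if token.startswith("@") and len(token) > 1:
            token = "@user"
        elif token.startswith("http"):
            token = "http"
        tokens.append(token)
    return " ".join(tokens)


def classify(texts):
    """Return one sentiment label per text (hypothetical helper).

    The model is downloaded the first time this function is called.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    classifier = pipeline("sentiment-analysis", model=MODEL_NAME)
    results = classifier([preprocess(t) for t in texts])
    return [LABELS.get(r["label"], r["label"]) for r in results]


def sentiment_shares(labels):
    """Fraction of tweets per sentiment, comparable to panels a-c of Figure 3."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in ("negative", "neutral", "positive")}
    return {name: counts[name] / total for name in ("negative", "neutral", "positive")}
```

Calling sentiment_shares(classify(list_of_tweet_texts)) would then give the share of negative, neutral, and positive tweets in a dataset. The tweet texts themselves are not part of this repository and would have to be retrieved from the IDs in data_id.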
The R scripts that create the plots are:
- zcript_Fig1.R
Anatomy of the vegetable oils' presence on Twitter: tweets sent from 3/21/2006 to 12/31/2021 containing the bigram "type oil", where "type" corresponds to any of the oils listed in the legends. (a) The cumulative number of tweets for each oil. There are three major oils in terms of social media presence: olive, coconut, and palm oils. (b) The monthly number of tweets for each of the three major vegetable oils. (c) Growth relative to 2007, measured as the total number of tweets in a given year over the number of tweets in 2007. We compare the growth of tweets on each vegetable oil with the growth of the 100 most common words in English as a proxy for the growth of Twitter (see Methods). Error bars indicate the standard deviation of the growth for the common words. (d) The number of geo-tagged tweets per country containing the bigram "palm oil" in 102 languages (see Methods). The total number of tweets collected in any language is 7,946,915, of which only 64,682 are geo-tagged.
- zcript_Fig2.R
The debate around vegetable oils: the topics associated with each vegetable oil can vary greatly. Panels a-c show the top 10 most used hashtags for each set of tweets, expressed as the percentage of tweets containing the hashtag, for coconut, olive, and palm oil, respectively. Palm oil is mainly related to environmental issues, while coconut and olive are predominantly associated with nutrition and health topics. Panel d reports the results of a Latent Semantic Analysis (LSA) applied to the set of tweets (see Methods). Each point in the plot represents a tweet, with color indicating the vegetable oil it is associated with. The closer two points are in space, the more similar their topics are. Lastly, panel e depicts a word cloud of the top 2,000 hashtags in the palm oil dataset. Font size is proportional to the number of occurrences in the dataset, so the least used hashtags are hardly visible. Note that this panel was created using the Jupyter notebook zcript_2E.ipynb and manually inserted into the figure.
- zcript_Fig3.R
Sentiment Analysis: Panels a-c contain the fraction of tweets associated with each sentiment for each of the three datasets under consideration (coconut, olive, and palm). In all three cases, roughly 50% of the tweets are considered neutral, but the share of negative tweets differs sharply. The coconut and olive oil datasets contain about 10% of tweets labeled as negative, while this number increases to 42% for palm oil. Panels d (coconut), e (olive), and f (palm) show the monthly number of tweets associated with each sentiment. The share of tweets in each category is fairly constant, except for a few viral events, such as the one related to coconut oil in March 2019 and the one associated with palm oil in November 2018.
- zcript_Fig4.R
The palm oil debate in 2018: Panel (a) displays the weekly count of tweets containing the keywords "Biodiversity", "Iceland Foods", "IUCN", and "Orangutan", illustrating the varying public attention to different facets of the issue. This showcases two significant but differently echoed events: the IUCN report on palm oil's impact on biodiversity in late June and the widely noticed campaign against palm oil by Iceland Foods and Greenpeace at Christmas. Panel (b) visualizes the sentiment (positive, neutral, negative) associated with tweets containing the terms "IUCN" or "biodiversity", demonstrating the emotional response to scientific discussions. Panel (c) presents the sentiment associated with tweets containing the terms "Orangutan" or "Iceland Foods", reflecting the emotional impact of the viral campaign.
- zcript_Fig5.R
Virality phase diagram: Each point represents one of the 10 most common hashtags in the coconut (orange circles), olive (blue triangles), and palm (green squares) datasets. The spatial coordinates are the scaling exponents of the interevent time (x-axis) and the cascade size (y-axis) distributions. Gray lines delimit the areas defined by the critical exponents (see Methods). Pie charts show the distribution of positive, neutral, and negative tweets with at least one hashtag in each of these areas.
The R scripts were tested with R version 4.4.0 (2024-04-24) -- "Puppy Cup".
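The axes of the Figure 5 phase diagram are scaling exponents of inter-event time and cascade size distributions. The following is a minimal sketch of how such quantities could be computed, not the procedure in virality.ipynb. The function names and the choice of x_min are hypothetical, and the estimator is the standard continuous maximum-likelihood formula for a power-law tail, which may differ from the fitting method used in the paper (see its Methods).

```python
import math
from datetime import datetime


def inter_event_times(timestamps):
    """Seconds between consecutive tweets, after sorting them in time."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def power_law_exponent(values, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Uses alpha = 1 + n / sum(ln(x / x_min)) over the values with x >= x_min.
    x_min is an assumption the caller has to supply.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Usage with three made-up timestamps for a single hashtag.
times = [
    datetime(2018, 11, 9, 12, 0, 0),
    datetime(2018, 11, 9, 12, 2, 0),
    datetime(2018, 11, 9, 12, 0, 30),
]
gaps = inter_event_times(times)  # gaps between the sorted timestamps, in seconds
alpha = power_law_exponent(gaps, x_min=min(gaps))
```

With real data, the same estimator could be applied per hashtag to the inter-event times and to the cascade sizes, giving one (x, y) point per hashtag.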
Candellone, E., Aleta, A., Ferraz de Arruda, H. et al. Characteristics of the vegetable oil debate in social-media and its implications for sustainability. Commun Earth Environ 5, 391 (2024). https://doi.org/10.1038/s43247-024-01545-x
@Article{Candellone2024,
author={Candellone, Elena
and Aleta, Alberto
and Ferraz de Arruda, Henrique
and Meijaard, Erik
and Moreno, Yamir},
title={Characteristics of the vegetable oil debate in social-media and its implications for sustainability},
journal={Communications Earth {\&} Environment},
year={2024},
month={Jul},
day={21},
volume={5},
number={1},
pages={391},
issn={2662-4435},
doi={10.1038/s43247-024-01545-x},
url={https://doi.org/10.1038/s43247-024-01545-x}
}
- Elena Candellone e.candellone@uu.nl