In September 2020, longstanding tensions between Armenia and Azerbaijan erupted into full-scale war over the disputed region of Nagorno-Karabakh, also known as Artsakh. While the fighting raged in the rugged terrain of the Caucasus, another battle was being waged on Twitter. In this parallel conflict, often described as social media narrative warfare and sometimes involving astroturfing (coordinated activity made to look like grassroots opinion), individuals and organized groups pushed specific agendas to shape public opinion about the war.
The 2020 Armenia-Azerbaijan war was far from the only conflict in which social media has been used as a weapon. Here, both pro-Armenian and pro-Azerbaijani actors took to Twitter to disseminate propaganda, spread disinformation, and in some cases engage in hate speech. The phenomenon has received some academic attention, but the tactics keep evolving, which calls for an updated analysis.
The decision to revisit this topic was also prompted by more recent events, notably the devastating earthquakes in southeastern Turkey on February 6, 2023. In the aftermath of the tragedy, data science and machine learning technologies were quickly deployed to create disaster maps and coordinate relief efforts. This rapid response showed how much potential advanced technologies have in humanitarian contexts.
As large language models and other machine-learning methods grow more capable, there is a timely opportunity to re-examine the Twitter activity surrounding the Nagorno-Karabakh conflict. Using recent advances in data science, this project aims to provide an updated, nuanced understanding of social media's role in modern geopolitical conflicts. The analysis is intended to serve academic interests and to offer actionable insights for policymakers, social activists, and communities at large.
This repository explores the use of Large Language Models (LLMs) on Twitter data. The methodology covers data collection, data preprocessing, LLM training, and inference. It also touches on model explainability, to help understand how these LLMs make predictions and which factors contribute to their outputs. An interactive data visualisation lets you explore the data and get a better sense of the trends and patterns that have been identified.
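
To give a rough idea of what the data preprocessing step can involve, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the pipeline used in this repository: the function names `clean_tweet` and `preprocess`, the `@user` placeholder, and the choice to drop links and exact duplicates are all hypothetical.

```python
import html
import re

# Hypothetical cleaning rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalise one raw tweet before it is passed to a model."""
    text = html.unescape(text)            # decode entities such as "&amp;"
    text = URL_RE.sub("", text)           # drop links
    text = MENTION_RE.sub("@user", text)  # replace handles with a placeholder
    return WHITESPACE_RE.sub(" ", text).strip()


def preprocess(tweets: list[str]) -> list[str]:
    """Clean every tweet, then drop empty results and exact duplicates.

    The order of first occurrences is preserved.
    """
    seen = set()
    cleaned = []
    for raw in tweets:
        tweet = clean_tweet(raw)
        if tweet and tweet not in seen:
            seen.add(tweet)
            cleaned.append(tweet)
    return cleaned
```

Replacing handles with a placeholder keeps the signal that a tweet addresses someone while avoiding feeding account names to the model. Hashtags are left untouched here because they often carry the narrative framing that this kind of analysis is interested in.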