Infectious interactions: Identifying user-communities in German-language Twitter topic networks on vaccination throughout the Covid-19 pandemic

by David Leimstädtner


Thesis supervisor: dr. Damian Trilling

The Covid-19 pandemic presents a crisis on an unprecedented global scale, in which the general public is necessarily confronted with an online debate that has long been targeted by a highly active minority of anti-vaxxers fueling the spread of misinformation on social networking sites such as Twitter. Addressing these recent developments, my thesis asks: How does the influx of new users joining the vaccination debate change the structures of information flow within pro- and anti-vaccination user clusters? Is the overall debate polarized along its users' vaccination stances?

To answer these questions, the study created network models from a Twitter dataset and applied community detection algorithms to identify clusters of user interaction within the German-language vaccination debate. It then examined how these clusters evolved by comparing three time periods: the year leading up to the pandemic, the initial onset of the outbreak, and the active roll-out of the German vaccination campaign.

The study's approach is divided into four phases. First, web-mining approaches are employed to gather a dataset of 3.5 million tweets using the full-archive search endpoint of Twitter's newly introduced API v2. Second, the Python network-modelling package igraph is used to create network representations of the user-interaction patterns within each period of interest, and the Leiden community detection algorithm is applied to identify denser user clusters within the overall network. Third, a manual content analysis of a subset of 6,000 tweets is conducted in order to train a machine-learning model that classifies vaccine attitudes based on the tweets' content. The resulting classifier is applied to each user node in the networks, allowing an analysis of the prevalence of vaccine-critical stances within the identified communities. Lastly, automated content analysis is applied to the tweet texts and profile descriptions belonging to a particular community in order to extract its most characteristic terms. All the code created for this study can be found on my GitHub.
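The summary does not name the measure used to extract a community's most characteristic terms. As one common option, the sketch below scores each term by a smoothed log-ratio of its relative frequency inside a community versus in all other texts; the function names, the add-alpha smoothing and the naive tokenization (no stop-word removal) are assumptions for illustration, not the thesis's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive lowercase word tokens; \w also matches umlauts and ß in Python 3
    return re.findall(r"\w+", text.lower())


def characteristic_terms(target_docs, other_docs, top_n=10, alpha=1.0):
    """Rank terms by smoothed log-ratio of relative frequency (community vs. rest)."""
    target = Counter(tok for doc in target_docs for tok in tokenize(doc))
    other = Counter(tok for doc in other_docs for tok in tokenize(doc))
    vocab = set(target) | set(other)
    # Add-alpha smoothing keeps scores finite for terms absent on one side
    t_total = sum(target.values()) + alpha * len(vocab)
    o_total = sum(other.values()) + alpha * len(vocab)
    scores = {
        term: math.log((target[term] + alpha) / t_total)
        - math.log((other[term] + alpha) / o_total)
        for term in vocab
    }
    # Highest-scoring terms first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]
```

For example, `characteristic_terms(anti_texts, pro_texts)` would list the terms most over-represented in one community's tweets or profile descriptions relative to the other; on real data a stop-word list and a minimum frequency cut-off would normally be added.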

While the number of users taking part in the vaccination debate on Twitter increased rapidly, the overall network also became clearly fragmented in the wake of the pandemic, suggesting a polarization of the vaccination debate. In this polarized debate, an anti-vaccination minority distributes misinformation and conspiracy content within its decentralized community, while a majority of vaccination proponents is organized in a more hierarchical fashion around central public figures and traditional news outlets. The anti-vaccination cluster is also growing at a faster rate and is more closely connected overall, pointing to a more effective spread of information within it. The content analysis further confirms prior research on the anti-vax movement outside the German context, finding that its content concentrates on vaccine side effects, conspiracy theories and alternative news bloggers.
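The summary describes one cluster as hierarchical (organized around central figures) and the other as decentralized but more closely connected, without stating which metrics were used. Two standard measures that could operationalize these descriptions are network density and Freeman degree centralization, computed on the edges inside each community. The sketch below is a hypothetical illustration on an undirected simple graph; python-igraph also offers `Graph.density()` for the first measure.

```python
from collections import defaultdict


def density_and_centralization(edges):
    """Density and Freeman degree centralization of an undirected simple graph.

    `edges` is an iterable of (u, v) pairs inside one community; duplicate
    edges and self-loops are ignored, and only nodes with at least one edge count.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 3:
        raise ValueError("need at least three connected nodes")
    degrees = [len(nbrs) for nbrs in neighbours.values()]
    m = sum(degrees) // 2
    # Share of all possible ties that are present
    density = 2 * m / (n * (n - 1))
    # Sum of differences to the best-connected node, normalized by the
    # maximum (n - 1)(n - 2), which a star graph attains
    max_deg = max(degrees)
    centralization = sum(max_deg - d for d in degrees) / ((n - 1) * (n - 2))
    return density, centralization
```

A star around a single news account yields centralization 1.0, while a ring of equally active users yields 0.0. High centralization with low density would fit the "hierarchical" description, and higher density with low centralization the "decentralized but closely connected" one.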

Results of the community detection algorithm (left) and the vaccine-stance classifier (right) on a retweet network concerning vaccination, spanning 28.12.2020 to 15.03.2021

Blue marks the majority community and red the minority community identified by the Leiden community detection algorithm. In the right-hand illustration, green marks users whose posts were identified by the ML classifier as expressing anti-vaccination attitudes.
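The figure compares the two labelings visually; a numeric counterpart is the share of anti-vaccination users within each community. The summary states that the classifier was applied to each user node but not how tweet-level predictions become a user label, so the majority rule below (a user counts as anti-vaccination if more than half of their classified tweets are) is a hypothetical assumption, as are the function names. The `membership` argument is any user-to-community mapping, such as the membership vector returned by a Leiden run.

```python
from collections import Counter, defaultdict


def user_stance(tweet_predictions, threshold=0.5):
    """Label a user anti-vax if the share of their anti-vax tweets exceeds `threshold`.

    `tweet_predictions` is an iterable of (user, is_anti) pairs from a tweet classifier.
    """
    totals, anti = Counter(), Counter()
    for user, is_anti in tweet_predictions:
        totals[user] += 1
        anti[user] += bool(is_anti)
    return {user: anti[user] / totals[user] > threshold for user in totals}


def anti_share_by_community(membership, stances):
    """Share of anti-vax-labelled users per community; users without a stance are skipped."""
    counts = defaultdict(lambda: [0, 0])  # community -> [anti-vax users, classified users]
    for user, community in membership.items():
        if user in stances:
            counts[community][0] += stances[user]
            counts[community][1] += 1
    return {c: anti / total for c, (anti, total) in counts.items()}
```

Skipping users without classified tweets keeps the shares from being diluted by accounts the classifier never saw; reporting the number of classified users per community alongside the share would make that choice visible.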