Moritz Laurer, Vrije Universiteit Amsterdam

Encoding Political Interpretation – Deep Transfer Learning for Political Text Analyses

More and more politically relevant information is hidden in digital text corpora that are too large for manual analysis. Political scientists are therefore increasingly adopting computational text analysis methods to analyse these corpora. One popular method is supervised machine learning, where researchers create training data and teach algorithms to reproduce a classification task on unseen texts. As it is used in Political Science today, however, supervised machine learning has several important shortcomings. First, established algorithms require large amounts of training data, which makes the approach too expensive for many research projects. Second, most methods and datasets focus on English-language research, neglecting the majority of political texts, which are written in other languages. Third, most supervised machine learning is used to identify simple concepts like topics and sentiment in isolation, whereas political scientists are in practice more interested in measuring complex concepts to better understand political phenomena. This PhD thesis tries to address these issues by combining recent methodological advances from the Natural Language Processing (NLP) literature with the research interests and methods of Political Science. Recent advances in deep transfer learning and specific approaches like Natural Language Inference (NLI) can help alleviate several limitations of the established methods currently applied in political text analysis. This thesis aims to demonstrate empirically that deep transfer learning methods can substantially reduce the need for training data, enable multilingual analyses, and meaningfully support the process of understanding political phenomena.