We are delighted to welcome you to the 2024 COMPTEXT meeting at the Vrije Universiteit (VU), Amsterdam. The 2024 meeting is being held at the VU's NU building from May 2-4, 2024.
Additional information is available on the COMPTEXT organization website.
The COMPTEXT 2024 Organising Committee consists of:
Mariken A.C.G. van der Velden (Vrije Universiteit Amsterdam)
Roan Buma (Vrije Universiteit Amsterdam)
Alona O. Dolinsky (Vrije Universiteit Amsterdam)
Johannes B. Gruber (Vrije Universiteit Amsterdam)
Kasper Welbers (Vrije Universiteit Amsterdam)
Miklós Sebők (Centre for Social Sciences, Budapest)
If you enjoyed the keynote speech about running generative large language models locally using Ollama and want to revisit the slides or share them with your colleagues, you can download them here.
In this introductory workshop, participants will learn the basics of using the programming language R to analyze textual data. R is a powerful open-source tool for text analysis thanks to its extensive packages for statistical computing and natural language processing. Attendees will be introduced to fundamental concepts such as data import, text preprocessing, sentiment analysis, and machine learning. Through hands-on exercises and demonstrations, participants will gain practical skills for manipulating and analyzing text data in R. Some basic familiarity with R is helpful but not required, as the workshop is designed to be accessible to beginners. Participants are welcome to bring their own data and are encouraged to follow along on their laptops; please install both R and RStudio beforehand so that you can participate actively.
This workshop will provide an accessible introduction to visualisation techniques in R, focusing on the ggplot2 package and relying on the broader tidyverse structure. The materials will cover a wide range of approaches, including distributions, frequencies, proportions, associations between variables (including interactions), time series, and, as time permits, more complex visualisations. Participants should have basic knowledge of R and are encouraged to bring their own data to use while following along with the materials.
A cornerstone of computational social science has always been working with data that was not specifically designed for analysis, but left behind as traces of human actions and societal processes. This workshop provides a practical overview of techniques for gathering such data by extracting it from the web and reshaping it into a usable form, a process usually referred to as web scraping. Web scraping has become significantly more important as large swaths of the formerly open web are now obstructed by technological means. The workshop covers simple to advanced techniques that can be used to collect essentially all web content that is accessible to a human.
Validation is fundamental to scientific inquiry, especially when dealing with extensive unseen data. As researchers, ensuring the trustworthiness of the models used to present results and formulate recommendations is imperative. In this workshop, we will look at different methods that can be used to validate models, their advantages and potential drawbacks, and discuss what it means for a model to be valid.
Today, it is easier than ever to run your code in the cloud, and learning how to do so opens many doors for computational social scientists. For instance, you can perform heavy and long-running computations on a server, automate web scrapers to run daily, create your own API, or host a web application for conducting online experiments. In this workshop, we will look at various options, including some platforms with a generous free tier, that you can get started with right away.
Language models have become an essential tool in social science research for analyzing vast amounts of data. However, training them requires significant computing resources, which can present monetary and environmental challenges. In this workshop, we will discuss the potential of small-scale, parameter-efficient models for social science research. We will demonstrate how to use the Adapter framework for parameter-efficient models in Python and work through a practical hands-on example of topic classification.
MEXCA is a full pipeline for extracting facial expressions, vocal characteristics, and text sentiment from debates or conversations between multiple people. The pipeline distinguishes who is talking from who is shown in the frame, which makes it well suited to analyzing raw video material of debates and conversations between politicians or citizens. In the workshop, we will present the scientific goals of MEXCA and give a hands-on introduction to the software.