11.05.2021

Introduction to Computational Text Analysis and Social Media Research using R

Free of charge and open to all master's and doctoral students. No prior knowledge of R is needed.

Justin Chun-ting HO, Postdoctoral Researcher at Sciences Po 

What is Computational Text Analysis?

>Computational text analysis (also called Quantitative Text Analysis, Automated Content Analysis, Text Mining, Text as Data, etc.) draws on techniques developed in natural language processing and machine learning to analyse textual documents. While there is a wide range of methods, computational text analysis typically follows this workflow: raw texts are processed and converted into a quantitative form, which is then analysed using statistical tools. Computational text analysis therefore transforms text into quantitative data, making it possible to employ well-established statistical and machine learning tools for inference and prediction.
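
As a rough illustration of that workflow, here is a minimal sketch in base R. The three example sentences and all object names are invented for this illustration, and the packages actually used in the course may differ. It only shows the general idea of turning raw text into a table of word counts that ordinary statistical functions can work with.

```r
# Three toy documents (invented for illustration only).
texts <- c(
  doc1 = "The economy is growing and the economy is strong",
  doc2 = "Voters worry about the economy",
  doc3 = "Voters support the new policy"
)

# Step 1: pre-process. Lower-case the text and split it into word tokens.
tokens <- strsplit(tolower(texts), "[^a-z]+")

# Step 2: convert to a quantitative form. Count how often each word
# occurs in each document, giving a document-term matrix
# (one row per document, one column per word).
vocabulary <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(words) {
  table(factor(words, levels = vocabulary))
}))

# Step 3: analyse the numbers with ordinary tools.
print(dtm)
print(colSums(dtm))  # overall frequency of each word
print(rowSums(dtm))  # length of each document in tokens
```

Real projects usually add further pre-processing (for example removing very common words), but the shape of the result, a matrix of counts, stays the same.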

Course description 

As a result of the advent of the internet and advances in information technology, a massive volume of text on a wide variety of topics has become available. This includes not only content on social media and websites, but also digitised government documents, newspapers, books, and other historical sources. At the same time, computational text analysis methods are increasingly being used to conduct social and political research.

The course is well suited to researchers who are interested in using textual data but lack the technical knowledge. In this course, you will develop the foundational skills needed for collecting and analysing textual data from digital sources. The course begins with an introduction to fundamental R programming for absolute beginners. After acquiring the necessary skills in R, we will then move on to harvesting data from online sources. The course will conclude with an introduction to analysing textual data using computational techniques. These techniques can be used to analyse texts from a wide range of digital sources, including website content, digitised books and documents, party manifestos, political speeches, and so on. By the end of the course, students will have the skills and resources to apply computational methods to address scholarly problems in the social and political sciences.

Prerequisites 

You need no prior knowledge of R. While it would be beneficial to have some experience in coding (e.g. Stata), those without any prior experience will still be able to participate effectively. The course will be taught using R. You are required to download and install both R and RStudio before the workshop.

Students who are proficient in R could consider skipping Day 1 and Day 2. 

Course objectives 

The course aims to equip students with the key technical skills for conducting social media research, with a focus on analysing textual data. By the end of the course, students are expected to be able to:

  • Write basic programs in R.
  • Harvest social media data using tools that employ APIs. 
  • Read and write digital text files. 
  • Analyse textual data using computational text analysis techniques.

Course structure

The course is divided into three blocks. The first block (Day 1 and Day 2) focuses on the fundamental skills of R programming, such as importing data into R, data wrangling, calculating summary statistics, and creating publication-quality graphics. The second block (Day 3) covers various ways to collect data from social media and other digital sources, with a major focus on Facebook and Twitter. The third block (Day 4 and Day 5) turns to computational text analysis techniques. By the end of the course, students will have the skills and resources to apply these methods to address scholarly problems in the social and political sciences.

The course is designed with a hands-on approach. Each class will begin with a short lecture introducing the key concepts, but most of the time will be devoted to live demonstrations and practice exercises. The exercises are designed to be accessible to people with no programming experience. The course also has a practical emphasis: each class will focus on applying specific techniques to solve realistic tasks in social research.

14 & 15 June 2021: 2 p.m. - 5 p.m. / 16, 17 & 18 June 2021: 2 p.m. - 3.30 p.m.

  • Day 1 (3 hours): We will cover basic R syntax, the RStudio interface, importing datasets, and the structure of data frames.
  • Day 2 (3 hours): We will move on to data wrangling, calculating summary statistics, and a brief introduction to plotting.
  • Day 3 (1.5 hours): We will cover the process of obtaining social media data online. We will discuss the current landscape of social media data collection and the major tools available to academic researchers. The practical exercise will focus on obtaining Facebook and Twitter data.
  • Day 4 (1.5 hours): We will cover the basic assumptions behind computational text analysis, the pre-processing steps needed prior to the analysis, and various analytical techniques. We will also offer a brief overview of widely used methods in computational text analysis.
  • Day 5 (1.5 hours): We will cover practical text analysis techniques such as keyness analysis (a useful technique for comparing two corpora) and sentiment analysis.
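
To give a flavour of the Day 5 material, the sketch below illustrates one common way of computing keyness: a chi-squared statistic on a 2 x 2 table of word counts, signed so that positive scores mark words that are relatively more frequent in the target corpus. This is an assumption made for illustration; the course may use a different measure or a dedicated package. The two toy corpora and the helper names (`count_words`, `get_count`, `keyness`) are hypothetical.

```r
# Two toy corpora (invented for illustration only).
target    <- "tax cuts help families and tax cuts help workers"
reference <- "public services help families and public services need funding"

# Lower-case a text, split it into words, and count each word.
count_words <- function(text) {
  counts <- table(strsplit(tolower(text), "[^a-z]+")[[1]])
  setNames(as.integer(counts), names(counts))
}
target_counts    <- count_words(target)
reference_counts <- count_words(reference)

# Look up a word's count, returning 0 when the word is absent.
get_count <- function(counts, word) {
  if (word %in% names(counts)) counts[[word]] else 0L
}

# Signed chi-squared keyness of one word: positive when the word is
# relatively more frequent in the target corpus, negative otherwise.
keyness <- function(word) {
  a <- get_count(target_counts, word)
  b <- get_count(reference_counts, word)
  n_target    <- sum(target_counts)
  n_reference <- sum(reference_counts)
  # Rows: target, reference. Columns: this word, all other words.
  cells <- matrix(c(a, b, n_target - a, n_reference - b), nrow = 2)
  # Toy counts are tiny, so chisq.test() warns about small expected values.
  chi2 <- unname(suppressWarnings(chisq.test(cells, correct = FALSE))$statistic)
  if (a / n_target >= b / n_reference) chi2 else -chi2
}

# Score every word that appears in either corpus and rank the results.
vocabulary <- union(names(target_counts), names(reference_counts))
scores <- sort(vapply(vocabulary, keyness, numeric(1)), decreasing = TRUE)
print(round(scores, 2))
```

Words at the top of the ranking are characteristic of the target corpus and words at the bottom are characteristic of the reference corpus. With corpora this small the scores are not statistically meaningful; they only show how the calculation works.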

If you have any questions about the course or its prerequisites, please contact the instructor: justin.chunting.ho@sciencespo.fr

Enrolment contact: katia.dumoulin@sciencespo.fr