Collecting and Analyzing Big Data for Social Sciences (B-KUL-S0K17A)

Aims
This course is an introduction to collecting and analyzing “big data” for social scientists.
By the end of the course, students are expected to be able to:
- Collect data from the internet using web scraping and APIs;
- Read and write digital text files;
- Analyze data using supervised learning techniques;
- Analyze data using unsupervised learning techniques;
- Understand and apply current methods for analyzing textual data;
- Link machine learning methods to relevant social science questions;
- Critically assess the use of big data for social sciences;
- Shed light on issues regarding ethics and privacy in the use of big data for social sciences;
- Program in R.
Previous knowledge
Students should have a basic knowledge of R for data management. For students with no prior experience with R, a self-study package will be made available at the start of the course. Students should have basic knowledge of exploratory univariate and bivariate statistics and be acquainted with standard regression techniques.
Included in these programmes of study
- Master in het publiek management en beleid (Leuven) 60 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (European Master of Official Statistics (EMOS)) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Biometrics) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Business) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Industry) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Social, Behavioral and Educational Sciences) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Theoretical Statistics and Data Science) 120 ects.
- Master of Statistics and Data Science (Abridged Programme - Quantitative Analysis in the Social Sciences) (No new enrollments as from 2023-2024) (Leuven) (Quantitative Analysis in the Social Sciences) 60 ects.
- Master of International Politics (Leuven) 60 ects.
- Master of Sociology (Leuven) (Quantitative Analysis and Social Data Science (QASS)) 60 ects.
- Master of Statistics and Data Science (blended) (Leuven) (Statistics and Data Science for Biometrics) 120 ects.
- Master of Statistics and Data Science (blended) (Leuven) (Statistics and Data Science for Business) 120 ects.
- Master of Statistics and Data Science (blended) (Leuven) (Statistics and Data Science for Industry) 120 ects.
- Master of Statistics and Data Science (blended) (Leuven) (Statistics and Data Science for Social, Behavioral and Educational Sciences) 120 ects.
- Master of Statistics and Data Science (blended) (Leuven) (Theoretical Statistics and Data Science) 120 ects.
- Master in de vergelijkende en internationale politiek (programme for students who started in 2024-2025 or later) (Leuven) 60 ects.
Activities
4 ects. Collecting and Analyzing Big Data for Social Sciences (B-KUL-S0K17a)
Content
This course is an introduction to collecting and analyzing big data, aimed specifically at social scientists. Major topics include:
- Collecting data from digital files, web application programming interfaces, and web scraping;
- Best practices in data manipulation;
- Introduction to supervised and unsupervised learning;
- Text as data, including text categorization and topic models;
- Introductory and intermediate programming in R.
The material will be illustrated with examples from social science research.
Course material
Slides, computational notebooks and readings.
Format: more information
The course consists of in-class lectures with hands-on exercises, supported by online modules that students complete individually.
This course is taught in block teaching. The contact sessions are concentrated within a few weeks, with multiple lectures each week.
Evaluation
Evaluation: Collecting and Analyzing Big Data for Social Sciences (B-KUL-S2K17a)
Explanation
Characteristics of the evaluation
The evaluation consists of two parts:
1) Students are expected to write a paper and construct a computational notebook that applies multiple skills from the course to a case relevant to the social sciences. Assignment-specific details will be communicated in the lecture and via Toledo. This research notebook is a group assignment.
2) Oral exam during examination period.
Determination of the final mark
The course is evaluated by the lecturer(s), as communicated via Toledo and the exam schedule. The result is calculated and communicated as a whole number out of 20.
The agreed deadline must be respected when submitting the Research Notebook assignments. Deviations from the deadline cannot be negotiated. If special circumstances occur, students must contact the faculty ombudsperson before the deadline. If the agreed deadline is not respected, the course will be counted as a score of 0 in the weighted final result, unless a new submission deadline is set because of serious circumstances.
Students are fully responsible for submitting papers and assignments free of fraud and plagiarism (www.kuleuven.be/english/education/plagiarism/) and are requested to observe the Faculty’s relevant regulations. Plagiarism will be sanctioned as set out in the University’s Regulations.
The use of generative artificial intelligence is allowed, but only in accordance with the guidelines of KU Leuven.
Second exam opportunity
Same as the evaluation for the first examination period: Research Notebook and oral exam.