Data Management (B-KUL-G0Z53A)
Aims
The student can handle scientific quantitative research questions, independently, effectively, creatively, and correctly using state-of-the-art design and analysis methodology and software.
The student is able to efficiently acquire, store and process data.
Previous knowledge
Skills: the student should be able to analyse, synthesise and interpret.
Knowledge:
- Experience with at least one programming language
- Fundamental concepts of statistics
Identical courses
This course is identical to the following courses:
G0Z53B : Data Management
Is included in these courses of study
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Biometrics) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Business) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Industry) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Social, Behavioral and Educational Sciences) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Theoretical Statistics and Data Science) 120 ects.
- Courses for Exchange Students Faculty of Science (Leuven)
Activities
5 ects. Data Management (B-KUL-G0Z53a)
Content
In this course, students become familiar with relational databases as well as document and graph databases. They learn how to model data in each, and how to load, process, and extract data.
RDBMS
We will begin by showing how databases can be designed on the conceptual level, using the Entity-Relationship model. We will then see how a conceptual design can be automatically converted into a logical design, using the relational data model. We will look at Boyce-Codd Normal Form as a precise way to distinguish poorly designed relational database schemas from well-designed ones. Last but not least, we will learn how relational databases can be queried, using relational algebra and the standard query language SQL.
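As a rough illustration of the step from a conceptual design to a relational schema and an SQL query, the sketch below maps two entities and a many-to-many relationship onto three tables. It is only an assumption-laden example: the `student`, `course` and `enrolment` tables and their contents are hypothetical, and Python's built-in `sqlite3` module is used as a stand-in so the example runs without a PostgreSQL server.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical schema, not course data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Two entities, each with its own key.
        CREATE TABLE student (
            student_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE course (
            course_id TEXT PRIMARY KEY,
            title     TEXT NOT NULL
        );
        -- The many-to-many relationship becomes its own table.
        CREATE TABLE enrolment (
            student_id INTEGER NOT NULL REFERENCES student(student_id),
            course_id  TEXT    NOT NULL REFERENCES course(course_id),
            PRIMARY KEY (student_id, course_id)
        );
        """
    )
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(1, "Ada"), (2, "Edgar")])
    conn.executemany("INSERT INTO course VALUES (?, ?)",
                     [("C1", "Databases"), ("C2", "Statistics")])
    conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                     [(1, "C1"), (1, "C2"), (2, "C1")])
    conn.commit()
    return conn


def students_in_course(conn, title):
    """Return the names of students enrolled in the course with this title."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM student AS s
        JOIN enrolment AS e ON e.student_id = s.student_id
        JOIN course    AS c ON c.course_id  = e.course_id
        WHERE c.title = ?
        ORDER BY s.name
        """,
        (title,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(students_in_course(conn, "Databases"))
```

The query has to join three tables to answer a single question, which is the cost of keeping each fact in exactly one place.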
Document databases
Although relational databases have historically been the most widely used systems, recent years have seen a surge in the adoption of so-called NoSQL databases. One prominent group of NoSQL databases is document stores. Whereas data in an RDBMS is typically normalized across different tables, the information about a single object in a document database is stored together, which has several advantages. For example, (a) the properties of an object are defined at the level of the object itself rather than by columns fixed at the level of the table, (b) because objects are self-contained, no expensive join operations are needed to combine relevant information, and (c) the database is easier to scale.
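The contrast between normalized tables and self-contained documents can be sketched with plain Python data structures. This is only an illustration under stated assumptions: the `customers` and `orders` records and the `to_documents` helper are hypothetical, and no actual document database is involved.

```python
# Normalized, relational-style data: facts about one customer are
# spread over two "tables" and linked by customer_id.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Edgar"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "laptop"},
    {"order_id": 11, "customer_id": 1, "item": "mouse"},
    {"order_id": 12, "customer_id": 2, "item": "monitor"},
]


def to_documents(customers, orders):
    """Embed each customer's orders in one self-contained document."""
    docs = {
        c["customer_id"]: {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "orders": [],
        }
        for c in customers
    }
    for o in orders:
        docs[o["customer_id"]]["orders"].append(
            {"order_id": o["order_id"], "item": o["item"]}
        )
    return list(docs.values())


documents = to_documents(customers, orders)

# (a) A property can exist on one document only; no table-wide column is needed.
documents[0]["newsletter"] = True

# (b) Everything about a customer is read from one document, without a join.
ada_items = [order["item"] for order in documents[0]["orders"]]

if __name__ == "__main__":
    print(ada_items)
```

The price of this layout is that a fact repeated across documents has to be kept consistent by the application rather than by the schema.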
Graph databases
For several decades, developers have tried to accommodate connected, semi-structured datasets inside relational databases. But whereas relational databases were initially designed to codify paper forms and tabular structures (something they do exceedingly well), they struggle when attempting to model the ad hoc, exceptional relationships that crop up in the real world. This is where graphs come in.
Graph databases are widely applied, from fraud detection and social network analysis to pandemic modelling and even COVID-19 contact tracing.
The course includes hands-on work in which students are introduced to the construction of graphs and the application of typical graph algorithms, such as centrality measures (for example betweenness centrality).
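To give a feel for what such graph algorithms compute, the sketch below builds a small undirected graph from an edge list and calculates degree and betweenness centrality in plain Python. It is a minimal illustration, not the course's implementation: the edge list is hypothetical, the betweenness routine follows Brandes' algorithm for unweighted graphs, and a graph database or library would normally provide these measures directly.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency mapping from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Number of neighbours of each node (unnormalised)."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        visited_order = []
        predecessors = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visited_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {v: 0.0 for v in adjacency}
        while visited_order:
            w = visited_order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair of endpoints was counted from both ends.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical contact network: b and d connect the other nodes.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]
graph = build_adjacency(edges)

if __name__ == "__main__":
    print(degree_centrality(graph))
    print(betweenness_centrality(graph))
```

In this small example node `b` has both the most neighbours and the highest betweenness, because most shortest paths between other nodes pass through it.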
Technologies used
A large number of technologies exist for each of these database types. For the RDBMS part, we will use PostgreSQL. To avoid having to learn a different language for each of the other database types, we will use ArangoDB as both the document-oriented and the graph database, instead of MongoDB and Neo4j, respectively.
Python will be used for data processing and plotting.
Course material
Slides will be provided on Toledo.