Data Management - KU Leuven


Data Management (B-KUL-G0Z53A)

5 ECTS

English

First term

Cannot be taken as part of an examination contract

De Spiegeleer Jan

POC Master in statistiek

Aims

The student can handle scientific quantitative research questions, independently, effectively, creatively, and correctly using state-of-the-art design and analysis methodology and software.

The student is able to efficiently acquire, store and process data.

Previous knowledge

Skills: the student should be able to analyse, synthesise and interpret.

Knowledge:

Experience with at least one programming language
Fundamental concepts of statistics

Identical courses

This course is identical to the following courses:
G0Z53B : Data Management

Is included in these courses of study

Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Biometrics) 120 ects.
Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Business) 120 ects.
Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Industry) 120 ects.
Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Social, Behavioral and Educational Sciences) 120 ects.
Master of Statistics and Data Science (on campus) (Leuven) (Theoretical Statistics and Data Science) 120 ects.
Courses for Exchange Students Faculty of Science (Leuven)


Activities

5 ects. Data Management (B-KUL-G0Z53a)

Format: Lecture

Content

In this course, students will become familiar with relational databases as well as document and graph databases. They will learn how to model data in each of them, and how to load, process and extract data.

RDBMS

We will begin by showing how databases can be designed at the conceptual level, using the Entity-Relationship model. We will then see how a conceptual design can be automatically converted into a logical design, using the relational data model. We will look at Boyce-Codd Normal Form as a precise way to distinguish poorly designed relational database schemas from well-designed ones. Last but not least, we will learn how relational databases can be queried, using relational algebra and the standard query language SQL.
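
As an illustration of the BCNF criterion, the sketch below checks whether every non-trivial functional dependency X → Y has a superkey X as its left-hand side, finding superkeys via the closure of X. It is a minimal sketch under stated assumptions: the relation, its attributes and the function names are hypothetical examples, not course material. Checking only the listed dependencies is sufficient for a relation whose dependencies are all stated over its own attributes; after a decomposition, the dependencies implied on each fragment would have to be derived first.

```python
def closure(attrs, fds):
    """Return the closure of attribute set `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already implied, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs X -> Y whose X is not a superkey of `relation`."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed in BCNF
        if not relation <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation Enrollment(student_id, course_id, student_name, grade).
R = frozenset({"student_id", "course_id", "student_name", "grade"})
F = [
    (frozenset({"student_id"}), frozenset({"student_name"})),
    (frozenset({"student_id", "course_id"}), frozenset({"grade"})),
]
print(bcnf_violations(R, F))
# prints [(frozenset({'student_id'}), frozenset({'student_name'}))]
```

The violation signals that student names are repeated for every enrollment; splitting the relation into Student(student_id, student_name) and Enrollment(student_id, course_id, grade) removes it.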

Document databases

Although relational databases have historically been the most widely used systems, recent years have seen a surge in the adoption of so-called NoSQL databases. One prominent group of NoSQL databases consists of document stores. Whereas data in an RDBMS is typically normalized across different tables, all information about a single object in a document database is stored together, which has several advantages. For example, (a) the properties of an object are defined at the level of the object itself rather than by columns defined at the level of the table, (b) because objects are self-contained, no expensive join operations are needed to combine relevant information, and (c) the database is easier to scale.
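
To make points (a) and (b) concrete, the sketch below stores the same hypothetical customer and orders twice: normalized across two tables, where reassembling the customer requires a join, and as a single self-contained document. It assumes Python's built-in sqlite3 module as a lightweight stand-in for PostgreSQL and a plain Python dict as a stand-in for a document in a document store; all table and field names are illustrative.

```python
import sqlite3

# Normalized (relational) representation: one table per entity type.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    product TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Alice');
INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse');
""")

# Reassembling one customer's information requires a join.
rows = conn.execute(
    """
    SELECT c.name, o.product
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ?
    ORDER BY o.id
    """,
    (1,),
).fetchall()

# Document representation: everything about the customer is stored together.
customer_doc = {
    "id": 1,
    "name": "Alice",
    "orders": [{"id": 10, "product": "laptop"}, {"id": 11, "product": "mouse"}],
    "loyalty_tier": "gold",  # a property only this object has; no schema change needed
}
products = [order["product"] for order in customer_doc["orders"]]

print(rows)      # prints [('Alice', 'laptop'), ('Alice', 'mouse')]
print(products)  # prints ['laptop', 'mouse']
```

The trade-off is that data shared between documents (for example, product details) is duplicated rather than stored once, which is exactly what normalization avoids.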

Graph databases

For several decades, developers have tried to accommodate connected, semi-structured datasets inside relational databases. But whereas relational databases were initially designed to codify paper forms and tabular structures (something they do exceedingly well), they struggle when attempting to model the ad hoc, exceptional relationships that crop up in the real world. This is where graphs come in.

Graph databases are widely applied, ranging from fraud detection to social network analysis, pandemic modelling and even COVID-19 contact tracing.

The course offers hands-on experience in which students are introduced to the construction of graphs and the application of typical graph algorithms, such as centrality measures (for example, betweenness centrality).
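
The course description does not specify how these algorithms are computed. As a minimal from-scratch sketch, Brandes' algorithm for betweenness centrality on an unweighted, undirected graph is shown below in Python: one breadth-first search per source node counts shortest paths, after which each node's dependency is accumulated in reverse BFS order. The graph and the function name are hypothetical, and the scores are not normalized.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    `adj` maps each node to a list of its neighbours; every edge must appear
    in both adjacency lists. Scores are halved at the end because each
    unordered pair (s, t) is visited once from s and once from t.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}


# Hypothetical path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(betweenness_centrality(path))
# prints {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}
```

Node b lies on the shortest paths a–c and a–d, hence a score of 2. Libraries such as networkx provide `betweenness_centrality`, which normalizes the scores by default and can serve as a cross-check.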

Technologies used

A large number of technologies exist for each of these database types. For the RDBMS part, we will use PostgreSQL. To avoid having to learn a different query language for each database type, we will use ArangoDB as both the document-oriented and the graph database, instead of MongoDB and Neo4j, respectively.

Python will be used for data processing and plotting.

Course material

Slides will be provided on Toledo.

Evaluation

Evaluation: Data Management (B-KUL-G2Z53a)

Type : Exam during the examination period

Description of evaluation : Written

Learning material : Course material