Modern Data Analytics (B-KUL-G0Z39B)

Aims
After having completed this course, the student will have acquired the necessary practical skillset and theoretical knowledge to deal with a wide variety of data science-related tasks. The course will equip the student with a solid set of tools to successfully approach his/her Master’s thesis and/or any other assignment in the programme Master of Statistics or beyond.
Each lecture is set up as an interactive workshop that requires active participation from the student and reading the pre-course notes beforehand.
Python will be the main programming language used throughout this course.
Previous knowledge
Skills: the student should be able to analyse, synthesise and interpret.
Knowledge
- Experience with at least one programming language
- Fundamental concepts of statistics
Identical courses
This course is identical to the following courses:
G0Z39C : Modern Data Analytics
Is included in these courses of study
- Master in de statistiek (Leuven) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (European Master of Official Statistics (EMOS)) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Interdisciplinary Statistics and Data Science (No new enrollments for this track as from academic year 2024-2025)) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Biometrics) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Business) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Industry) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Statistics and Data Science for Social, Behavioral and Educational Sciences) 120 ects.
- Master of Statistics and Data Science (on campus) (Leuven) (Theoretical Statistics and Data Science) 120 ects.
- Master of Mobility and Supply Chain Engineering (Leuven) 120 ects.
Activities
4 ects. Modern Data Analytics (B-KUL-G0Z39a)




Content
Introduction to Python: Jupyter Notebooks, Broadcasting, Indexing, Python Package Manager, Graphs, Functions, Control structures
1. Python Advanced DNA
- Object-oriented programming in Python
- Managing Python environments on a single computer
- Working as a team and collaborating on GitHub
- Plotly graphs
2. Basics of Machine Learning
We briefly discuss some basic concepts of Machine Learning (ML). Through the use of examples, we go through several common applications and pitfalls of ML. This should be used as a starting point for students with a limited background in ML.
Supervised Learning: We start by discussing the idea behind regression analysis and how a regression problem is tackled in practice. Along the way, we emphasize the implications and dangers of each step in the modeling process. We delve deeper into the bias-variance trade-off and discuss some popular methods used to control this trade-off. We finish by discussing classification problems.
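As an illustration of controlling the bias-variance trade-off, the following is a minimal sketch and not taken from the course material. It assumes synthetic data, a degree-10 polynomial model, and ridge regularisation, and it uses cross-validation to compare a few hypothetical values of the penalty `alpha`: small values give a flexible, high-variance fit, large values a rigid, high-bias fit.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): noisy samples of sin(x).
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=80)

# Candidate regularisation strengths (hypothetical grid).
alphas = [1e-4, 1e-2, 1.0, 100.0, 1e4]

cv_mse = {}
for alpha in alphas:
    # Flexible polynomial model; the ridge penalty shrinks its coefficients.
    model = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        Ridge(alpha=alpha),
    )
    # 5-fold cross-validated mean squared error (scikit-learn reports it negated).
    scores = cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[alpha] = -scores.mean()

# The alpha with the lowest estimated out-of-sample error.
best_alpha = min(cv_mse, key=cv_mse.get)

for alpha, mse in cv_mse.items():
    print(f"alpha={alpha:g}  CV MSE={mse:.3f}")
print("selected alpha:", best_alpha)
```

Cross-validation is used here because the training error alone would always favour the least regularised model.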
Unsupervised Learning: We provide some important examples commonly used for different unsupervised learning tasks. We look at clustering problems, outlier detection, and dimension reduction.
3. Building Machine Learning Pipelines
In this lecture, we learn how to use the models in the Scikit-Learn package. We go through several examples to familiarise ourselves with the API. We showcase some examples of preprocessing such as PCA, standardization and one-hot encoding. Finally, we introduce the notion of transformers and pipelines which are tools that can greatly improve code readability and structure.
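The combination of transformers and pipelines described above might look as follows. This is a minimal sketch under stated assumptions: the data set, the column names, and the choice of logistic regression as the final model are all hypothetical and serve only to illustrate the Scikit-Learn API.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: three numeric columns and one categorical column.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "height": rng.normal(170, 10, n),
    "weight": rng.normal(70, 12, n),
    "age": rng.integers(18, 65, n),
    "city": rng.choice(["Leuven", "Gent", "Antwerpen"], n),
})
y = (X["weight"] > 70).astype(int)  # artificial binary target

# Numeric columns: standardise, then reduce to two principal components.
numeric = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

# Apply a different transformer to each group of columns.
preprocess = ColumnTransformer([
    ("num", numeric, ["height", "weight", "age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and model chained into a single estimator.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X, y)
predictions = model.predict(X)
features = model.named_steps["preprocess"].transform(X)
print(features.shape, predictions[:10])
```

Because the preprocessing steps live inside the pipeline, they are refitted on the training folds only when the whole object is passed to a cross-validation routine, which helps avoid data leakage.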
4. Building & Deploying Apps
5. Analytics Infrastructure
In this lecture, we kick off the third part of the course, which focuses on the building blocks of a data-driven IT architecture. We explore some key elements of data analytics infrastructure and introduce two commonly used tools for managing machine learning projects in particular.
6. Version Control and Code Repositories
This lecture covers the basics of Git and GitHub. The goal of the lecture is to familiarise the student with the core concepts of version control and repositories and to go over some basic examples using Git(Hub) in practice.
7. Cloud Computing
In this section we explain how to use the cloud (AWS) for data science assignments. The following topics are covered:
- storage
- user management
- Python interaction
- computation engines
Course material
Slides will be provided on Toledo.
Is also included in other courses
Evaluation
Evaluation: Modern Data Analytics (B-KUL-G2Z39b)
Explanation
Partial or continuous assessment with (final) exam during the examination period.
Continuous evaluation in which students deliver two projects as a team.
Oral exam in which the projects are discussed. Use of a laptop is permitted.
Information about retaking exams
Oral exam. Use of a laptop is permitted.