Data Mining (B-KUL-H02C6A)
Aims
Today it is possible to collect and store vast quantities of data. This data often contains valuable information and insights. However, it may take human analysts weeks or months to discover this information, if they are able to do so at all. Furthermore, so much data exists that most of it is never even analyzed. The goal of data mining is to fill this void by automatically identifying models and patterns from these databases that are (1) valid, that is, they hold on new data with some certainty, (2) novel, that is, they are non-obvious, (3) useful, that is, they are actionable, and (4) understandable, that is, humans can interpret them. In order to do this, data mining, also called knowledge discovery in databases (KDD), combines ideas from machine learning, databases, statistics, visualization, and many other fields.
The goal of this course is to provide a broad survey of several important and well-known fields of data mining and to develop an overall sense of how to extract information from data in a systematic way. It tries to give insight into the challenges faced by data miners and the inner workings of specific data mining algorithms, as well as to provide some understanding of why data mining is important and interesting. The course consists of lectures, readings and exercise sessions. The exercise sessions reinforce the central concepts covered during class and give students some experience working with publicly available data mining tools. The course requires knowledge of machine learning.
Previous knowledge
Bachelor or Master level with at least basic knowledge of computers, algorithms and data structures. Moreover, the students should be comfortable with mathematical concepts such as differentiation, probability and statistics.
Knowledge of machine learning techniques is required. Specifically, the student must have followed either (1) the "Machine Learning and Inductive Inference" (B-KUL-H02C1A) class, (2) the "Beginselen van machine learning" (B-KUL-H0E96A) / "Principles of Machine Learning" (B-KUL-H0E98A) class, or a course that was deemed to be equivalent.
Order of Enrolment
This course unit is a prerequisite for taking the following course units:
H00Y4A : Big Data Analytics Programming
Is included in these courses of study
- Master in de toegepaste informatica (programme for students who started before 2024-2025) (Leuven) (Artificiële intelligentie) 60 ects.
- Master of Artificial Intelligence (Leuven) (Specialisation: Big Data Analytics (BDA)) 60 ects.
- Master of Bioinformatics (Leuven) (Bioscience Engineering) 120 ects.
- Master of Bioinformatics (Leuven) (Engineering) 120 ects.
- Master in de ingenieurswetenschappen: computerwetenschappen (Leuven) (Hoofdoptie Artificiële intelligentie) 120 ects.
- Master of Biomedical Engineering (Programme for students started before 2021-2022) (Leuven) 120 ects.
- Courses for Exchange Students Faculty of Engineering Science (Leuven)
- Master of Engineering: Computer Science (Leuven) (Option Artificial Intelligence) 120 ects.
- Master of Actuarial and Financial Engineering (Leuven) 120 ects.
Activities
3.2 ects. Data Mining: Lecture (B-KUL-H02C6a)
Content
Topics covered include (not necessarily in this order):
1) Data mining overview
2) The data mining process
3) Recommender systems
4) Association rule mining
5) Sequential pattern mining
6) Clustering
7) Large scale decision tree learning
8) Advanced topics on ensemble methods
9) Using unlabeled data
10) Data streams
11) Advanced topics (time permitting)
0.8 ects. Data Mining: Practical Sessions (B-KUL-H00I0a)
Content
The exercise sessions reinforce the central concepts covered during class and give students some experience working with publicly available data mining tools. More specifically, tasks may include:
1) Working through the control flow of an algorithm to better understand how it functions
2) Implementing a small part of an algorithm
3) Working through a small part of the data mining process
4) Using Weka to analyse data
5) Theoretical questions designed to extend a student's knowledge of the subject
6) Discussing and solving a data mining problem with a small group and presenting the conclusions of the discussion to the whole exercise session
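As an illustration of what "implementing a small part of an algorithm" could look like, the sketch below counts itemset supports and computes the confidence of a rule, two building blocks of association rule mining. It is not taken from the course material: the function names, the toy transactions and the threshold are assumptions made for this example, and the enumeration is brute force rather than a pruned search such as Apriori.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for all itemsets of up to max_size items
    whose support (fraction of transactions containing them) is at least
    min_support. Brute-force enumeration, hypothetical helper."""
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def confidence(supports, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return supports[both] / supports[frozenset(antecedent)]


# Toy data, invented for this illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

supports = frequent_itemsets(transactions, min_support=0.5)
print(supports[frozenset({"bread", "milk"})])        # support of {bread, milk}
print(confidence(supports, {"butter"}, {"bread"}))   # confidence of butter -> bread
```

On this toy data every transaction containing butter also contains bread, so the rule butter -> bread has confidence 1, while {milk, butter} falls below the support threshold and is dropped.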
Evaluation
Evaluation: Data Mining (B-KUL-H22C6a)
Explanation
Closed-book written exam on the topics covered in the lectures, exercise sessions and readings. The exam aims to answer two questions:
1) Do you understand the important basic concepts covered in class?
2) Do you have an advanced understanding of the topics covered?
Some questions will be similar in spirit to those solved in the exercise sessions while others will ask a student to apply a learned concept in a different context. Be sure to read all the questions carefully and to think about how the answer to each question is structured.