Reinforcement Learning (B-KUL-H0O23A)

4 ECTS | English | 33 | Second term | Cannot be taken as part of an examination contract
POC ir. Artificiële Intelligentie

This course familiarises students with the domains of planning and reinforcement learning, which are concerned with sequential decision making and learning in intelligent agents.

 

After following this course, students will

- have a deep understanding of Markov Decision Processes and their role in planning and reinforcement learning,

- understand the different settings studied in AI, especially in sequential decision making and reinforcement learning, including full vs partial observability, online vs offline, model-based vs model-free, single-agent vs multi-agent, and Markovian vs non-Markovian,

- have an overview of the existing techniques and algorithms for planning and reinforcement learning under different conditions,

- understand how these techniques work, why they work, and when they work,

- be able to incorporate these techniques into intelligent agents, AI systems, and their applications,

- be up to date with the current state of the art and be able to familiarise themselves with new research results in the area.

Knowledge of machine learning and neural networks.

Mixed prerequisite:
You may only take this course if you comply with the prerequisites. Prerequisites can be strict or flexible, or can imply simultaneity. A degree level can also be a prerequisite.
Explanation:
STRICT: You may only take this course if you have passed or applied tolerance for the courses for which this condition is set.
FLEXIBLE: You may only take this course if you have previously taken the courses for which this condition is set.
SIMULTANEOUS: You may only take this course if you also take the courses for which this condition is set (or have taken them previously).
DEGREE: You may only take this course if you have obtained this degree level.


SIMULTANEOUS(H02C1A) OR SIMULTANEOUS(H0E96A)

The codes of the course units mentioned above correspond to the following course descriptions:
H02C1A: Machine Learning and Inductive Inference
H0E96A: Beginselen van machine learning (Principles of Machine Learning)

Activities

3 ECTS. Reinforcement Learning: Lecture (B-KUL-H0O23a)

3 ECTS | English | Format: Lecture | 18 | Second term
POC ir. Artificiële Intelligentie

Introduction to planning and reinforcement learning

 

Multi-armed bandits and their algorithms

            -- Exploration vs exploitation

            -- Rewards and regret

            -- Greedy algorithms

            -- Upper confidence bounds

 

Markov Decision Processes and their variants

            -- Bellman equations

            -- Policies and value functions

            -- Optimality

            -- Partial and full observability

 

Dynamic Programming

            -- Policy evaluation, improvement and iteration

            -- Value iteration

 

Monte Carlo Methods

 

Temporal-difference learning

            -- TD prediction

            -- Q-learning

            -- Sarsa

            -- On-policy vs off-policy

            -- n-step bootstrapping

 

Planning and learning with tabular methods

            -- Dyna: integrated planning, acting and learning

            -- Real-time dynamic programming

            -- Monte Carlo tree search

 

 

Approximate methods

            -- Value function approximation

            -- Gradient methods

            -- On-policy and off-policy variants

 

Policy gradient methods

            -- Policy approximation

            -- Policy gradients

            -- Actor-critic

 

Contemporary topics

            -- Deep reinforcement learning

            -- Multi-agent reinforcement learning

            -- Shielding and safe reinforcement learning

            -- Relational reinforcement learning and traditional planning

 

 

Applications in game playing and beyond

Sutton and Barto, Reinforcement Learning: An Introduction, 2nd edition.

Additional materials on Toledo.

1 ECTS. Reinforcement Learning: Exercises (B-KUL-H0O24a)

1 ECTS | English | Format: Practical | 15 | Second term
POC ir. Artificiële Intelligentie

6 sessions of 2.5 hours each, plus some assignments.

 

The exercise sessions practice the concepts, models and techniques seen in the lectures.

The exercise material will be made available on Toledo.

Evaluation

Evaluation: Reinforcement Learning (B-KUL-H2O23a)

Type: Partial or continuous assessment with (final) exam during the examination period
Description of evaluation: Written, Paper/Project
Type of questions: Multiple choice, Open questions, Closed questions
Learning material: None


The evaluation consists of a written exam in the exam period and permanent evaluation during the semester:

- The closed-book exam consists of a theoretical part and an exercise part.

- The permanent evaluation involves applying the material seen in the lectures and exercises in a new context (practical).

The result of the permanent evaluation is carried over to the third examination period, but not to a following academic year.