Optimization for Machine Learning


Optimization for Machine Learning (B-KUL-H0O09A)

4 ECTS

English

Second term

Cannot be taken as part of an examination contract

Patrinos Panos

POC ir. Artificiële Intelligentie

Aims

The aim of this course is to introduce students to the theory and algorithms for optimization problems that arise in machine learning and data science. In particular, the complexity, robustness and scalability of algorithms to large datasets will be discussed, both in theory and in implementation.

By the end of the course the student will:

be able to formulate machine learning tasks as optimization problems,
be able to tell which optimization formulation is more suitable for the machine learning task at hand, based on complexity, scalability, convexity and smoothness aspects,
have a profound understanding of a wide variety of optimization algorithms and their properties, and be able to apply the appropriate algorithm to a given machine learning task,
be able to implement optimization algorithms for large-scale machine learning problems.

Is included in these courses of study

Master in de ingenieurswetenschappen: artificiële intelligentie (Leuven) 120 ECTS


Activities

3 ECTS. Optimization for Machine Learning: Lecture (B-KUL-H0O09a)

3 ECTS

English

Format: Lecture

Second term

Patrinos Panos

POC ir. Artificiële Intelligentie

Content

optimization in machine learning and data science, motivating examples:
    empirical risk minimization,
    maximum likelihood estimation,
    supervised learning (deep learning, regression/classification),
    unsupervised learning (PCA, clustering, ...),
    reinforcement learning
overparameterization, regularization, generalization, ...
watershed between convexity and nonconvexity, optimality conditions
backpropagation and automatic differentiation
(stochastic) gradient descent (SGD)
accelerated gradient descent and momentum
variants of SGD (ADAM, AdaGrad, ...), applications in deep learning and reinforcement learning
finite sum minimization, variance reduced algorithms
projected/proximal (sub)gradient descent, applications in sparse estimation, ...
block coordinate descent and alternating minimization, applications in SVM, non-negative matrix factorization
dual algorithms (dual proximal gradient descent, ADMM), applications in SVM, kernel methods
expectation-maximization (EM), applications in MAP estimation for latent variable models
min-max optimization, applications to GANs and adversarial robustness
second-order algorithms: Newton, Gauss-Newton, quasi-Newton, L-BFGS
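
A minimal sketch of how two of the topics above fit together, assuming a least-squares loss: an empirical risk minimization problem solved with stochastic gradient descent. The function name `sgd_least_squares`, the step size, the number of epochs and the synthetic data are illustrative assumptions, not course material. The sketch reshuffles the data each epoch rather than sampling with replacement.

```python
import numpy as np


def sgd_least_squares(X, y, step=0.01, epochs=50, seed=0):
    """Minimize the empirical risk (1/n) * sum_i 0.5 * (x_i @ w - y_i)**2 with SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # One pass over the data in a fresh random order (random reshuffling).
        for i in rng.permutation(n):
            residual = X[i] @ w - y[i]
            # Gradient of the i-th loss term with respect to w is residual * x_i.
            w -= step * residual * X[i]
    return w


# Synthetic, noiseless data (hypothetical example).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_hat = sgd_least_squares(X, y)
print(w_hat)
```

Because the data here are noiseless, every individual loss term is minimized by the same `w_true`, so a constant step size is enough for the iterates to converge. With noisy labels a decreasing step size or a variance reduced method would typically be needed.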