Computer Vision and Natural Language Processing


Computer Vision and Natural Language Processing (B-KUL-H0Q35A)

6 ECTS

English

Second term

Cannot be taken as part of an examination contract

Vandewalle Patrick (coordinator) | Goedemé Toon | Vandewalle Patrick | N.

OC Postgraduate Certificate: Advanced Programme on Artificial Intelligence in Business and Industry

Aims

Learning outcomes

MMK1 Scientific-disciplinary knowledge and comprehension in the field of Artificial Intelligence

MMK2 Gaining in-depth knowledge and comprehension, including the ability to develop relevant prototypes or proof-of-concepts, in at least one of the following disciplines in Artificial Intelligence: machine learning, deep learning, knowledge representation, computer vision, audio processing, natural language processing, search and optimisation. Gaining in-depth knowledge and comprehension of at least one of the following application domains of Artificial Intelligence: health, education, logistics, manufacturing, robotics.

MMI1 Problem analysis and solving

1. Students adopt a systematic or innovative approach, using their analytical skills, when solving complex practical engineering problems in the domain of Artificial Intelligence.

MMI2 Design and/or development

2. Students can design, implement and test an Artificial Intelligence product or service, keeping the concrete context in mind.

MMI3 Application-oriented research

3. Students demonstrate the ability to act with a creative, precise, inquisitive and critical research attitude by formulating an application-oriented research problem in the domain of Artificial Intelligence, and by choosing and applying the appropriate methodologies to carry out the necessary experiments, including the critical interpretation of the results.

MMG3 Critical thinking

3. Students reflect critically on their choices, actions and obtained results, and can justify the choices made.

Objectives

The course introduces natural language processing technologies and their application to a variety of tasks, including text mining, machine translation, question answering and dialogue modeling. It also introduces computer vision algorithms and their applications, such as image classification, object detection and image segmentation. Special attention is given to applications that require joint processing of language and visual data, as combining the two offers a natural way to interact with machines.

The students will gain insight into suitable machine learning algorithms, ideally ones that can be trained with limited annotated examples or with human feedback. They will learn how to build and critically assess an application that makes use of the most recent techniques and resources.

Previous knowledge

Introductory course in machine learning and deep learning.

The following Python programming skills are required. The DataCamp courses listed below cover them:

- Introduction to Python: lists, functions, methods, NumPy arrays
- Intermediate Python: Matplotlib, dictionaries, pandas DataFrames, control flow, filtering, loops
- Python Data Science Toolbox (Part 1): writing your own functions, lambda functions
- Python Data Science Toolbox (Part 2): iterators, zip, enumerate, generators, list and dict comprehensions

(Free DataCamp access for self-training can be obtained by contacting the teaching team.)


Activities

2 ects. Computer Vision (B-KUL-H0Q35a)

2 ECTS

English

Format: Lecture

Second term

Goedemé Toon | Vandewalle Patrick

OC Postgraduate Certificate: Advanced Programme on Artificial Intelligence in Business and Industry

Content

Introduction: We will give an overview of the imaging pipeline and introduce common image processing operations such as filtering and histogram operations.
We will then cover 2D and 3D computer vision algorithms starting from the most common applications in computer vision. We will mostly focus on the use of deep learning algorithms to address these problems, using techniques such as convolutional neural networks.
- Image enhancement: we aim to create visually more pleasing images using techniques such as contrast enhancement, denoising, high dynamic range (HDR) imaging and super-resolution.
- Image classification: we will first discuss learning with manually engineered features and color spaces, followed by convolutional neural networks. We will discuss the most frequently used architectures (e.g. AlexNet, VGG, ResNet, vision transformers) as well as more advanced techniques such as transfer learning and metric learning. We also cover zero-shot and few-shot image classification using multimodal models (e.g. CLIP).
- Object detection: starting from template matching, we will present local feature matching for object detection in images. We will then move on to deep learning-based object detection methods such as R-CNN and YOLO (and their different versions), and to zero-shot, few-shot and open-vocabulary object detection (OWL-ViT, Grounding DINO).
- Image segmentation: we aim to assign a label to each pixel of an image. Thresholding is the most basic form of image segmentation. After some classical techniques such as clustering, mean shift and connected component analysis, we will discuss fully convolutional neural networks and U-Net architectures for segmentation. We will also discuss auto-encoders, GANs, prompt-based image generation (DALL-E, Stable Diffusion, …) and zero-shot image segmentation (SAM).
- 3D processing: we will start with an overview of 3D cameras and 3D representation methods. Next, we will cover depth estimation from stereo or monocular images using both classical and deep learning techniques. We also discuss explicit and implicit 3D reconstruction methods such as Structure-from-Motion and Neural Radiance Fields (NeRF).
- 3D interpretation: We present deep learning approaches to 3D interpretation tasks such as 3D shape recognition, 3D object detection, 3D segmentation and 3D semantic scene completion.
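As a small illustration of how histogram operations and thresholding combine into the most basic form of segmentation, the sketch below labels each pixel of a grayscale image as foreground or background. It assumes an 8-bit grayscale image stored as a NumPy array and chooses the threshold with Otsu's method (maximising the between-class variance of the intensity histogram). This is one common choice, not necessarily the variant used in the lectures, and the function names `otsu_threshold` and `threshold_segment` are hypothetical.

```python
import numpy as np


def otsu_threshold(image: np.ndarray) -> int:
    """Return the threshold t in 0..255 that maximises the between-class variance.

    Pixels with a value > t are treated as foreground.
    Assumes `image` holds integer intensities in [0, 255] (e.g. dtype uint8).
    """
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()              # normalised intensity histogram
    levels = np.arange(256)
    omega = np.cumsum(prob)               # weight of the background class for each t
    mu = np.cumsum(prob * levels)         # cumulative mean intensity up to t
    mu_total = mu[-1]                     # global mean intensity
    denom = omega * (1.0 - omega)
    # Between-class variance (mu_total * omega - mu)^2 / (omega * (1 - omega));
    # thresholds that leave one class empty get a variance of 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def threshold_segment(image: np.ndarray, t: int) -> np.ndarray:
    """Binary segmentation: True for foreground pixels (value > t)."""
    return image > t


# Usage on a tiny synthetic image with a dark and a bright group of pixels.
img = np.array([[10, 12, 200], [11, 210, 205]], dtype=np.uint8)
t = otsu_threshold(img)
mask = threshold_segment(img, t)
```

Any threshold between the two intensity groups separates them equally well here; `np.argmax` simply returns the first such value.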

Course material

Lecture slides.

Language of instruction

Open to non-Dutch-speaking students.

2 ects. Computer Vision and Natural Language Processing: Project (B-KUL-H0Q36a)

2 ECTS

English

Format: Assignment

Second term

Goedemé Toon | Vandewalle Patrick | N.

OC Postgraduate Certificate: Advanced Programme on Artificial Intelligence in Business and Industry

Content

The students will perform a project in the area of computer vision, natural language processing or a combination of both. Students work in groups of 3 to 4. Each group proposes its own topic concerning a real-world (industrial) application of computer vision and/or natural language processing. The project involves the design, implementation and evaluation of an AI application that processes visual and/or language data to perform a particular task. The aim of the project is to give students hands-on experience with, and insight into, advanced aspects of these technologies while dealing with realistic data. At the end of the semester, the students present and demonstrate their work.

Course material

Study material from the theory lectures.

Language of instruction

Open to non-Dutch-speaking students.

Format

The students will perform a project in the area of computer vision and/or natural language processing.

2 ects. Natural Language Processing (B-KUL-H0Q37a)

2 ECTS

English

Format: Lecture

Second term

OC Postgraduate Certificate: Advanced Programme on Artificial Intelligence in Business and Industry

Content

- Overview of machine learning models for regression, classification and generation commonly used in natural language processing: we survey machine learning models that require feature engineering, such as multinomial logistic regression and conditional random fields, and deep learning models that skip the feature engineering step, among them recurrent neural networks, convolutional neural networks and transformer architectures.
- Word embeddings, language models and machine translation: starting from static word embeddings, we discuss contextual word embeddings and their use in language models. Attention mechanisms are illustrated with the encoder-decoder architectures commonly used in neural machine translation. From there, we move on to an in-depth analysis of the transformer architecture for neural machine translation and discuss large transformer-based language models such as BERT and GPT.
- Text classification and opinion mining: text classification is a frequent task in business settings. Applications include email filtering and the automated coding of patents and medical documents. We pay particular attention to mining opinions and arguments in user-generated content such as product reviews, blogs and tweets.
- Named entity recognition and relation extraction: extracting specific content, such as named entities and their relations, makes it possible to structure the information found in unstructured natural language texts and supports its subsequent analysis and knowledge acquisition.
- Cross-modal modeling and semantic search: we discuss how NLP tools can be used for semantic search by modeling similarity relationships. We start with text and then extend to other modalities: jointly modeling vision and language makes it possible to match images with their corresponding textual descriptions. We discuss several attention-based neural architectures that learn to find the latent alignment between regions in an image and words in a sentence.
- Question answering and dialogue modeling: question answering technologies contribute to customer support and are a stepping stone towards dialogue models. We discuss several neural architectures for dialogue modeling, including ChatGPT. Furthermore, we investigate how a visual context can be integrated into dialogue models.
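The attention-based alignment between image regions and words can be sketched with plain scaled dot-product attention, the same mechanism used in encoder-decoder and transformer models. The sketch below assumes that word embeddings and region features have already been projected into a shared d-dimensional space; the random arrays are placeholders, not real features. Each word attends over all regions, and the resulting weights can be read as a soft word-to-region alignment. The function name `cross_attention` is hypothetical, and real cross-modal architectures add learned query/key/value projections and multiple heads.

```python
import numpy as np


def cross_attention(words: np.ndarray, regions: np.ndarray):
    """Scaled dot-product attention with words as queries and regions as keys/values.

    words:   (n_words, d) query vectors
    regions: (n_regions, d) key and value vectors (no learned projections here)
    Returns (attended, weights) with shapes (n_words, d) and (n_words, n_regions).
    """
    d = words.shape[-1]
    scores = words @ regions.T / np.sqrt(d)        # similarity of each word to each region
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for a stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over regions: rows sum to 1
    attended = weights @ regions                   # weighted region summary per word
    return attended, weights


# Usage with placeholder data: 4 words and 6 image regions in an 8-dim space.
rng = np.random.default_rng(0)
words = rng.normal(size=(4, 8))
regions = rng.normal(size=(6, 8))
attended, weights = cross_attention(words, regions)
best_region = weights.argmax(axis=1)   # most-attended region for each word
```

Reusing the region features as both keys and values keeps the sketch minimal; trained models learn separate projections so that "what to match on" and "what to retrieve" can differ.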

Course material

Lecture slides.

Language of instruction

Open to non-Dutch-speaking students.

Evaluation

Evaluation: Computer Vision and Natural Language Processing (B-KUL-H2Q35a)

Type: Partial or continuous assessment with (final) exam during the examination period

Description of evaluation: Written, Paper/Project, Presentation

Type of questions: Open questions

Learning material: Course material, Calculator

Explanation

Project: Each group gives an oral presentation of the project at the end of the semester, which is evaluated together with the submitted slides and code.

The theory of both Computer Vision and Natural Language Processing is evaluated in a single written open-book exam covering both parts. This exam is organised in a computer room with access to the course materials on Toledo. Students may also bring written notes, or digital notes on a USB memory stick.

Grade distribution:
- 1/3 project
- 2/3 theory exam

Information about retaking exams

Written open book exam.

If a student fails the project part, a new presentation moment will be agreed upon to evaluate the continued project work. The supervisor will clearly communicate the goals to be met at the next evaluation moment.
