Advanced Natural Language Processing (B-KUL-H02B1B)

6 ECTS · English · 37 hours · First term · Cannot be taken as part of an examination contract
POC ir. Artificial Intelligence

The course focuses on an in-depth understanding of methods and algorithms for building computer software that understands, generates and manipulates human language. We study the algorithms and models while introducing core tasks in natural language processing (NLP), including language modelling, syntactic analysis, semantic interpretation, machine translation, coreference resolution, discourse analysis, machine reading, question answering and dialogue modelling. We illustrate the methods and technologies with current applications in real-world settings. After completing this course, the student has acquired an in-depth understanding of contemporary machine learning models designed for processing human language and of the underlying computational properties of NLP models. The student will have learned how the underlying linguistic phenomena, that is, the linguistic features, can be modelled and automatically learned from data using deep learning techniques, and will be able to understand papers in the NLP field.

Prerequisites: basics of linear algebra and probability theory; foundations of machine learning; computer programming.

Activities

3.5 ects. Natural Language Processing: Lecture (B-KUL-H02B1a)

3.5 ECTS · English · Format: Lecture · 20 hours · First term
POC Artificial Intelligence

1. Introduction

  • What is natural language processing (NLP)?
  • Current state-of-the-art of NLP
  • Ambiguity
  • Other challenges
  • Representing words, phrases and sentences
     

2. Segmentation and tokenization

  • Regular expressions
  • Word tokenization, lemmatization and stemming
  • Sentence segmentation
  • Subword tokenization
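To give a flavour of the topic, here is a minimal regex-based word tokenizer; it is an illustrative sketch, not the tokenizer used in the course, and the pattern is a simplification that keeps words (with internal apostrophes) and punctuation marks as separate tokens.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens with a single regex:
    \\w+(?:'\\w+)? matches words like "it's"; [^\\w\\s] matches punctuation."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(tokenize("Don't panic, it's NLP!"))
```

Real systems layer sentence segmentation and subword tokenization (e.g., byte-pair encoding) on top of such word-level splitting.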
     

3. Language Modelling

  • N-gram language models
  • Perplexity
  • Maximum likelihood estimation
  • Smoothing
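The three ingredients above fit together in a few lines of code. The following sketch (a toy illustration, not course material) trains a bigram model by maximum likelihood counting, applies add-one (Laplace) smoothing, and computes perplexity as the exponentiated negative average log-probability.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigram histories c(w_prev) and bigrams c(w_prev, w)
    over sentences padded with <s> ... </s> boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    # add-one smoothed estimate: (c(prev, w) + 1) / (c(prev) + V)
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(sent, unigrams, bigrams, vocab_size):
    # perplexity = exp(-average log-probability per predicted token)
    toks = ["<s>"] + sent + ["</s>"]
    logp = sum(math.log(bigram_prob(p, w, unigrams, bigrams, vocab_size))
               for p, w in zip(toks[:-1], toks[1:]))
    return math.exp(-logp / (len(toks) - 1))

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
V = len({w for s in corpus for w in s} | {"</s>"})  # vocabulary incl. </s>
```

Observed continuations (e.g., "cat" after "the") receive higher smoothed probability than unseen ones, and a lower perplexity means the model is less surprised by the sentence.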
     

4. Neural Language Modelling

  • Word embeddings
  • Vector space models for NLP
  • Recurrent neural network (RNN) for language modelling
  • Transformer architecture for language modelling
  • Use of language models in downstream tasks: fine-tuning and pretraining
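A core idea behind word embeddings is that similarity in meaning corresponds to similarity in vector space, usually measured by cosine similarity. The sketch below uses tiny hand-picked 3-dimensional vectors purely for illustration; real embeddings are learned from data and have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense word vectors:
    dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# toy embeddings (illustrative values, not trained)
emb = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
```

With these vectors, "king" is closer to "queen" than to "apple", mirroring what trained embeddings such as word2vec or GloVe capture from corpus statistics.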
     

5. Part-of-Speech (POS) Tagging

  • Hidden Markov models and the Viterbi algorithm
  • Conditional Random Fields
  • (Bi)LSTM for POS tagging
  • Encoder-decoder architecture for sequence-to-sequence labeling
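The Viterbi algorithm finds the most probable tag sequence under an HMM by dynamic programming. The following is a compact sketch with a toy two-tag example whose probabilities are made up for illustration, not estimated from data.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state (tag) sequence for an HMM.
    V[t][s] = (best probability of reaching s at time t, best predecessor)."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                       prev)
    # backtrace from the best final state
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# toy two-tag example (illustrative probabilities)
states = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.8, "bark": 0.2}, "V": {"dogs": 0.1, "bark": 0.9}}
```

For "dogs bark" the algorithm recovers the intuitive noun-verb tagging; a real tagger would additionally work in log space to avoid numerical underflow on long sentences.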
     

6. Morphological analysis

  • Inflection and derivation
  • Finite state morphology
  • Sequence-to-sequence neural models of morphological inflection
     

7. Syntactic Parsing

  • Universal Dependencies
  • Dependency parsing: graph-based and transition-based approaches
  • Constituent parsing with a (probabilistic) context free grammar ((P)CFG) and the Cocke-Younger-Kasami (CYK) algorithm
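The CYK algorithm recognises a sentence with a grammar in Chomsky normal form by filling a chart of spans bottom-up. This sketch (a recogniser only; the tabulated probabilistic variant adds rule scores and backpointers) uses a hypothetical four-rule toy grammar.

```python
def cyk(words, grammar, lexicon, start="S"):
    """CYK recognition for a CNF grammar.
    grammar: {(B, C): {A, ...}} for binary rules A -> B C
    lexicon: {word: {A, ...}} for unary rules A -> word"""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexicon.get(w, ()))  # width-1 spans
    for span in range(2, n + 1):                   # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # try every split point
                for B in table[i][k]:
                    for C in table[k][j]:
                        table[i][j] |= grammar.get((B, C), set())
    return start in table[0][n]

# toy CNF grammar: S -> NP VP, NP -> Det N, VP -> V NP
grammar = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexicon = {"the": {"Det"}, "dog": {"N"}, "man": {"N"}, "bites": {"V"}}
```

The chart-filling runs in O(n³ · |G|) time, which is why CYK is the standard exact algorithm for (P)CFG parsing.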
     

8. Semantics (lexical and compositional)

  • Word sense disambiguation
  • Semantic role labelling
     

9. Discourse: Coreference Resolution

  • Discourse coherence
  • Hobbs's algorithm
  • Neural end-to-end coreference resolution
     

10. Question Answering

  • Evolution of QA systems from rule-based to neural
  • From complex pipelines to end-to-end and retrieval-free systems
  • Closed-domain vs. open-domain
  • Text-only vs. multimodal
     

11. Neural Machine Translation

  • Encoder-decoder architecture (e.g., RNN, transformer-based)
  • Attention models
  • Improvements and alternative architectures that deal with limited parallel training data
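At the heart of both attention models and the transformer is scaled dot-product attention. The sketch below shows the single-query case in plain Python for readability (practical implementations batch this as matrix operations): scores are dot products of the query with each key, scaled by the square root of the dimension, normalised with a softmax, and used to take a weighted average of the values.

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector:
    weights = softmax(q . k / sqrt(d)); output = weighted sum of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

A query that matches a key closely pulls the output toward that key's value, which is how a decoder learns to "look at" the relevant source words during translation.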
     

12. Conversational Dialogue Systems and Chatbots

  • Task-oriented dialogue agents: rule-based versus neural approaches
  • Chatbots: end-to-end sequence-to-sequence neural models
     

Handbooks

Daniel Jurafsky and James H. Martin. 2024. Speech and Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics, and Speech Recognition with Language Models,
3rd edition.

Jacob Eisenstein. 2019. Introduction to Natural Language Processing. MIT Press.

Yoav Goldberg. 2016. A Primer on Neural Network Models for Natural Language Processing.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.

  
Plus recent articles, e.g., from the proceedings of ACL, AAAI and NeurIPS.

Interactive lectures with short exercises.

0.5 ects. Natural Language Processing: Exercises (B-KUL-H00G0a)

0.5 ECTS · English · Format: Practical · 13 hours · First term
POC Artificial Intelligence

  • Exercises on tokenization and segmentation
  • Exercises on language modelling and POS tagging
  • Exercises on syntactic parsing
  • Exercises on semantic and discourse processing
  • Exercises on machine translation
  • Exercises on question answering

2 ects. Natural Language Processing: Project (B-KUL-H0O15a)

2 ECTS · English · Format: Assignment · 4 hours · First term
POC ir. Artificial Intelligence

The project focuses on gaining fundamental insights into advanced aspects of natural language processing, especially with regard to learning representations of language data and solving a complex NLP task, such as question answering, dialogue understanding and generation, or machine reading, together with the subtasks involved. The assignment is a programming assignment which is given to the students in separate parts.

Assignment, explanations and documentation can be downloaded from the Toledo platform of KU Leuven:

  • http://toledo.kuleuven.be

Computer session - Individual assignment - Project work

Evaluation

Evaluation: Advanced Natural Language Processing (B-KUL-H22B1b)

Type: Partial or continuous assessment with (final) exam during the examination period
Description of evaluation: Written, Project/Product, Report
Type of questions: Open questions, Closed questions
Learning material: Course material, Computer

There is a second exam opportunity, except for the project work.