Natural Language Processing (B-KUL-H02B1A)

4 ECTS · English · 33 hours · First term · Cannot be taken as part of an examination contract
POC Artificial Intelligence

The course introduces students to the principles and methods of natural language processing, with a focus on current methods and technologies. It additionally aims at an understanding of the underlying computational properties of natural language and of current research directions. We study core tasks in natural language processing, including language modeling, syntactic analysis, semantic interpretation, coreference resolution, discourse analysis and machine translation. We discuss the underlying linguistic phenomena, i.e., the linguistic features, and how they can be modeled and automatically learned from data, which includes an introduction to current deep learning approaches. We illustrate the methods and technologies with current applications in real-world settings.

Students should preferably have followed the course Linguistic Theories for A.I.


0.5 ECTS. Natural Language Processing: Exercises (B-KUL-H00G0a)

0.5 ECTS · English · Format: Practical · 13 hours · First term
POC Artificial Intelligence

  • Exercises on rule-based and probabilistic grammars and parsers.
  • Exercises on language modeling and word embeddings.
  • Exercises on vector semantics and models for sentence meaning.
  • Exercises on semantic parsing: structured learning and prediction, and integrating integer linear programming constraints.
  • Exercises on temporal and spatial recognition in language.
  • Exercises on (alignment in) machine translation.

3.5 ECTS. Natural Language Processing: Lecture (B-KUL-H02B1a)

3.5 ECTS · English · Format: Lecture · 20 hours · First term
POC Artificial Intelligence

1. Introduction

  • What is natural language processing (NLP)?
  • Ambiguity and uncertainty in language.
  • Introduction to the different analysis levels of NLP.
  • Applications of NLP.
  • Evaluation in NLP: precision, recall, F-score, k-fold cross-validation, gold standards, and good practices in NLP experiments (a small metric computation is sketched below).
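
To make these metrics concrete, here is a minimal Python sketch; the function name and toy labels are illustrative, not taken from the course material:

    def precision_recall_f1(gold, predicted):
        """Precision, recall and F1 for binary labels (1 = positive)."""
        tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
        fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
        fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # 2 true positives, 1 false positive, 1 false negative
    # -> precision 2/3, recall 2/3, F1 2/3
    print(precision_recall_f1([1, 1, 0, 1, 0], [1, 1, 1, 0, 0]))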


2. Language Modeling and Word Embeddings

  • Probabilistic language models, n-grams, smoothed estimation (a bigram example is sketched after this list).
  • Introduction to neural networks.
  • Word embeddings.
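
As a pointer to what smoothed n-gram estimation looks like in practice, a minimal bigram model with add-k smoothing; this is a sketch under our own simplifying assumptions, not the course's reference implementation:

    from collections import Counter

    def train_bigram_lm(sentences, k=1.0):
        """Add-k smoothed bigram probabilities from tokenized sentences."""
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            unigrams.update(tokens[:-1])          # histories only
            bigrams.update(zip(tokens[:-1], tokens[1:]))
        vocab_size = len(set(unigrams) | {"</s>"})
        def prob(w_prev, w):
            # P(w | w_prev) with add-k smoothing over the vocabulary
            return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * vocab_size)
        return prob

    prob = train_bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
    print(prob("the", "cat"))  # (1 + 1) / (2 + 6) = 0.25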


3. Part-of-Speech Tagging and Phrase Chunking

  • Part-of-speech tagsets, common linguistic categories.
  • Part-of-speech tagging: rule-based and probabilistic models (a Viterbi decoding sketch follows this list).
  • Phrase chunking: rule-based and probabilistic models.
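
To illustrate the probabilistic side, a minimal Viterbi decoder for a bigram HMM tagger; the probability tables in the example are invented toy values:

    def viterbi(words, tags, trans, emit, start):
        """Most likely tag sequence under a bigram HMM.

        start[t] = P(t | <s>), trans[(t1, t2)] = P(t2 | t1), emit[(t, w)] = P(w | t).
        """
        # best[i][t] = (score, backpointer) of the best path ending in tag t at word i
        best = [{t: (start.get(t, 0.0) * emit.get((t, words[0]), 0.0), None)
                 for t in tags}]
        for i in range(1, len(words)):
            best.append({})
            for t in tags:
                score, prev = max(
                    (best[i - 1][p][0] * trans.get((p, t), 0.0)
                     * emit.get((t, words[i]), 0.0), p)
                    for p in tags)
                best[i][t] = (score, prev)
        # Backtrace from the highest-scoring final tag
        tag = max(best[-1], key=lambda t: best[-1][t][0])
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = best[i][tag][1]
            path.append(tag)
        return list(reversed(path))

    tags = {"DT", "NN"}
    start = {"DT": 0.9, "NN": 0.1}
    trans = {("DT", "NN"): 0.9, ("DT", "DT"): 0.1, ("NN", "NN"): 0.5, ("NN", "DT"): 0.5}
    emit = {("DT", "the"): 0.7, ("NN", "dog"): 0.4}
    print(viterbi(["the", "dog"], tags, trans, emit, start))  # ['DT', 'NN']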


4. Rule-based and Statistical Parsing and Chunking

  • Parsing with a context-free grammar (CFG).
  • CFG: constituency, Chomsky normal form, chart parsing (Earley algorithm), top-down and bottom-up parsing, Cocke-Younger-Kasami (CYK) algorithm, efficiency (a CYK recognizer is sketched after this list).
  • Limitations of context-free grammars, implementing feature constraints.
  • Parsing with a probabilistic context-free grammar (PCFG).
  • Probabilistic CYK parsing.
  • Weak generative capacity and strong generative capacity.
  • Tree substitution grammars.
  • Tree adjoining grammars.
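
A minimal CYK recognizer for a grammar in Chomsky normal form; the grammar encoding (dictionaries of lexical and binary rules) is our own simplification:

    def cyk_recognize(words, lexical, binary, start="S"):
        """True iff `words` is derivable from `start`.

        lexical[w]: nonterminals A with A -> w; binary[(B, C)]: A with A -> B C.
        """
        n = len(words)
        # chart[i][j] holds the nonterminals deriving words[i:j]
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = set(lexical.get(w, ()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):          # split point
                    for b in chart[i][k]:
                        for c in chart[k][j]:
                            chart[i][j] |= binary.get((b, c), set())
        return start in chart[0][n]

    lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
    binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
    print(cyk_recognize(["she", "eats", "fish"], lexical, binary))  # True

Extending the chart cells with probabilities and backpointers gives the probabilistic CYK parser listed above.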


5. Lexical and Distributional Semantics

  • Word sense disambiguation, collocations and lexical acquisition from large text corpora.
  • Vector semantics and compositionality (a co-occurrence-vector example follows this list).
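
A small distributional-semantics sketch: word vectors built from window co-occurrence counts and compared with cosine similarity (corpus and window size are toy choices of ours):

    import math
    from collections import Counter, defaultdict

    def cooccurrence_vectors(sentences, window=2):
        """Map each word to a Counter of context-word counts."""
        vectors = defaultdict(Counter)
        for sent in sentences:
            for i, w in enumerate(sent):
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if i != j:
                        vectors[w][sent[j]] += 1
        return vectors

    def cosine(u, v):
        dot = sum(u[key] * v[key] for key in u if key in v)
        norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
        return dot / (norm(u) * norm(v)) if u and v else 0.0

    vecs = cooccurrence_vectors([["the", "cat", "purrs"], ["the", "dog", "barks"]])
    print(cosine(vecs["cat"], vecs["dog"]))  # 0.5: both co-occur with "the"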


6. Named Entity Recognition and Semantic Role Labeling

  • Introduction to maximum entropy models and sequence tagging for named entity recognition and semantic role labeling.
  • Feature engineering (an example feature function is sketched after this list).
  • Brief review of hidden Markov models (HMMs).
  • Conditional random fields: training and decoding.
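
To make "feature engineering" concrete, a sketch of the kind of per-token feature function fed to a maximum entropy or CRF tagger; the particular feature set is an illustrative assumption:

    def token_features(tokens, i):
        """Features for tokens[i] in its sentence, for a maxent/CRF tagger."""
        w = tokens[i]
        return {
            "word": w.lower(),
            "is_capitalized": w[:1].isupper(),
            "is_all_caps": w.isupper(),
            "has_digit": any(c.isdigit() for c in w),
            "suffix3": w[-3:].lower(),
            "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
            "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        }

    print(token_features(["Reuters", "reported", "from", "Brussels"], 3))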


7. Discourse Analysis

  • Seminal algorithms of coreference resolution (Hobbs, Grosz).
  • Pairwise classification in coreference resolution (supervised and unsupervised) and structured prediction based on constraints implemented as integer linear programs (a pairwise sketch follows this list).
  • Introduction to rhetorical structure analysis.
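
A sketch of the pairwise approach: extract features for candidate (antecedent, anaphor) pairs and link each mention to the closest accepted antecedent. The mention encoding, feature set and toy rule standing in for a trained classifier are all our own assumptions:

    def mention_pair_features(m1, m2):
        """Features for a candidate (antecedent, anaphor) pair of mentions,
        where mentions are dicts with 'text', 'head' and 'sentence' keys."""
        return {
            "exact_match": m1["text"].lower() == m2["text"].lower(),
            "head_match": m1["head"].lower() == m2["head"].lower(),
            "sentence_distance": m2["sentence"] - m1["sentence"],
            "anaphor_is_pronoun": m2["text"].lower() in {"he", "she", "it", "they"},
        }

    def resolve(mentions, classify):
        """Closest-first linking: attach each mention to the nearest
        preceding antecedent accepted by the pairwise classifier."""
        links = {}
        for j, m in enumerate(mentions):
            for i in range(j - 1, -1, -1):
                if classify(mention_pair_features(mentions[i], m)):
                    links[j] = i
                    break
        return links

    mentions = [
        {"text": "Barack Obama", "head": "Obama", "sentence": 0},
        {"text": "the president", "head": "president", "sentence": 1},
        {"text": "he", "head": "he", "sentence": 1},
    ]
    toy_rule = lambda f: (f["head_match"]
                          or (f["anaphor_is_pronoun"] and f["sentence_distance"] <= 1))
    print(resolve(mentions, toy_rule))  # {2: 1}: "he" -> "the president"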


8. Semantic Parsing

  • Semantic knowledge representation, illustrated with examples expressed in the lambda calculus.
  • Mapping to logical forms and denotations.
  • A simple structured learning algorithm and its training with stochastic gradient descent (sketched after this list).
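
A sketch of what such a learner can look like: a linear model over (sentence, logical form) features, trained with perceptron-style stochastic gradient updates. The data format and feature function signature are our own assumptions:

    from collections import defaultdict

    def score(weights, features):
        return sum(weights[f] * v for f, v in features.items())

    def train_structured(data, feats, epochs=10, lr=1.0):
        """data: (sentence, candidate logical forms, gold index) triples;
        feats(sentence, form): feature dict for a candidate analysis."""
        weights = defaultdict(float)
        for _ in range(epochs):
            for sentence, candidates, gold in data:
                # Predict the highest-scoring candidate logical form
                pred = max(range(len(candidates)),
                           key=lambda c: score(weights, feats(sentence, candidates[c])))
                if pred != gold:
                    # SGD step on the structured (perceptron) loss:
                    # promote gold features, demote predicted features
                    for f, v in feats(sentence, candidates[gold]).items():
                        weights[f] += lr * v
                    for f, v in feats(sentence, candidates[pred]).items():
                        weights[f] -= lr * v
        return weights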


9. Temporal and Spatial Recognition

  • Temporal expression recognition, temporal normalization, temporal relation recognition (a toy recognizer is sketched after this list).
  • Spatial relation recognition.
  • Feature engineering.
  • Introduction to Markov logic and structured perceptrons.
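
A toy recognizer/normalizer for two kinds of temporal expressions; real systems use much richer rule sets and machine-learned components, and the day/month format here is an arbitrary assumption:

    import re
    from datetime import date, timedelta

    def normalize_temporal(text, today):
        """Find simple temporal expressions and normalize them to ISO dates."""
        results = []
        for m in re.finditer(r"\b(today|yesterday|tomorrow)\b", text, re.I):
            offset = {"today": 0, "yesterday": -1, "tomorrow": 1}[m.group(1).lower()]
            results.append((m.group(0), (today + timedelta(days=offset)).isoformat()))
        for m in re.finditer(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b", text):  # day/month/year
            d, mo, y = (int(g) for g in m.groups())
            results.append((m.group(0), date(y, mo, d).isoformat()))
        return results

    print(normalize_temporal("She arrived yesterday, on 14/02/2020.", date(2020, 2, 15)))
    # [('yesterday', '2020-02-14'), ('14/02/2020', '2020-02-14')]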


10. Advanced Syntactic and Semantic Parsing

  • Lexicalized PCFGs.
  • Parsing with limited supervision, latent variable grammars.
  • Grammar induction.
  • Distributional syntax.
  • Unsupervised tagging.


11+12. Alignment Algorithms and Machine Translation

  • Language typology and divergences.
  • Rule-based models: direct translation, transfer systems, and interlingua systems.
  • Discriminative models.
  • Statistical models: word-based, phrase-based and syntax-based.
  • Sentence alignment, word alignment, phrase alignment, tree alignment.
  • Alignment algorithms: expectation maximization (EM) algorithm, IBM models, HMM alignment (an IBM Model 1 sketch follows this list).
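
A compact sketch of EM for IBM Model 1, estimating word translation probabilities t(f | e) from sentence pairs; the toy parallel corpus is invented:

    from collections import defaultdict

    def ibm_model1(pairs, iterations=10):
        """EM estimation of IBM Model 1 probabilities t(f | e).

        pairs: (source tokens, target tokens); a NULL word is added to each
        target sentence, as in the standard model."""
        pairs = [(f, ["<NULL>"] + e) for f, e in pairs]
        f_vocab = {w for f, _ in pairs for w in f}
        t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
        for _ in range(iterations):
            count, total = defaultdict(float), defaultdict(float)
            for f_sent, e_sent in pairs:
                for f in f_sent:
                    # E-step: expected alignment counts, normalized over e
                    z = sum(t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        c = t[(f, e)] / z
                        count[(f, e)] += c
                        total[e] += c
            # M-step: re-estimate t(f | e) from the expected counts
            for (f, e), c in count.items():
                t[(f, e)] = c / total[e]
        return t

    t = ibm_model1([(["das", "haus"], ["the", "house"]),
                    (["das", "buch"], ["the", "book"])])
    print(round(t[("das", "the")], 3))  # rises towards 1.0 over the iterations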


13. Introduction to Deep Learning in NLP

  • Introduction to recurrent neural networks (RNNs) and long short-term memory (LSTM) models.
  • Attention-based encoders and decoders.
  • Illustrated with neural-network-based translation, semantic role labeling and implicit rhetorical relation recognition (an attention sketch follows this list).
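
A minimal numpy sketch of the attention computation at the heart of such encoder-decoder models: a decoder query is scored against the encoder states, and a softmax over the scores weights the context vector. Shapes and toy values are our own choices:

    import numpy as np

    def dot_product_attention(query, keys, values):
        """softmax(keys . query) used to take a weighted sum of values."""
        scores = keys @ query                   # one score per source position
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                # softmax over source positions
        return weights @ values, weights

    rng = np.random.default_rng(0)
    states = rng.normal(size=(3, 4))            # 3 encoder states, dim 4
    query = 2.0 * states[1]                     # decoder state close to position 1
    context, weights = dot_product_attention(query, states, states)
    print(weights)  # attention mass should concentrate on the most similar position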


Daniel Jurafsky and James H. Martin, Speech and Language Processing (2nd edition), Prentice-Hall, 2009.
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Philipp Koehn, Statistical Machine Translation, Cambridge University Press, 2009.

Supplemented with recent articles, e.g., from the proceedings of the annual meetings of the Association for Computational Linguistics.



Evaluation: Natural Language Processing (B-KUL-H22B1a)

Type: Exam during the examination period
Description of evaluation: Written
Type of questions: Open questions, closed questions
Learning material: Course material

Open-book written exam featuring a mixture of theory and exercise questions.