Natural Language Processing (B-KUL-H02B1A)

4.0 ECTS English 32.5 Second term Advanced Cannot be taken as part of an examination contract
POC Artificial Intelligence

The course introduces students to the principles and methods of natural language processing, blending traditional and current approaches. We study algorithms for lexical, morphological, syntactic, semantic and discourse processing, and for machine translation. The course also aims to give an understanding of the underlying computational properties of natural language and of current research directions.

Students should preferably have taken the course Linguistic Theories for A.I.

Articles and literature
Textbook

Activities

0.5 ECTS. Natural Language Processing: Exercises (B-KUL-H00G0a)

0.5 ECTS English 13.0 Second term
POC Artificial Intelligence

  • Exercises on regular expressions and morphology. 

  • Exercises on rule-based and probabilistic grammars / parsers.

  • Exercises on alignment in machine translation.

  • Exercises on semantic processing. 


3.5 ECTS. Natural Language Processing: Lecture (B-KUL-H02B1a)

3.5 ECTS English 19.5 Second term
POC Artificial Intelligence

1. Introduction

What is Natural Language Processing (NLP)?

Ambiguity and uncertainty in language.
Introduction to the different analysis levels of NLP.
Applications of NLP.
Evaluation in NLP: precision, recall, F-score, k-fold cross-validation, gold standards, good practices in NLP experiments, BLEU.
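
As an illustration of the evaluation measures listed above, the following minimal sketch computes precision, recall and the balanced F-score (F1) for a system output compared against a gold standard. The function name `precision_recall_f1` and the toy entity sets are assumptions made for this example only; they are not part of the course material.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Precision: fraction of predicted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard and system output for a named entity task.
gold_entities = {"Leuven", "KU Leuven", "Belgium", "Flanders"}
system_entities = {"Leuven", "Belgium", "Europe"}

p, r, f = precision_recall_f1(gold_entities, system_entities)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Here two of the three predicted entities are correct and two of the four gold entities are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.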

2. Computational Morphology and Finite State Automata

Word and sentence tokenization, inflected word forms, stemming, term splitting.
Morphological analyzers.
Regular languages and their limitations.
Finite state automata and transducers.

3. Language Modeling and Sequential Tagging

Probabilistic language models, n-grams, generative models, unsupervised and semi-supervised models, smoothed estimation, expectation-maximization (EM) techniques.
Hidden Markov Models (HMMs), Viterbi algorithm.
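
To give an idea of how the Viterbi algorithm finds the most probable state sequence of an HMM, a minimal sketch follows. The two-tag model (`N`, `V`) and all probabilities are invented for illustration, and the code assumes that every observation has an emission probability in every state (no smoothing, no log-space arithmetic).

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_state_sequence, probability) for a first-order HMM."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Extend the best path into each previous state by one transition.
            column[s] = max(
                (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], prev)
                for prev in states
            )
        best.append(column)

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy part-of-speech model; all numbers are hypothetical.
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}

tags, prob = viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p)
print(tags, prob)
```

For this toy model the best sequence tags "fish" as `N` and "sleep" as `V`, with probability 0.8 × 0.7 × 0.7 × 0.6 = 0.2352.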

4. Part-of-Speech Tagging and Phrase Chunking

Part-of-speech tagging: rule-based and probabilistic models.
Phrase chunking: rule-based and probabilistic models.

5. Rule-based Parsing / Chunking

Parsing, formal grammars, treebanks.
Context-free grammars: constituency, Chomsky normal form, chart parsing (Earley algorithm), top-down and bottom-up parsing, Cocke-Younger-Kasami (CYK) algorithm, efficiency.
Limitations of context-free grammars, implementing feature constraints.
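
As a sketch of the CYK algorithm mentioned above, the following recognizer decides whether a sentence can be derived from a grammar in Chomsky normal form. The representation of the grammar as two dictionaries (`lexical_rules` and `binary_rules`) and the three-word toy grammar are assumptions made for this example; a full parser would also store back-pointers to recover the trees.

```python
from itertools import product


def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if `words` is derivable from `start` (grammar in CNF)."""
    n = len(words)
    # chart[i][j] holds the nonterminals that span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical_rules.get(word, ()))
    # Fill the chart bottom-up, from short spans to long spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((left, right), set())
    return start in chart[0][n]


# Hypothetical toy grammar: S -> NP VP, VP -> V NP, plus three lexical rules.
lexical_rules = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary_rules = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}

print(cyk_recognize(["she", "eats", "fish"], lexical_rules, binary_rules))
print(cyk_recognize(["eats", "she", "fish"], lexical_rules, binary_rules))
```

The three nested loops over span length, start position and split point give the cubic running time in the sentence length that is usually quoted for CYK.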


6. Statistical Parsing

Probabilistic context-free grammars.
Probabilistic CKY parsing.
Collins parser.
Parsing with limited supervision.

7. Lexical Semantics

Word sense disambiguation, collocations and lexical acquisition from large text corpora.
Word relations (similarity, hyponymy, etc.), distributional semantics, distributional models of meaning.

8. Computational Semantics

Semantic representations, lambda calculus, links with unification, logics, and reasoning.

9. Named Entity Recognition and Semantic Role Labeling

Discriminative supervised learning models: maximum entropy models (linear regression, logistic regression and Maxent), maximum entropy Markov models, conditional random fields, kernel methods.
PropBank, FrameNet.

10. Discourse Analysis

Noun phrase coreference resolution.
Rhetorical classification.
Argumentation mining.

11. Alignment Algorithms for Machine Translation

Sentence alignment, word alignment, phrase alignment, tree alignment.
Alignment algorithms: EM algorithm, IBM models, HMM alignment, discriminative models.
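
To illustrate how the EM algorithm is used for word alignment, the sketch below estimates the lexical translation probabilities of IBM Model 1 on a three-sentence toy corpus. It is a simplified version under stated assumptions: there is no NULL source word, the number of iterations is fixed, and the function name `ibm_model1` and the corpus are chosen for this example only.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate t(target_word | source_word) with EM.

    corpus: list of (source_words, target_words) sentence pairs.
    Returns a dict keyed by (target_word, source_word).
    """
    target_vocab = {w for _, tgt in corpus for w in tgt}
    # Uniform initialisation of all translation probabilities.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (target, source) pairs
        total = defaultdict(float)  # expected counts of source words
        # E-step: collect fractional counts under the current model.
        for src, tgt in corpus:
            for w_t in tgt:
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    delta = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += delta
                    total[w_s] += delta
        # M-step: re-estimate the probabilities from the expected counts.
        for (w_t, w_s), c in count.items():
            t[(w_t, w_s)] = c / total[w_s]
    return t


# Hypothetical German-English toy corpus.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
table = ibm_model1(corpus)
for pair in [("the", "das"), ("house", "das"), ("book", "Buch"), ("a", "ein")]:
    print(pair, round(table[pair], 3))
```

Although "das" co-occurs with "the", "house" and "book", the expected counts gradually concentrate on "the", because "house" and "book" are better explained by "Haus" and "Buch".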

12. Machine Translation

Rule-based models.
Example-based models.
Word-based models.
Phrase-based models.
Syntax-based models.

13. Temporal and Spatial Recognition

Temporal expression recognition, temporal normalization, TimeBank.
Event detection and analysis, temporal relation recognition and semantic dependency parsing.
Spatial relation recognition.

Lectures.

Handbooks

Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2006 (2nd edition).

Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

Philipp Koehn, Statistical Machine Translation, Cambridge University Press, 2009.

Recent articles, e.g., from the proceedings of the meetings of the Association for Computational Linguistics.

Evaluation

Evaluation: Natural Language Processing (B-KUL-H22B1a)

Mode of evaluation: Written
Category: Final examination during the examination period
Type of evaluation: Open book

Open book written exam featuring a mixture of theory and exercise questions.