The course introduces students to the principles and methods of natural language processing, blending traditional and current approaches. We study algorithms for lexical, morphological, syntactic, semantic and discourse processing, and for machine translation. The course also aims to provide an understanding of the underlying computational properties of natural language and of current research directions.
Students should preferably have followed the course Linguistic Theories for A.I.
Articles and literature
This course is also included in the following programmes:
- Master of Applied Informatics (Artificial Intelligence and Databases), 60 ECTS.
- Master of Speech-Language Pathology and Audiology Sciences, 60 ECTS.
- Master of Applied Economic Sciences: Business Engineer in Business Informatics, 120 ECTS.
- Master of Artificial Intelligence (Option: Speech and Language Technology (SLT)), 60 ECTS.
- Master of Computer Science (being phased out, second phase only) (Specialisation: Artificial Intelligence), 120 ECTS.
- Master of Engineering Science: Computer Science (Specialisation: Artificial Intelligence), 120 ECTS.
- Exercises on regular expressions and morphology.
- Exercises on rule-based and probabilistic grammars / parsers.
- Exercises on alignment in machine translation.
- Exercises on semantic processing.
1. Introduction
What is Natural Language Processing (NLP)?
Ambiguity and uncertainty in language.
Introduction to the different analysis levels of NLP.
Applications of NLP.
Evaluation in NLP: precision, recall, F-score, k-fold cross-validation, gold standards, good practices in NLP experiments, BLEU.
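The basic evaluation measures can be sketched in a few lines. This is a minimal illustration rather than course material: it assumes evaluation over sets of items (for example, predicted versus gold-standard entity spans), and the function names `precision_recall_f1` and `k_fold_indices` are hypothetical. The F-score is the weighted harmonic mean of precision and recall (beta = 1 gives F1); the fold generator splits n example indices into k contiguous, near-equal test folds.

```python
from typing import Hashable, Iterable


def precision_recall_f1(predicted: Iterable[Hashable], gold: Iterable[Hashable], beta: float = 1.0):
    """Compare a set of predicted items against a gold-standard set."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # true positives: items found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # Weighted harmonic mean: F_beta = (1 + b^2) P R / (b^2 P + R)
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def k_fold_indices(n: int, k: int):
    """Yield (train, test) index lists for k-fold cross-validation."""
    # The first n % k folds get one extra item so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

For example, if a system predicts 3 items of which 2 appear in a 4-item gold standard, precision is 2/3, recall is 1/2 and F1 is 4/7 (about 0.571). In practice the data are usually shuffled before folds are assigned.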
2. Computational Morphology and Finite State Automata
Word and sentence tokenization, inflected word forms, stemming, term splitting.
Regular languages and their limitations.
Finite state automata and transducers.
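A deterministic finite-state automaton can be encoded as a transition table plus a set of accepting states. The sketch below uses the "sheep language" baa+! familiar from Jurafsky and Martin as an assumed example; the table layout and the `accepts` function are illustrative choices, not a prescribed implementation. A finite-state transducer extends the same idea by emitting an output symbol on each transition.

```python
# Transition table for a DFA accepting the "sheep language" baa+!
# (baa!, baaa!, baaaa!, ...). State 0 is the start state, state 4 accepts;
# a missing (state, symbol) entry means the input is rejected.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of extra a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def accepts(string, transitions=TRANSITIONS, start=0, accepting=ACCEPTING):
    """Run the DFA over the input and report whether it ends in an accepting state."""
    state = start
    for symbol in string:
        state = transitions.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in accepting
```

Because each state has at most one transition per symbol, recognition takes time linear in the input length.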
3. Language Modeling and Sequential Tagging
Probabilistic language models, n-grams, generative models, unsupervised and semi-supervised models, smoothed estimation, expectation-maximization (EM) techniques.
Hidden Markov Models (HMMs), Viterbi algorithm.
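As a concrete instance of a smoothed n-gram model, the sketch below estimates a bigram model with add-one (Laplace) smoothing, the simplest smoothing scheme. It is an assumed illustration rather than the estimator used in the course: it omits unknown-word handling, backoff and interpolation, and the EM-based techniques listed above. Sentence boundaries are marked with hypothetical `<s>` and `</s>` pseudo-tokens.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.histories = Counter()  # C(prev): how often a token is followed by something
        self.bigrams = Counter()    # C(prev, word)
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.histories.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))
        # Possible next words: every seen word plus the end marker (not <s>).
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev, word):
        # P(word | prev) = (C(prev, word) + 1) / (C(prev) + |V|)
        return (self.bigrams[(prev, word)] + 1) / (self.histories[prev] + len(self.vocab))

    def log_prob(self, tokens):
        """Log probability of a whole sentence, including the end marker."""
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(padded[:-1], padded[1:]))
```

Trained on the two sentences "I am Sam" and "Sam I am", the model gives P(am | I) = (2 + 1) / (2 + 4) = 0.5: "I am" occurs twice, "I" occurs twice as a history, and the set of possible next words {I, am, Sam, </s>} has four members.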
4. Part-of-Speech Tagging and Phrase Chunking
Part-of-speech tagging: rule-based and probabilistic models.
Phrase chunking: rule-based and probabilistic models.
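Probabilistic part-of-speech tagging with an HMM reduces to finding the most probable tag sequence, which the Viterbi algorithm computes by dynamic programming. The sketch below is a minimal version under stated assumptions: the two-tag model (N, V) and all its probabilities are made up for illustration, the input is assumed to be non-empty, and raw probabilities are multiplied, so longer sentences would need log probabilities to avoid underflow.

```python
# Toy HMM with hypothetical numbers: tags N and V, a two-word vocabulary.
STATES = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"fish": 0.6, "sleep": 0.4}, "V": {"fish": 0.5, "sleep": 0.5}}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its joint probability."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Pick the best final state, then follow backpointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], best[-1][last]
```

With these parameters, `viterbi(["fish", "sleep"], STATES, START, TRANS, EMIT)` tags the sentence as N V, with probability 0.8 × 0.6 × 0.7 × 0.5 = 0.168.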
5. Rule-based Parsing / Chunking
Parsing, formal grammars, treebanks.
Context-free grammars: constituency, Chomsky normal form, chart parsing (Earley algorithm), top-down and bottom-up parsing, Cocke-Younger-Kasami (CYK) algorithm, efficiency.
Limitations of context-free grammars, implementing feature constraints.
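The CYK algorithm recognises a sentence under a grammar in Chomsky normal form by filling a triangular table bottom-up, where each cell holds the nonterminals that can derive the corresponding span; this takes O(n³·|G|) time for n words. The sketch below assumes a small hypothetical grammar and only answers whether the sentence belongs to the language; recovering parse trees requires storing backpointers as well.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
BINARY = {  # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VB", "NP"): {"VP"},
}
LEXICAL = {  # word -> set of A such that A -> word
    "the": {"DT"},
    "dog": {"NN"},
    "cat": {"NN"},
    "saw": {"VB"},
}


def cyk_recognize(words, binary=BINARY, lexical=LEXICAL, start="S"):
    """Return True if the start symbol derives the whole word sequence."""
    n = len(words)
    # table[i][j]: set of nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point inside the span
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]
```

For "the dog saw the cat", the cells for "the dog" and "the cat" receive NP, "saw the cat" receives VP, and the full span receives S.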
6. Statistical Parsing
Probabilistic context-free grammars.
Probabilistic CKY parsing.
Parsing with limited supervision.
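Probabilistic CKY keeps, for each span and nonterminal, only the highest-probability analysis, plus a backpointer to rebuild the tree. The sketch below assumes a hypothetical PCFG in Chomsky normal form, chosen so that the classic prepositional-phrase attachment ambiguity in "I saw the man with the telescope" is resolved by the rule probabilities; the names `pcky` and `_build` are illustrative.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form; all probabilities are hypothetical.
BINARY_RULES = [  # (A, B, C, P(A -> B C))
    ("S", "NP", "VP", 1.0),
    ("VP", "VB", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("NP", "NP", "PP", 0.2),
    ("NP", "DT", "NN", 0.5),
    ("PP", "IN", "NP", 1.0),
]
LEXICAL_RULES = {  # word -> [(A, P(A -> word))]
    "I": [("NP", 0.3)],
    "saw": [("VB", 1.0)],
    "the": [("DT", 1.0)],
    "man": [("NN", 0.5)],
    "telescope": [("NN", 0.5)],
    "with": [("IN", 1.0)],
}


def pcky(words, binary=BINARY_RULES, lexical=LEXICAL_RULES, start="S"):
    """Return the most probable parse tree of `words` and its probability."""
    n = len(words)
    best = defaultdict(float)  # (i, j, A) -> max probability of A spanning words[i:j]
    back = {}  # (i, j, A) -> the word, or (split, B, C), for rebuilding the tree
    for i, word in enumerate(words):
        for a, p in lexical.get(word, []):
            best[(i, i + 1, a)] = p
            back[(i, i + 1, a)] = word
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c, p in binary:
                    score = p * best[(i, k, b)] * best[(k, j, c)]
                    if score > best[(i, j, a)]:  # keep only the best analysis
                        best[(i, j, a)] = score
                        back[(i, j, a)] = (k, b, c)
    if best[(0, n, start)] == 0.0:
        return None, 0.0
    return _build(back, 0, n, start), best[(0, n, start)]


def _build(back, i, j, label):
    """Follow backpointers to produce a nested (label, children...) tuple."""
    entry = back[(i, j, label)]
    if isinstance(entry, str):  # preterminal over a single word
        return (label, entry)
    k, b, c = entry
    return (label, _build(back, i, k, b), _build(back, k, j, c))
```

Here the parser attaches "with the telescope" to the verb phrase, because 0.3 × 0.175 × 0.25 = 0.013125 exceeds 0.7 × 0.0125 = 0.00875 for attaching it to "the man"; the whole sentence gets probability 0.3 × 0.013125 = 0.0039375.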
7. Lexical Semantics
Word sense disambiguation, collocations and lexical acquisition from large text corpora.
Word relations (similarity, hyponymy, etc.), distributional semantics, distributional models of meaning.
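Distributional models represent a word by the contexts in which it occurs. The sketch below builds such vectors under simplifying assumptions (raw co-occurrence counts within a fixed window, no weighting such as PMI, no dimensionality reduction) and compares them with cosine similarity; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count context words within +/- `window` positions of each target word."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

On the toy corpus "a cat drinks milk" / "a dog drinks milk", "cat" and "dog" share exactly the same contexts and get similarity 1.0, whereas "cat" and "milk" only share "drinks" and score about 0.47.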
8. Computational Semantics
Semantic representations, lambda calculus, links with unification, logics, and reasoning.
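Python functions make a convenient stand-in for lambda terms: function application plays the role of beta-reduction. The sketch below composes first-order logical forms for "every student sleeps" and "John sleeps"; the tiny lexicon and the string-based representation are assumptions for illustration, and no variable renaming (alpha-conversion) is modelled.

```python
# Meanings as Python functions; logical forms are built as strings.
# Illustrative only: the bound variable is always named x, so nested
# quantifiers would capture each other (no alpha-conversion is done).


def student(x):
    return f"student({x})"


def sleeps(x):
    return f"sleep({x})"


def john(p):
    # A proper name type-raised to a quantifier: λP.P(john)
    return p("john")


def every(p):
    # λP.λQ.∀x.(P(x) → Q(x))
    return lambda q: f"∀x.({p('x')} → {q('x')})"


def some(p):
    # λP.λQ.∃x.(P(x) ∧ Q(x))
    return lambda q: f"∃x.({p('x')} ∧ {q('x')})"


def sentence(np, vp):
    # S -> NP VP: apply the subject's meaning to the verb phrase's meaning
    return np(vp)
```

Applying `sentence(every(student), sleeps)` yields "∀x.(student(x) → sleep(x))", and `sentence(john, sleeps)` yields "sleep(john)": type-raising the proper name lets both noun phrases combine with the verb phrase by the same rule.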
9. Named Entity Recognition and Semantic Role Labeling
Discriminative supervised learning models: linear and logistic regression, maximum entropy (Maxent) models, maximum entropy Markov models, conditional random fields, kernel methods.
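Logistic regression, the two-class case of a maximum entropy classifier, scores a token with a weighted sum of sparse features and maps that score to a probability with the sigmoid function. The sketch below trains one by stochastic gradient ascent on the log-likelihood; the feature names (`capitalized`, `prev=in`, ...) and the tiny training set are hypothetical, and there is no regularisation. Maximum entropy Markov models and conditional random fields extend this idea to whole label sequences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(examples, epochs=200, lr=0.5):
    """Binary logistic regression over sparse binary features (label 1 or 0)."""
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            p = sigmoid(sum(weights.get(f, 0.0) for f in features))
            # Gradient of the log-likelihood w.r.t. each active feature: (y - p)
            for f in features:
                weights[f] = weights.get(f, 0.0) + lr * (label - p)
    return weights


def predict_proba(weights, features):
    """Probability that the token belongs to the positive class (e.g. an entity)."""
    return sigmoid(sum(weights.get(f, 0.0) for f in features))


# Hypothetical NER-style training data: is the token part of a named entity?
DATA = [
    ({"bias", "capitalized", "prev=in"}, 1),
    ({"bias", "capitalized", "prev=visited"}, 1),
    ({"bias", "lowercase", "prev=in"}, 0),
    ({"bias", "lowercase", "prev=the"}, 0),
]
```

After `weights = train_logreg(DATA)`, the feature `capitalized` carries a positive weight and `lowercase` a negative one, so a capitalised token is classified as an entity with probability above 0.5.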
10. Discourse Analysis
Noun phrase coreference resolution.
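Mention-pair approaches to coreference classify pairs of noun phrases as coreferent or not, and then have to turn those pairwise decisions into entity clusters. One simple option, sketched below as an assumption rather than the course's method, is to take the transitive closure of the positive links with a union-find structure; mentions are identified by index and the links are assumed to come from some pairwise classifier.

```python
def cluster_mentions(num_mentions, links):
    """Merge pairwise coreference links into entity clusters (transitive closure)."""
    parent = list(range(num_mentions))  # each mention starts in its own cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two clusters
    clusters = {}
    for i in range(num_mentions):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With five mentions and positive links (0, 1), (1, 3) and (2, 4), the result is [[0, 1, 3], [2, 4]]: mentions 0 and 3 end up together even though no classifier decision linked them directly. This is also the main weakness of transitive closure, since a single wrong link can merge two large clusters.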
11. Alignment Algorithms for Machine Translation
Sentence alignment, word alignment, phrase alignment, tree alignment.
Alignment algorithms: EM algorithm, IBM models, HMM alignment, discriminative models.
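IBM Model 1 estimates word translation probabilities t(f | e) from sentence-aligned text with EM: the E-step spreads each foreign word's alignment over the English words of its sentence in proportion to the current t, and the M-step renormalises the expected counts. The sketch below is a minimal version that omits the NULL word and any convergence check; the German-English toy bitext in the usage note is an assumption.

```python
from collections import defaultdict


def ibm_model1(bitext, iterations=10):
    """Estimate t(f | e) with EM for IBM Model 1 (no NULL word).

    `bitext` is a list of (foreign_tokens, english_tokens) pairs; the result
    maps (f, e) to t(f | e).
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            for f in fs:
                # E-step: posterior over which English word f is aligned to
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise so that sum over f of t(f | e) is 1 for each e
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On the three sentence pairs das Haus / the house, das Buch / the book and ein Buch / a book, t(das | the) rises from the uniform 0.25 to 0.5 after one iteration and keeps increasing, because "das" co-occurs with "the" in two of the three pairs.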
12. Machine Translation
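BLEU, listed among the evaluation topics, scores machine translation output by its clipped n-gram precisions against one or more reference translations, combined by a geometric mean and multiplied by a brevity penalty. The sketch below is a minimal, unsmoothed sentence-level version of that definition; standard BLEU is computed at corpus level (summing counts over all sentences before taking precisions), and established toolkits add tokenisation and smoothing choices that are not reproduced here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU of one candidate against references."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)  # Counter union keeps the maximum count
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if clipped == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty, using the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

For instance, the candidate "the the the" scored against the reference "the cat" with `max_n=1` gets a clipped unigram precision of 1/3: the candidate contains "the" three times, but the reference only once.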
13. Temporal and Spatial Recognition
Temporal expression recognition, temporal normalization, TimeBank.
Event detection and analysis, temporal relation recognition and semantic dependency parsing.
Spatial relation recognition.
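Temporal expression recognition is often paired with normalisation to a canonical value; TimeML-style annotation, as used in TimeBank, records such values in an ISO 8601-based format. The sketch below handles a single, assumed date pattern ("March 3, 2009") with a regular expression and normalises it to YYYY-MM-DD; real systems cover many more expression types, and relative expressions such as "yesterday" additionally need a reference time.

```python
import re

MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Hypothetical pattern for one date format only, e.g. "March 3, 2009".
DATE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b", re.IGNORECASE
)


def tag_dates(text):
    """Return (surface string, ISO 8601 date) pairs for each matched date."""
    results = []
    for m in DATE.finditer(text):
        month = MONTHS[m.group(1).lower()]
        day, year = int(m.group(2)), int(m.group(3))
        results.append((m.group(0), f"{year:04d}-{month:02d}-{day:02d}"))
    return results
```

For example, "The treaty was signed on March 3, 2009 in Leuven." yields the single pair ("March 3, 2009", "2009-03-03"). The pattern does not check that the day is valid for the month.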
Description of learning activities
Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2008 (2nd edition).
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Philipp Koehn, Statistical Machine Translation, Cambridge University Press, 2009.
Recent articles, e.g., from the proceedings of the Annual Meetings of the Association for Computational Linguistics (ACL).
Open book written exam featuring a mixture of theory and exercise questions.