The course introduces students to the principles and methods of natural language processing, blending traditional and current approaches. It also aims at an understanding of the underlying computational properties of natural language and of current research directions. We study core tasks in natural language processing, including language modeling, syntactic analysis, semantic interpretation, coreference resolution, discourse analysis and machine translation. We discuss the underlying linguistic phenomena, i.e., the linguistic features that are relevant to each task, and learn how to design suitable rule-based and machine learning models. We link these to current applications in real-world settings.
Students should preferably have followed the course Linguistic Theories for A.I.
Is included in these courses of study
- Master in de toegepaste informatica (Leuven) (Artificiële intelligentie) 60 ects.
- Master in de logopedische en audiologische wetenschappen (programme being phased out from 2015-2016) (Leuven) 60 ects.
- Master in de toegepaste economische wetenschappen: handelsingenieur in de beleidsinformatica (Leuven) 120 ects.
- Master of Artificial Intelligence (Leuven) (Option: Speech and Language Technology (SLT)) 60 ects.
- Master in de ingenieurswetenschappen: computerwetenschappen (Leuven) (Hoofdspecialisatie Artificiële intelligentie) 120 ects.
- Master of Digital Humanities (Leuven) 60 ects.
- Master of Engineering: Computer Science (Leuven) (Specialisation Artificial Intelligence) 120 ects.
- Master in de logopedische en audiologische wetenschappen (new programme from 2015-2016) (Leuven) 60 ects.
- Exercises on rule-based and probabilistic grammars / parsers.
- Exercises on semantic processing and integer linear programming.
- Exercises on (alignment in) machine translation.
- Exercises on language modeling and grammar induction.
1. Introduction
What is Natural Language Processing (NLP)?
Ambiguity and uncertainty in language.
Introduction to the different analysis levels of NLP.
Applications of NLP.
Evaluation in NLP: precision, recall, F-score, k-fold cross-validation, gold standards, good practices in NLP experiments, BLEU.
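As an illustration of the evaluation measures listed above, the following minimal sketch computes precision, recall and the balanced F-score (F1) over sets of predicted and gold-standard items. The function name and the toy entity spans are assumptions made for this example and are not part of the course material.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for two collections of items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Precision: fraction of predicted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical named-entity spans as (start, end, label) tuples.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")}
predicted = {(0, 2, "PER"), (5, 6, "ORG")}
p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Only one of the two predicted spans matches a gold span exactly, so a span with the right boundaries but the wrong label counts as an error under this strict matching scheme.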
2. Formal Language Theory
Regular languages, regular grammars, and finite state automata, and their limitations.
Context free grammars and their limitations.
Weak Generative Capacity and Strong Generative Capacity.
Tree substitution grammars.
Tree adjoining grammars.
Finite State Transducers and their use in morphology.
3. Language Modeling and Sequential Tagging
Probabilistic language models, n-grams, generative models, unsupervised and semi-supervised models, smoothed estimation. Expectation Maximization (EM) techniques.
Hidden Markov Models (HMMs), Viterbi algorithm, forward-backward algorithm.
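To make the Viterbi algorithm for HMMs concrete, here is a minimal sketch that finds the most probable state sequence for a short observation sequence. The two-tag model and all probabilities are invented for illustration only; a practical tagger would also work in log space to avoid numerical underflow.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path probability.
            prev, score = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model with two tags: N (noun) and V (verb).
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {
    "N": {"dogs": 0.3, "fish": 0.6, "sleep": 0.1},
    "V": {"fish": 0.3, "sleep": 0.7},
}
path, prob = viterbi(["dogs", "sleep"], states, start_p, trans_p, emit_p)
print(path, prob)
```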
4. Part-of-Speech Tagging and Phrase Chunking
Part-of-speech tagsets, commonly used linguistic categories.
Part-of-speech tagging: rule-based and probabilistic models.
Phrase chunking: rule-based and probabilistic models.
5. Rule-based and Statistical Parsing and Chunking
Parsing, formal grammars, treebanks.
Context free grammars: constituency, Chomsky normal form, chart parsing (Earley algorithm), top-down and bottom-up parsing, Cocke-Younger-Kasami (CYK) algorithm, efficiency.
Limitations of context free grammars, implementing feature constraints.
Probabilistic context free grammars.
Probabilistic CYK parsing.
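The following minimal sketch shows the non-probabilistic Cocke-Younger-Kasami recognizer for a grammar in Chomsky normal form. The grammar representation (two dictionaries) and the toy rules are assumptions chosen for this example; the probabilistic variant would store the best score per nonterminal in each chart cell instead of a plain set.

```python
def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if the CNF grammar derives the word sequence."""
    n = len(words)
    # chart[i][j] holds the nonterminals that span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical_rules.get(word, ()))
    # Fill the chart bottom-up, from short spans to long spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= binary_rules.get((left, right), set())
    return start in chart[0][n]


# Hypothetical toy grammar in Chomsky normal form.
lexical_rules = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
binary_rules = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}
accepted = cyk_recognize("the dog chased the cat".split(), lexical_rules, binary_rules)
rejected = cyk_recognize("dog the chased".split(), lexical_rules, binary_rules)
print(accepted, rejected)
```

The three nested loops over span width, start position and split point give the cubic running time in sentence length that the efficiency discussion refers to.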
6. Lexical Semantics
Word sense disambiguation, collocations and lexical acquisition from large text corpora.
Word relations (similarity, hyponymy, etc.), distributional semantics, distributional models of meaning.
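As a small illustration of distributional models of meaning, the sketch below builds co-occurrence count vectors from a tokenized corpus and compares words with cosine similarity. The window size, function names and three-sentence corpus are assumptions for this example; realistic models use large corpora and usually some weighting scheme on the raw counts.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within the window."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus.
sentences = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["the", "car", "needs", "fuel"],
]
vectors = cooccurrence_vectors(sentences)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
print(sim_cat_dog, sim_cat_car)
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "car", so their vectors end up closer.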
7. Computational Semantics
Semantic representations, lambda calculus, links with unification, logics, and reasoning.
8. Named Entity Recognition and Semantic Role Labeling
Discriminative supervised learning models: log-linear models, maximum entropy Markov models, conditional random fields, BIO-tagging and chunking.
9. Discourse Analysis
Noun phrase coreference resolution.
Rhetorical structure recognition.
Constraint-based reasoning, integer linear programming, column generation algorithms.
10. Alignment Algorithms for Machine Translation
Sentence alignment, word alignment, phrase alignment, tree alignment.
Alignment algorithms: Expectation Maximization algorithm, IBM models, HMM alignment, discriminative models.
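The use of Expectation Maximization for word alignment can be sketched with a simplified version of IBM Model 1. This sketch omits the NULL source word, and the function name and three-sentence parallel corpus are assumptions made for illustration. It estimates lexical translation probabilities t(f | e) from sentence pairs without any alignment annotation.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM."""
    target_vocab = {f for _, tgt in pairs for f in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts per source word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)


# Hypothetical toy parallel corpus (English source, German target).
pairs = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_ibm_model1(pairs)
for (f, e), p in sorted(t.items(), key=lambda item: -item[1])[:4]:
    print(f"t({f} | {e}) = {p:.3f}")
```

Because "the" co-occurs with "das" in two sentence pairs, the expected counts shift probability mass towards t(das | the) with each iteration, even though the initial distribution is uniform.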
11. Machine Translation
Language typology and divergences.
Rule-based models: direct translation, transfer systems, and interlingua systems.
Statistical models: word-based, phrase-based and syntax-based.
Synchronous context free grammars, synchronous tree substitution grammars, synchronous tree adjoining grammars.
12. Advanced Syntactic and Semantic Parsing
Lexicalized PCFGs, word classes, parsing with limited supervision, latent variable grammars.
Induction of grammars, inside-outside algorithm.
13. Temporal and Spatial Recognition
Temporal expression recognition, temporal normalization, TimeBank.
Event detection and analysis, temporal relation recognition and semantic dependency parsing.
Spatial relation recognition.
Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2006 (2nd edition).
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Philipp Koehn, Statistical Machine Translation, Cambridge University Press, 2009.
+ recent articles, e.g., from the proceedings of the meetings of the Association for Computational Linguistics.
Open-book written exam featuring a mixture of theory and exercise questions.