Natural Language Processing (B-KUL-H02B1A)

4.0 ECTS  English  32.5 hours  Second term  Advanced  Cannot be taken as part of an examination contract
POC Artificial Intelligence

The course introduces students to the principles and methods of natural language processing, blending traditional and current approaches. It additionally aims at an understanding of the computational properties underlying natural language and of current research directions. We study core tasks in natural language processing, including language modeling, syntactic analysis, semantic interpretation, coreference resolution, discourse analysis, and machine translation. We discuss the underlying linguistic phenomena, i.e., the linguistic features relevant to each task, and learn how to design suitable rule-based and machine learning models. We link the material to current applications in real-world settings.

Students should preferably have followed the course Linguistic Theories for A.I.

Activities

0.5 ECTS. Natural Language Processing: Exercises (B-KUL-H00G0a)

0.5 ECTS  English  Format: Practical  13.0 hours  Second term
POC Artificial Intelligence

     
    • Exercises on rule-based and probabilistic grammars / parsers.
    • Exercises on semantic processing and integer linear programming.
    • Exercises on (alignment in) machine translation.
    • Exercises on language modeling and grammar induction. 

       

    3.5 ECTS. Natural Language Processing: Lecture (B-KUL-H02B1a)

    3.5 ECTS  English  Format: Lecture  19.5 hours  Second term
    POC Artificial Intelligence

    1. Introduction

    What is Natural Language Processing (NLP)?

    Ambiguity and uncertainty in language.
    Introduction to the different analysis levels of NLP.
    Applications of NLP.
    Evaluation in NLP: precision, recall, F-score, k-fold cross-validation, gold standards, good practices in NLP experiments, BLEU.
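
    As a quick illustration of these metrics (a minimal sketch, not part of the course material; the counts are invented), precision, recall, and F-score can be computed from true/false positive and negative counts:

        def precision_recall_f1(true_positives, false_positives, false_negatives):
            # Precision: fraction of predicted items that are correct.
            precision = true_positives / (true_positives + false_positives)
            # Recall: fraction of gold-standard items that were found.
            recall = true_positives / (true_positives + false_negatives)
            # F-score: the harmonic mean of precision and recall.
            f1 = 2 * precision * recall / (precision + recall)
            return precision, recall, f1

        # Example: 8 correct predictions, 2 spurious, 4 missed.
        print(precision_recall_f1(8, 2, 4))  # (0.8, 0.667, 0.727), approximately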

    2. Formal Language Theory

    Regular languages, regular grammars, finite-state automata, and their limitations.
    Context-free grammars and their limitations.
    Weak generative capacity and strong generative capacity.
    Tree substitution grammars.
    Tree-adjoining grammars.
    Finite-state transducers and their use in morphology.
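
    To make the finite-state machinery concrete, here is a minimal sketch (illustrative only; the language is chosen arbitrarily) of a deterministic finite-state automaton recognizing the regular language of strings over {a, b} that end in "ab":

        # States: 0 = no progress, 1 = last symbol was "a", 2 = string ends in "ab".
        TRANSITIONS = {
            (0, "a"): 1, (0, "b"): 0,
            (1, "a"): 1, (1, "b"): 2,
            (2, "a"): 1, (2, "b"): 0,
        }
        ACCEPTING = {2}

        def accepts(string):
            state = 0
            for symbol in string:
                state = TRANSITIONS[(state, symbol)]
            return state in ACCEPTING

        print(accepts("bbab"))  # True
        print(accepts("aba"))   # False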

    3. Language Modeling and Sequential Tagging

    Probabilistic language models, n-grams, generative models, unsupervised and semi-supervised models, smoothed estimation. Expectation Maximization (EM) techniques.
    Hidden Markov Models (HMMs), Viterbi algorithm, forward-backward algorithm.
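
    The Viterbi algorithm admits a compact dynamic-programming implementation. Below is a minimal sketch in log space; the toy tagging model, its states, and all probabilities are invented for illustration:

        import math

        def viterbi(observations, states, log_start, log_trans, log_emit):
            """Most probable state sequence under an HMM (log-space Viterbi)."""
            # trellis[t][s] = (best log probability of reaching s at time t, backpointer)
            trellis = [{s: (log_start[s] + log_emit[s][observations[0]], None)
                        for s in states}]
            for t in range(1, len(observations)):
                column = {}
                for s in states:
                    prev = max(states, key=lambda p: trellis[t - 1][p][0] + log_trans[p][s])
                    score = (trellis[t - 1][prev][0] + log_trans[prev][s]
                             + log_emit[s][observations[t]])
                    column[s] = (score, prev)
                trellis.append(column)
            # Follow backpointers from the best final state.
            path = [max(states, key=lambda s: trellis[-1][s][0])]
            for t in range(len(observations) - 1, 0, -1):
                path.append(trellis[t][path[-1]][1])
            return list(reversed(path))

        # Toy part-of-speech example (made-up probabilities).
        states = ["DT", "NN"]
        log_start = {"DT": math.log(0.8), "NN": math.log(0.2)}
        log_trans = {"DT": {"DT": math.log(0.1), "NN": math.log(0.9)},
                     "NN": {"DT": math.log(0.5), "NN": math.log(0.5)}}
        log_emit = {"DT": {"the": math.log(0.9), "dog": math.log(0.1)},
                    "NN": {"the": math.log(0.1), "dog": math.log(0.9)}}
        print(viterbi(["the", "dog"], states, log_start, log_trans, log_emit))  # ['DT', 'NN']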

    4. Part-of-Speech Tagging and Phrase Chunking

    Part-of-speech tagsets, commonly used linguistic categories.
    Part-of-speech tagging: rule-based and probabilistic models.
    Phrase chunking: rule-based and probabilistic models.

    5. Rule-based and Statistical Parsing and Chunking

    Parsing, formal grammars, treebanks.
    Context-free grammars: constituency, Chomsky normal form, chart parsing (Earley algorithm), top-down and bottom-up parsing, Cocke-Younger-Kasami (CYK) algorithm, efficiency.
    Limitations of context-free grammars, implementing feature constraints.

    Probabilistic context-free grammars.
    Probabilistic CYK parsing.
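
    As an illustration of probabilistic CYK parsing, here is a minimal sketch (the grammar and probabilities are invented) that computes the probability of the best parse under a PCFG in Chomsky normal form:

        from collections import defaultdict

        def probabilistic_cyk(words, lexical, binary, start="S"):
            """Best-parse probability for a PCFG in Chomsky normal form."""
            n = len(words)
            # chart[(i, j)] maps a nonterminal to its best probability of deriving words[i:j].
            chart = defaultdict(dict)
            for i, w in enumerate(words):
                for lhs, prob in lexical.get(w, {}).items():
                    chart[(i, i + 1)][lhs] = prob
            for span in range(2, n + 1):
                for i in range(n - span + 1):
                    j = i + span
                    for k in range(i + 1, j):  # split point
                        for (lhs, left, right), prob in binary.items():
                            if left in chart[(i, k)] and right in chart[(k, j)]:
                                cand = prob * chart[(i, k)][left] * chart[(k, j)][right]
                                if cand > chart[(i, j)].get(lhs, 0.0):
                                    chart[(i, j)][lhs] = cand
            return chart[(0, n)].get(start, 0.0)

        # Toy grammar: S -> NP VP, NP -> DT NN, with made-up lexical probabilities.
        binary = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 1.0}
        lexical = {"the": {"DT": 1.0}, "dog": {"NN": 0.8}, "barks": {"VP": 0.6}}
        print(probabilistic_cyk(["the", "dog", "barks"], lexical, binary))  # 0.48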

    6. Lexical Semantics

    Word sense disambiguation, collocations and lexical acquisition from large text corpora.
    Word relations (similarity, hyponymy, etc.), distributional semantics, distributional models of meaning.
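
    The distributional idea can be illustrated with a minimal sketch (the corpus is a few invented sentences): words occurring in similar contexts receive similar co-occurrence vectors, compared here with cosine similarity:

        import math
        from collections import Counter

        def context_vector(corpus, target, window=2):
            """Count words co-occurring within +/- window positions of a target word."""
            vec = Counter()
            for sentence in corpus:
                for i, w in enumerate(sentence):
                    if w == target:
                        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
                        for j in range(lo, hi):
                            if j != i:
                                vec[sentence[j]] += 1
            return vec

        def cosine(u, v):
            dot = sum(u[w] * v[w] for w in u)
            norms = (math.sqrt(sum(c * c for c in u.values()))
                     * math.sqrt(sum(c * c for c in v.values())))
            return dot / norms if norms else 0.0

        corpus = [["the", "dog", "barks"], ["the", "cat", "meows"],
                  ["a", "dog", "bites"], ["a", "cat", "purrs"]]
        # "dog" and "cat" share their determiner contexts, hence a positive similarity.
        print(cosine(context_vector(corpus, "dog"), context_vector(corpus, "cat")))  # 0.5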

    7. Computational Semantics

    Semantic representations, lambda calculus, links with unification, logics, and reasoning.
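
    Python's own lambdas can illustrate the compositional idea (a sketch only; actual systems use typed lambda calculi and real logical forms, approximated here by strings):

        # Montague-style composition: a transitive verb denotes a function from
        # objects to subject-predicates; function application mirrors beta reduction.
        loves = lambda obj: lambda subj: f"loves({subj},{obj})"
        print(loves("mary")("john"))  # loves(john,mary)

        # A generalized quantifier takes a predicate over individuals.
        everyone = lambda pred: f"forall x.{pred('x')}"
        print(everyone(loves("mary")))  # forall x.loves(x,mary)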

    8. Named Entity Recognition and Semantic Role Labeling

    Discriminative supervised learning models: log-linear models, maximum entropy Markov models, conditional random fields, BIO tagging and chunking.
    PropBank, FrameNet.
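
    BIO tagging reduces span recognition to per-token classification. Below is a minimal sketch of decoding BIO tags back into entity spans (well-formed tag sequences assumed; the example sentence is invented):

        def bio_to_spans(tags):
            """Decode a BIO tag sequence into (label, start, end) spans."""
            spans, start, label = [], None, None
            for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the final span
                if tag == "O" or tag.startswith("B-"):
                    if label is not None:
                        spans.append((label, start, i))
                    start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
                # "I-" tags simply extend the current span.
            return spans

        tokens = ["John", "Smith", "visited", "New", "York", "."]
        tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
        print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 5)]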

    9. Discourse Analysis

    Noun phrase coreference resolution.
    Rhetorical structure recognition.
    Constraint-based reasoning, integer linear programming, column generation algorithms.
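
    To illustrate the integer linear programming formulation of coreference, here is a toy sketch: binary variables encode pairwise coreference decisions, transitivity is imposed as a hard constraint, and the pairwise scores are invented. A real system hands this to an ILP solver; the sketch simply enumerates all assignments:

        from itertools import product

        mentions = ["Obama", "the president", "he"]
        pairs = [(0, 1), (0, 2), (1, 2)]
        score = {(0, 1): 2.0, (0, 2): -0.5, (1, 2): 1.5}  # made-up classifier scores

        def transitive(x):
            # If i~j and j~k, then i~k must also hold.
            return all(not (x[a] and x[b]) or x[c]
                       for a, b, c in [((0, 1), (1, 2), (0, 2)),
                                       ((0, 1), (0, 2), (1, 2)),
                                       ((0, 2), (1, 2), (0, 1))])

        best, best_value = None, float("-inf")
        for assignment in product([0, 1], repeat=len(pairs)):
            x = dict(zip(pairs, assignment))
            if not transitive(x):
                continue  # violates the transitivity constraint
            value = sum(score[p] * x[p] for p in pairs)
            if value > best_value:
                best, best_value = x, value

        # The constraint pulls the weak negative link (0, 2) into the cluster.
        print(best, best_value)  # all pairs linked, objective 3.0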

    10. Alignment Algorithms for Machine Translation

    Sentence alignment, word alignment, phrase alignment, tree alignment.
    Alignment algorithms: Expectation Maximization algorithm, IBM models, HMM alignment, discriminative models.
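
    A minimal sketch of Expectation Maximization for word alignment in the style of IBM Model 1 (simplified: no NULL word, a toy bitext, and translation probabilities t(f | e) initialized uniformly):

        from collections import defaultdict

        def ibm_model1(bitext, iterations=10):
            """EM estimation of translation probabilities t(f | e)."""
            f_vocab = {f for _, fs in bitext for f in fs}
            t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
            for _ in range(iterations):
                count = defaultdict(float)  # expected counts c(e, f)
                total = defaultdict(float)  # expected counts c(e)
                for es, fs in bitext:
                    for f in fs:
                        # E-step: distribute f fractionally over all e in the sentence.
                        z = sum(t[(e, f)] for e in es)
                        for e in es:
                            count[(e, f)] += t[(e, f)] / z
                            total[e] += t[(e, f)] / z
                for (e, f) in count:  # M-step: renormalize expected counts
                    t[(e, f)] = count[(e, f)] / total[e]
            return t

        bitext = [(["the", "house"], ["la", "maison"]),
                  (["the", "book"], ["le", "livre"]),
                  (["a", "house"], ["une", "maison"])]
        t = ibm_model1(bitext)
        print(round(t[("house", "maison")], 3))  # approaches 1.0 over the iterations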

    11. Machine Translation

    Language typology and divergences.
    Rule-based models: direct translation, transfer systems, and interlingua systems.
    Statistical models: word-based, phrase-based and syntax-based.
    Synchronous context-free grammars, synchronous tree-substitution grammars, synchronous tree-adjoining grammars.

    12. Advanced Syntactic and Semantic Parsing

    Lexicalized PCFGs, word classes, parsing with limited supervision, latent variable grammars.
    Induction of grammars, inside-outside algorithm.

    13. Temporal and Spatial Recognition

    Temporal expression recognition, temporal normalization, TimeBank.
    Event detection and analysis, temporal relation recognition and semantic dependency parsing.
    Spatial relation recognition.

    Handbooks
     
    Daniel Jurafsky and James H. Martin, Speech and Language Processing, 2nd edition, Prentice-Hall, 2006.

    Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

    Philipp Koehn, Statistical Machine Translation, Cambridge University Press, 2009.

    In addition, recent articles are used, e.g., from the proceedings of the meetings of the Association for Computational Linguistics.

    Format: Lectures.

    Evaluation

    Evaluation: Natural Language Processing (B-KUL-H22B1a)

    Type: Exam during the examination period
    Description of evaluation: Written
    Type of questions: Open questions, Closed questions
    Learning material: Course material

    Open-book written exam featuring a mixture of theory and exercise questions.