Information Retrieval and Search Engines

Information Retrieval and Search Engines (B-KUL-H02C8A)

4 ECTS

English

Second term

Cannot be taken as part of an examination contract

de Lhoneux Miryam

POC Artificial Intelligence

Aims

The aim of the course is to study the techniques and algorithms currently used in information retrieval, Web search and Web mining, as well as the challenges of these fields. The theoretical insights form the basis for discussions of commercial systems and ongoing research projects. After completing this course, the student should be able to 1) describe and understand fundamental concepts and algorithms in information retrieval, Web search and Web mining; and 2) design and evaluate an information retrieval system.

The exercise sessions give the opportunity to gain an in-depth understanding of the algorithms discussed during the lectures.

Previous knowledge

The course is aimed at students who are interested in the theory and applications of the processing, storage and retrieval of information. Elementary knowledge of statistics, probability theory and linear algebra is required. Familiarity with machine learning methods is recommended.

Is included in these courses of study

Master in de toegepaste informatica (programme for students who started before 2024-2025) (Leuven) (Artificial Intelligence) 60 ects.
Master handelsingenieur in de beleidsinformatica (Leuven) 120 ects.
Master handelsingenieur in de beleidsinformatica (Leuven) (Minor: Data science) 120 ects.
Master of Artificial Intelligence (Leuven) (Specialisation: Big Data Analytics (BDA)) 60 ects.
Master of Artificial Intelligence (Leuven) (Specialisation: Engineering and Computer Science (ECS)) 60 ects.
Master of Bioinformatics (Leuven) (Bioscience Engineering) 120 ects.
Master of Bioinformatics (Leuven) (Engineering) 120 ects.
Master of Information Management (Leuven) 60 ects.
Master of Biomedical Engineering (Programme for students started before 2021-2022) (Leuven) 120 ects.
Master of Business and Information Systems Engineering (Leuven) 120 ects.
Master of Business and Information Systems Engineering (Leuven) (Minor: Data Science) 120 ects.

Activities

3 ects. Information Retrieval and Search Engines: Lecture (B-KUL-H02C8a)

3 ECTS

English

Format: Lecture

Second term

de Lhoneux Miryam

POC Artificial Intelligence

Content

The motivation for the course lies in the urgent need for computer programs that help people digest masses of unstructured information composed of text and other media. We need information retrieval technology when, for instance, we search for information on the World Wide Web, in repositories of news and blogs, in biomedical document bases, or in governmental and company archives. Moreover, emails, tweets, other messages and advertisements are searched and filtered. Various techniques of content recognition, recommendation and linking play an increasing role and make it possible to generate content models of documents or messages that effectively match the personalized information needs of users. There is currently interest in capturing dynamic changes in the data and in modeling dynamic interactions with users. The proliferation of wireless and mobile devices such as mobile phones has additionally created a demand for effective and robust techniques to index, retrieve and summarize information.

The lectures treat the following topics:

1. Introduction

2. Advanced representations

Law of Zipf

Matrix factorization, latent semantic analysis (LSA), training with singular value decomposition

Probabilistic latent semantic analysis (pLSA), latent Dirichlet allocation (LDA), training with Expectation Maximization (EM) algorithms, Markov chain Monte Carlo (MCMC) methods such as Gibbs sampling, and with variational inference

Embeddings obtained with neural networks

3. Retrieval and search models

Algebraic models: vector space models

Probabilistic models: language retrieval models and Bayesian networks

Neural network models

4. Learning to rank

Relevance feedback, personalized and contextualized information needs, user profiling

Pointwise, pairwise and listwise approaches

Structured output support vector machines, loss functions, most violated constraints

End-to-end neural network models

Optimization of retrieval effectiveness and of diversity of search results

5. Dynamic retrieval and recommendation

Static versus dynamic models

Markov decision processes

Multi-armed bandit models

Modelling sessions

Online advertising

6. Multimedia information retrieval

Multimedia data types and features

Concept detection

Cross-modal indexing of content: latent Dirichlet allocation and deep learning methods

Cross-modal and multimodal retrieval and recommendation models

Illustrations with spoken document, image, video and music search

7. Web search

Web search engines, crawler-indexer architecture, query processing

Link analysis retrieval models: PageRank, HITS, personalized PageRank and variants

Behavior-based and credibility-based retrieval models

Social search, mining and searching user generated content

8. Scalability of Web search

Data structures and search techniques

Inverted files, nextword indices, taxonomy indices, distributed indices

Compression

Learning of hashing functions, cross-modal hashing

Scalability and efficiency challenges

Architectural optimizations

9. Clustering

Distance and similarity functions in Euclidean and hyperbolic spaces, proximity functions

Sequential and hierarchical cluster algorithms, algorithms based on cost-function optimization, number of clusters

Term clustering for query expansion, document clustering, multiview clustering

10. Categorization

Feature selection, naive Bayes model, support vector machines, (approximate) k-nearest neighbor models

Deep learning methods

Multilabel and hierarchical categorization

Convolutional neural network (CNN) based hierarchical categorization

11. Summarization

Document segmentation, maximum marginal relevance

Summarization based on latent Dirichlet allocation models and long short-term memory (LSTM) networks

Abstractive summarization with attention models

Multidocument summarization, search results fusion and visualization

12. Question answering and conversational agents in search and recommendation

Retrieval-based question answering

Deep learning methods including attention models

Cross-modal question answering

E-commerce search and recommendation

13. Evaluation measures and methodology

Recall, precision, F-measure, mean average precision, discounted cumulative gain, mean reciprocal answer rank, accuracy, confusion matrix, ROC curve, normalized mutual information, mean absolute error, root mean square error, pyramid method, inter-annotator agreement, test collections

14. Discussion of interesting research projects

15. Invited lecture by representative of an important company

In 2006-2007: Thomas Hofmann, Director of Engineering, Google Zurich European Engineering Centre, Switzerland; in 2007-2008: Ronny Lempel, director of Yahoo! research, Israel; in 2008-2009: Stephen Robertson, senior researcher at Microsoft Research Cambridge, UK and one of the founders of probabilistic modeling in information retrieval; in 2009-2010: Gregory Grefenstette, Chief Science Officer, Exalead, France; in 2010-2011: Mounia Lalmas, visiting senior researcher at Yahoo! Labs Barcelona, Spain; in 2011-2012: Jakub Zavrel, CEO and founder of TextKernel, The Netherlands; in 2012-2013: Massimiliano Ciaramita, senior research scientist at Google, Zürich, Switzerland; in 2013-2014: Alex Graves, senior research scientist at Google DeepMind, London, UK; in 2014-2015: Fabrizio Silvestri, Senior Scientist at Yahoo Labs, Barcelona; in 2015-2016: Roi Blanco, Senior Scientist at Yahoo Labs, London; in 2016-2017: Holger Schwenk, research scientist at Facebook AI Research, France and Dani Yogatama, research scientist at Google DeepMind, London, UK; in 2017-2018: Enrique Alfonseca, research tech leader at Google AI, Zurich; in 2020-2021: Florian Strub, senior researcher at Google DeepMind; and in 2021-2022: Rylan Conway, applied scientist at Amazon Seattle.

Course material

Course material is available on the Toledo platform of KU Leuven. The following books offer background to the course material:
Baeza-Yates, R. & Ribeiro-Neto, B. (2011). Modern Information Retrieval: The Concepts and Technology behind Search (2nd edition). Harlow, UK: Pearson.
Büttcher, S., Clarke, C.L.A. & Cormack, G.V. (2010). Information Retrieval: Implementing and Evaluating Search Engines. Cambridge, MA: MIT Press.
Manning, C.D., Raghavan, P. & Schütze, H. (2009). Introduction to Information Retrieval. Cambridge University Press.
Moens, M.-F. (2006). Information Extraction: Algorithms and Prospects in a Retrieval Context (International Series on Information Retrieval 21). Berlin: Springer.

Format: more information

Interactive lectures.

Is also included in other courses

H02C8B : Information Retrieval and Search Engines

1 ects. Information Retrieval and Search Engines: Exercises (B-KUL-H00G9a)

1 ECTS

English

Format: Practical

Second term

de Lhoneux Miryam

POC Artificial Intelligence

Content

Exercise session on latent semantic models, probabilistic and vector models
Exercise session on learning to rank
Exercise session on dynamic retrieval
Exercise session on compression
Exercise session on categorization and clustering
Exercise session on link-based and multimodal models

Course material

Exercises and answers are available via the Toledo platform.

Format: more information

Interactive exercise sessions in small groups.

Is also included in other courses

H02C8B : Information Retrieval and Search Engines

Evaluation

Evaluation: Information Retrieval and Search Engines (B-KUL-H22C8a)

Type: Exam during the examination period

Description of evaluation: Written

Type of questions: Open questions, Closed questions

Learning material: Calculator, Course material

Explanation

Theory exam (grading: 50 %): Written, open book.

Exercise exam (grading: 50 %): Written, open book.
