Linked Data Scholarship (B-KUL-F0XO5A)

6 ECTS | English | 39 | First term
Fantoli Margherita |  Daems Dries (substitute)
POC Digital Humanities

The aim of this interdisciplinary and applied course is to provide students with a conceptual and technical understanding of linked (open) data – that is, of the theoretical models and technologies that allow (research) data to be shared on the World Wide Web in a meaningful, connected, and open manner.

 

On a theoretical level, students will learn to operationalize humanistic theory and thought (including modelling theory and semiotics) to reflect critically and in a culturally informed way on the World Wide Web as an information space. By engaging in ongoing debates in the field of scholarly communication (including topical discussions on open science, open data and net neutrality), students will be familiarized with the potential and limitations of knowledge representation in the semantic web of linked (open) data.

 

On a practical level, students’ data literacy and data skills will be developed through real-world examples and exercises in structuring, publishing and querying datasets as linked data. In particular, students will be introduced to elements of the semantic web protocol stack that are relevant for research modelling in the Digital Humanities and related fields (including vocabularies like RDF(S) and OWL, and meta-ontologies such as SKOS and CIDOC-CRM). Familiarity with the workings and applications of linked open data technologies prepares students for future research projects, as well as data science- and data curation-related assignments in a range of (research) institutions, including universities, (scientific) libraries and cultural heritage institutions, government agencies, media and corporate organizations, and other data-driven operations.

Basic computing and information skills. Required knowledge of World Wide Web architecture components and data models will be acquired during the first sessions of the course.  

Activities

6 ects. Linked Data Scholarship (B-KUL-F0XO5a)

6 ECTS | English | Format: Lecture | 39 | First term
Fantoli Margherita |  Daems Dries (substitute)
POC Digital Humanities

Recent years have seen a steep increase in digitally available (research) data. This so-called ‘data deluge’ has provoked a wide range of responses from the scientific community. According to some, the advent of ‘big data’ has ushered in ‘the end of theory’ and even the end of the scientific method. Others, however, have pointed out that with the availability of large amounts of digital data, there comes a pressing need for structures and models to share and organize these data in an efficient, meaningful, and open manner. One such approach to open data publishing and modelling is ‘linked (open) data’, an approach supported by semantic web technologies.

With the interest in linked (open) data (and the semantic web) on the rise, this course provides a conceptual and technical introduction to linked open data and semantic web technologies. The course consists of an introductory session, followed by four thematic clusters (each of which comprises two or three sessions) and a concluding session. Sessions will be held in lecture form, leaving ample room for group discussions and exercises. The course’s contents are structured as follows:

In the introductory session (session 1), students will be familiarized with the problem of modelling data and knowledge, and with how cultural discourses and humanist thinking can inform how present-day web technologies are deployed to this aim. Cultural reflections on the intersections between data, knowledge and technologies will be surveyed. This includes (but is not limited to) cultural imaginations of data-linking technologies, ranging from Vannevar Bush’s notion of the ‘Memex’, through Tim Berners-Lee’s vision of the Internet and of "philosophical engineering", to Ted Nelson’s notion of ‘hypertext’.

Cluster 1: The web as a global data space

The first session of Cluster 1 (session 2) consists of a basic introduction to the architectural components of the World Wide Web (including HTTP, URLs, URIs, etc.). Students will come to understand how the World Wide Web as such is ideally suited to house (textual) documents, but also that the dominant web architecture has its limitations when it comes to publishing other types of (research) data. This observation will be framed by a debate on a number of topical issues in the field of scholarly communication, including open science, open data, net neutrality and the decentralized web.
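
To give a concrete sense of these architectural components, the sketch below shows how one and the same URI can be dereferenced either as a human-readable page or as machine-readable data through HTTP content negotiation. It uses Python's standard urllib; the DBpedia resource URI is only an illustrative assumption, and any dereferenceable URI would serve.

    # Minimal sketch: dereferencing a URI as data rather than as an HTML page,
    # using HTTP content negotiation (the Accept header).
    import urllib.request

    uri = "https://dbpedia.org/resource/Leuven"  # illustrative example URI (assumption)

    request = urllib.request.Request(uri, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request) as response:
        print(response.headers.get("Content-Type"))            # e.g. an RDF serialization
        print(response.read(500).decode("utf-8", "replace"))   # first bytes of the description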

The second session in this cluster (session 3 overall) is dedicated to the reasoning behind linked data and how it might provide a solution to some of the problems introduced in earlier sessions. It will be demonstrated how linked data structures enable sophisticated forms of data processing and how links between data can be used to break data silos and connect distributed data. Through a number of real-world examples (e.g. the Linked Leaks publication of the ‘Panama Papers’ as linked data), it will be shown how linked data can ensure that data can be accessed and discovered more easily, and that these data become a more integrated part of the World Wide Web.

Cluster 2: Technical principles and topology of the semantic web

The first session of this cluster (session 4 overall) will deal with the technical aspects of the semantic web. Triples and the Resource Description Framework (RDF) at the core of the semantic web will be introduced, and it will be shown how these are implemented from a technological perspective. The RDF model will be related to other models that students might encounter during their studies and research in Digital Humanities, such as tabular data, relational databases and XML databases.
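
By way of illustration, the following minimal sketch expresses a simple statement as RDF triples. It assumes the third-party Python library rdflib and an invented example.org namespace; the statement itself is merely illustrative, not course material.

    # Minimal sketch (assumes the rdflib package): the statement
    # "Erasmus was born in Rotterdam" expressed as subject-predicate-object triples.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")              # placeholder namespace (assumption)

    g = Graph()
    erasmus = EX["Erasmus"]

    g.add((erasmus, RDF.type, EX["Person"]))           # subject - predicate - object
    g.add((erasmus, RDFS.label, Literal("Desiderius Erasmus")))
    g.add((erasmus, EX["bornIn"], EX["Rotterdam"]))

    print(g.serialize(format="turtle"))                # the same graph in Turtle syntax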

The second session in this cluster (session 5 overall) explores the history and topology of the semantic web. In this session, students will analyse a number of graphical representations of the linked open data cloud to determine how it has grown over the years. The various types of data currently available in the linked open data cloud will be evaluated, including cross-domain data, geographic data, government data, media data, library data, data related to the life sciences, and user content. Special attention will be devoted to the share of humanities-related projects and data in the linked open data cloud.

Cluster 3: Modelling and publishing linked data

The third cluster deals with linked data from a data publisher’s perspective and answers the question of which models might be best suited for Digital Humanities students and scholars to share their data as linked data. The first two sessions in this cluster (sessions 6 and 7 overall) will be hands-on, practically oriented sessions that cater to the data and information needs of Digital Humanities scholars. The possibilities and limitations of existing vocabularies and ontologies to model (research) data (including RDF(S), OWL, SKOS and CIDOC-CRM) will be explored by means of case-based exercises. Students will learn how to evaluate the types of relations and links that might be defined between the objects, persons and documents that figure in their specific research areas. The modelling assignment that has to be prepared for the final exam will offer opportunities to explore these issues in greater detail.
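
As a hint of what such a modelling exercise might look like, the sketch below builds a tiny SKOS vocabulary with rdflib. The concepts and the example.org namespace are invented for illustration only.

    # Minimal sketch (assumes rdflib): a small controlled vocabulary modelled with SKOS.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    VOC = Namespace("http://example.org/vocab/")       # placeholder namespace (assumption)

    g = Graph()
    g.add((VOC["Manuscript"], RDF.type, SKOS.Concept))
    g.add((VOC["Manuscript"], SKOS.prefLabel, Literal("Manuscript", lang="en")))
    g.add((VOC["Codex"], RDF.type, SKOS.Concept))
    g.add((VOC["Codex"], SKOS.broader, VOC["Manuscript"]))  # hierarchical relation between concepts

    print(g.serialize(format="turtle"))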

The third session of this cluster (session 8 overall) will be reserved for a theoretical and philosophical exploration of the requirements for signification on the semantic web. Questions that will be addressed include: how much context is required for scholarly publications or cultural heritage objects to be part of a knowledge space that enables their interpretation? And to what extent do web-based formal knowledge spaces differ (semiologically) from natural-language-based textual environments?

Cluster 4: Consuming linked data

Cluster 4 is devoted to the ways in which datasets published as linked data can be queried. In the first session of this cluster (session 9 overall), students will learn about SPARQL, the language for querying linked data. The workings of this language will be illustrated by means of exercises in querying DBpedia, an initiative that draws information from the online encyclopaedia Wikipedia. The possibilities and limitations of this approach will be contrasted with those of other information retrieval methods (e.g. Google search).
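
For a sense of what such an exercise involves, the sketch below sends a SPARQL query to the public DBpedia endpoint using the third-party SPARQLWrapper package; the particular query (people born in Leuven) is an illustrative assumption, not course material.

    # Minimal sketch (assumes the SPARQLWrapper package): querying DBpedia with SPARQL.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")     # public DBpedia endpoint
    endpoint.setQuery("""
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        PREFIX dbr:  <http://dbpedia.org/resource/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

        SELECT ?person ?name WHERE {
            ?person dbo:birthPlace dbr:Leuven ;
                    rdfs:label ?name .
            FILTER (lang(?name) = "en")
        }
        LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["person"]["value"], "-", row["name"]["value"])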

In the second session of this cluster (session 10 overall), findings from the previous session will be reflected on in more depth. Questions that will be up for discussion include: what types of knowledge can be retrieved from querying datasets published as linked data? How does linked data transform our notion of ‘searching’? What (new) kinds of research questions does linked data inform? What new meanings can be assigned to the ideas of sources and ‘documents’?

The third session of this cluster (session 11 overall) identifies some pressing issues in the field of linked data scholarship and outlines avenues for future research. These include issues of versioning, provenance, authorization and semantic redundancy, as well as the incorporation of new technologies such as graph databases.

The course wraps up with a final, synthesizing session (session 12 overall) in which students will get the opportunity to ask questions in preparation for the exam. A thirteenth session is reserved for a guest lecture by a speaker active in the field of linked data scholarship.

 

Syllabus, reading material and links to example projects posted on the Toledo platform

PowerPoint slides accompanying the lectures

Lecture notes

Students should have access to a personal computer in order to participate in class exercises

Evaluation

Evaluation: Linked Data Scholarship (B-KUL-F2XO5a)

Type: Partial or continuous assessment with (final) exam during the examination period
Description of evaluation: Oral, Paper/Project, Participation during contact hours
Learning material: Computer


Students are required to participate in class discussions and must submit a paper containing an RDF-based modelling exercise on a subject of their choosing. An oral exam during the examination period will cover the modelling exercise and two questions out of a list of approximately ten that have been dealt with during the course.