Big Data Analytics Programming (B-KUL-H00Y4A)

6 ECTS | English | 36 | Both terms | Cannot be taken as part of an examination contract
POC Artificial Intelligence

The goal of this course is to familiarize students with the different types of programming environments they may encounter or need to use when analyzing large-scale data sets. The course consists of three parts, or modules. Each module begins with background lectures that introduce the relevant topics. The key concepts are reinforced during practical exercise sessions. Students are then expected to apply these skills to complete programming projects.

Note that this class runs over both semesters. That is, there will be projects and lectures in both the first and second semester. It is not possible to follow this class for only one of the semesters.

Strong background in and experience with advanced data structures and algorithms, including topics such as hash tables/maps/sets, sorting algorithms, queues, and search trees. Understanding of the time and space complexity of algorithms. Excellent programming ability in Java, C++, C, or a similar language. General familiarity with relational databases.

Mixed prerequisite:
You may only take this course if you comply with the prerequisites. Prerequisites can be strict or flexible, or can imply simultaneity. A degree level can also be a prerequisite.
Explanation:
STRICT: You may only take this course if you have passed or applied tolerance for the courses for which this condition is set.
FLEXIBLE: You may only take this course if you have previously taken the courses for which this condition is set.
SIMULTANEOUS: You may only take this course if you also take the courses for which this condition is set (or have taken them previously).
DEGREE: You may only take this course if you have obtained this degree level.


(SIMULTANEOUS(H02C1A) OR SIMULTANEOUS(H0E96A) OR SIMULTANEOUS(H0E98A)) AND SIMULTANEOUS(H02C6A)

The codes of the course units mentioned above correspond to the following course descriptions:
H02C1A : Machine Learning and Inductive Inference
H0E96A : Beginselen van machine learning
H0E98A : Principles of Machine Learning
H02C6A : Data Mining

Activities

2.5 ects. Big Data Analytics Programming: Lecture (B-KUL-H00Y4a)

2.5 ECTS | English | Format: Lecture | 21 | Both terms
POC Artificial Intelligence

Note that the order in which the topics are covered can vary from year to year.

Part I: Basics
1. Introduction and overview
2. Background on hashing, computer organization, complexity basics, etc.
3. Databases basics: SQL, join algorithms, index structures
4. Advanced topics: Fancy indexes, column store, warehouses, NoSQL
 

Part II: Structures and techniques for efficiency
1. Introduction and overview
2. Learning from data streams
3. Fast nearest neighbors algorithms
4. Implementation tricks
5. Approximation methods (e.g., sketches, sampling)
6. Advanced topics?
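
As an illustration of the sampling methods listed under Part II, the following is a minimal sketch of reservoir sampling, one common way to keep a fixed-size uniform sample from a stream of unknown length. It is not taken from the course material; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Hypothetical helper for illustration only. The stream is read once
    and only k items are held in memory at any time.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

A call such as `reservoir_sample(range(1_000_000), 10)` returns ten values without ever materializing the full sequence, which is the property that makes the technique attractive for streaming data.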

Part III: Parallel Architectures
1. Introduction and overview
2. Types of parallelism (e.g., shared memory, shared nothing)
3. Concurrency
4. Parallel programming bugs (e.g., data races, deadlock)
5. Map-reduce
6. Cloud computing
7. Condor?

Lecture slides, readings, and online resources

0.5 ects. Big Data Analytics: Exercises (B-KUL-H00Y5a)

0.5 ECTS | English | Format: Practical | 15 | Both terms
POC Artificial Intelligence

Part I: Query Languages
1. Introduction and overview
2. SQL: selection, projection, select-project-join, group-by, aggregates, subqueries, nested queries
3. XQuery
4. SPARQL
5. Writing applications that can interface with a DBMS
6. Indexing?

Part II: Scripting Languages
1. Introduction and overview
2. Perl/Python

Part III: Parallel Architectures
1. Introduction and overview
2. Types of parallelism (e.g., shared memory, shared nothing)
3. Concurrency
4. Parallel programming bugs (e.g., data races, deadlock)
5. Map-reduce
6. Cloud computing
7. Condor?

Exercise slides

3 ects. Big Data Analytics: Assignments (B-KUL-H00Y6a)

3 ECTS | English | Format: Assignment | Both terms
POC Artificial Intelligence

Examples of the types of assignments
1. Given a set of verbal queries, translate them into a query language
2. Write queries to extract information needed for a machine learning task
3. Implement advanced machine learning algorithms (e.g., for learning from streaming data)
4. Implement an advanced data mining algorithm
5. A project that uses Hadoop, Spark, etc.
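
To give a feel for the first assignment type, here is a small, hypothetical sketch of translating a verbal query into SQL and running it from Python with the standard `sqlite3` module. The `results` table, its columns, and the sample rows are invented for this example and are not part of any actual assignment.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a made-up `results` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (student TEXT, course TEXT, grade REAL)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [
            ("ann", "db", 16.0),
            ("bob", "db", 12.0),
            ("ann", "ml", 18.0),
            ("bob", "ml", 17.0),
        ],
    )
    return conn


def average_grade_per_course(conn: sqlite3.Connection) -> list:
    """Verbal query: 'For each course, report the average grade, highest first.'"""
    return conn.execute(
        """
        SELECT course, AVG(grade) AS avg_grade
        FROM results
        GROUP BY course
        ORDER BY avg_grade DESC
        """
    ).fetchall()
```

With the sample rows above, `average_grade_per_course(build_demo_db())` yields one `(course, average)` tuple per course. The phrase "for each course" maps to `GROUP BY`, "average grade" to `AVG`, and "highest first" to `ORDER BY ... DESC`.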

Assignment sheets

Evaluation

Evaluation: Big Data Analytics Programming (B-KUL-H20Y4a)

Type : Continuous assessment without exam during the examination period
Description of evaluation : Project/Product, Take-Home


The evaluation of the course will be based on multiple programming assignments. Solutions are evaluated in terms of correctness, efficiency, and generalizability.

Independent projects must be completed individually. Using outside sources (e.g., publicly available code) or working together (e.g., working with someone else to solve the assignment or getting substantial help from someone else) is therefore strictly forbidden. If you have questions about what is and is not permitted, please consult the instructor.

For project assignments with a failing result, the student will have the opportunity to complete an alternative assignment.