623.253 (17S) Selected Topics in Knowledge and Data Engineering: Data Mining

Summer semester 2017

The registration deadline has passed.

First session of the course
15.05.2017, 09:00 - 15:00, Z.1.08
... no further sessions known


Course title (English)
Selected Topics in Knowledge and Data Engineering: Data Mining
Lecture course (course with continuous assessment)
8 (30 max.)
To the Moodle course


Intended learning outcomes

Course description

Recent advances in data capture and storage technologies have produced massive data sets in many application domains. They have also opened a growing gap between the technical capacity to gather and store huge volumes of data and users' ability to understand and efficiently analyze what has been collected. Many companies and organizations gather petabytes and exabytes of business transactions, pictures, scientific data, text reports and documents, which are simply too large and too complex to support decision making efficiently. At the same time, traditional data analysis techniques and querying models are not suited to extracting the useful knowledge hidden in massive data sets that could support decision making.

Data mining is a relatively new research area of data analysis that aims at the nontrivial extraction of implicit, previously unknown and potentially useful information from massive data repositories. It can help answer business questions that were too time-consuming to resolve with traditional data analysis techniques. Today, the intensive development of data mining techniques is driven mainly by the rapid growth of new types of data generated and collected in different application domains: pictures, films, music, maps, time series, DNA sequences, chemical structures, documents, social networks, etc.

The course is intended to provide a comprehensive introduction to basic data mining methods from an algorithmic point of view. It covers association analysis, sequential pattern discovery, classification, and clustering. For each of the presented methods, basic representative algorithms are presented and discussed, and each algorithm is illustrated with a set of examples.


1. Introduction

2. Association analysis

  • Problem definition
  • Naive algorithm
  • Apriori algorithm
  • FP-Growth algorithm
  • Multilevel association rules
  • Multidimensional association rules
  • Association rule evaluation
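
As a rough illustration of the level-wise idea behind the Apriori algorithm listed above, the following minimal Python sketch finds frequent itemsets. It is not taken from the course materials; the function name `apriori`, the absolute `min_support` count, and the sample transactions are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs
    in at least `min_support` transactions (absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

The prune step is what separates this from the naive approach of counting every possible itemset: a candidate is counted only if all of its smaller subsets are already known to be frequent.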

3. Classification

  • Problem definition
  • Decision tree induction
  • Rule-based classifiers
  • Naive Bayes classifiers
  • Nearest-Neighbor classifier
  • Ensemble classifiers
  • Evaluation of classifiers
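
To give a flavour of the classification topics above, here is a minimal sketch of a nearest-neighbor classifier using majority voting. It is an illustration only, not course material; the name `knn_predict`, the choice of Euclidean distance, and the toy training set are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training points. `train` is a list of (feature_tuple, label) pairs."""
    # Sort the training points by Euclidean distance to the query.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count the labels of the k closest points and return the most frequent.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data, for illustration only.
training_set = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
predicted = knn_predict(training_set, (1.1, 1.0), k=3)
```

This sketch scans the whole training set for every query, which is acceptable for small examples; it makes no attempt at indexing or at principled tie handling.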

4. Clustering

  • Problem formulation
  • Distance measures
  • Hierarchical clustering
  • Partitioning clustering
  • Overview of clustering methods
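
Partitioning clustering can be illustrated with k-means, a common representative of this family. The course page does not name a specific algorithm, so the choice of k-means, the function name `kmeans`, the fixed initial centroids, and the sample points below are all assumptions made for this sketch.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Run Lloyd's k-means iteration from the given initial centroids.
    Returns (final centroids, cluster index of each point)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # Centroids stopped moving: converged.
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D points forming two well-separated groups.
sample_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
                 (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids, labels = kmeans(sample_points, [(1.0, 1.0), (9.0, 9.0)])
```

The result depends on the initial centroids; they are fixed here so that the example is deterministic.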

5. Sequential pattern discovery

  • Problem formulation
  • Apriori-like approach
  • PrefixSpan approach
  • Sequential pattern discovery with constraints

6. Road map of mining methods


Basic textbooks:

  • P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Pearson, 2nd ed., 2012
  • J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 3rd ed., 2012

Additional literature:

  • I. H. Witten, E. Frank, Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 4th ed., 2014
  • Ch. Aggarwal, Data Mining: The Textbook, Springer, 2015
  • B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer, 2nd ed., 2011
  • T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2nd ed., 2008



The final exam is a written exam consisting of a set of exercises. It is an open-book, open-notes exam: students may use notes and textbooks, as well as laptops, computers and calculators. During the exam it is forbidden to communicate with other participants, to use mobile phones, or to connect to the internet.

You can find the sample exam in Moodle.


Grade: grading scheme

Position in the curriculum

  • Master's programme Applied Informatics (SKZ: 911, version: 13W.1)
    • Subject: Knowledge and Data Engineering (elective)
      • Selected Topics in Information Systems Engineering (2.0h VK / 4.0 ECTS)
        • 623.253 Selected Topics in Knowledge and Data Engineering: Data Mining (2.0h VC / 4.0 ECTS)

Equivalent courses for the purpose of counting examination attempts

This course is not assigned to any chain of equivalent courses