623.253 (19S) Selected Topics in Knowledge and Data Engineering: Data Mining

Summer semester 2019

Registration deadline has expired.

First session of the course
22.03.2019 10:00 - 11:00 S.2.69 - Bitmovin On Campus
... no further sessions known

Overview

Lecturer
Course title (English) Selected Topics in Knowledge and Data Engineering: Data Mining
Course type Lecture-course (continuous assessment course)
Semester hours 2.0
ECTS credits 4.0
Registrations 8 (max. 30)
Organisational unit
Language of instruction English
Course start 22.03.2019
eLearning To the Moodle course
Notes

Block course held in the week of 13 - 17 May 2019

Course description

Intended learning outcomes

Course description

Recent advances in data capture and data storage technologies have produced massive data sets in various application domains. These advances have also opened a growing gap between the technological possibilities of gathering and storing huge volumes of data and users' ability to understand and efficiently analyze the collected data. Many companies and organizations gather petabytes and exabytes of business transactions, pictures, scientific data, text reports, and documents, which are simply too large and too complex to efficiently support decision-making processes. Moreover, traditional data analysis techniques and querying models are not suited to extracting the useful knowledge hidden in massive data sets that could support decision making.

Data mining is a relatively new research area of data analysis that aims at the nontrivial extraction of implicit, previously unknown, and potentially useful information from massive repositories of data. Data mining can help answer business questions that were too time-consuming to resolve with traditional data analysis techniques. Nowadays, the intensive development of data mining techniques results mainly from the rapid growth of new types of data generated and collected in different application domains: pictures, films, music, maps, time series, DNA sequences, chemical structures, documents, social networks, etc.

The course is intended to provide a comprehensive introduction to basic data mining methods and algorithms from the algorithmic point of view. It covers basic data mining methods such as association analysis, sequential pattern discovery, classification, and clustering. For each method, basic representative algorithms are presented, discussed, and illustrated with a set of examples.

Contents

1. Introduction

2. Association analysis

  • Problem definition
  • Naive algorithm
  • Apriori algorithm
  • FP-Growth algorithm
  • Multilevel association rules
  • Multidimensional association rules
  • Association rule evaluation
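
As an illustration of the association analysis topics listed above, the following is a minimal sketch of Apriori-style frequent itemset mining. It is not taken from the course materials: the function name `apriori`, the toy `baskets` data, and the absolute support threshold of 3 are all assumptions chosen for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy data set of five market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)

# Rule evaluation: confidence(diapers -> beer) = support(both) / support(diapers).
confidence = (
    frequent_itemsets[frozenset({"diapers", "beer"})]
    / frequent_itemsets[frozenset({"diapers"})]
)
```

The prune step is what separates this from the naive algorithm: a candidate is counted only if all of its subsets were already found frequent, so most of the itemset lattice is never scanned.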

3. Classification

  • Problem definition
  • Decision tree induction
  • Rule-based classifiers
  • Naive Bayes classifiers
  • Nearest-Neighbor classifier
  • Ensemble classifiers
  • Evaluation of classifiers

4. Clustering

  • Problem formulation
  • Distance measures
  • Hierarchical clustering
  • Partitioning clustering
  • Overview of clustering methods
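
To make the partitioning clustering topic above concrete, here is a minimal sketch of k-means (Lloyd's algorithm) using squared Euclidean distance on 2-D points. It is an illustrative assumption rather than the course's own material: the function name `kmeans`, the sample `points`, and the hand-picked initial centroids are hypothetical.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                mean_x = sum(p[0] for p in members) / len(members)
                mean_y = sum(p[1] for p in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Passing the initial centroids explicitly keeps the sketch deterministic; practical implementations usually choose them randomly or with a seeding heuristic, and the result can depend on that choice.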

5. Sequential pattern discovery

  • Problem formulation
  • Apriori-like approach
  • PrefixSpan approach
  • Sequential pattern discovery with constraints

6. Road map of mining methods


Literature

Basic textbooks:

  • P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Pearson, 2nd ed., 2012
  • J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 3rd ed., 2012

Additional literature:

  • I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 4th ed., 2014
  • Ch. Aggarwal, Data Mining: The Textbook, Springer, 2015
  • B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer, 2nd ed., 2011
  • T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2nd ed., 2008

Examination information

For examinations held online, students must observe the standards that their technical devices have to meet in order to take part in these examinations.

Assessment scheme

Grade Grading scheme

Position in the curriculum

  • Master's programme Applied Informatics (SKZ: 911, version: 13W.1)
    • Subject: Knowledge and Data Engineering (elective)
      • Selected Topics in Information Systems Engineering (2.0h VK / 4.0 ECTS)
        • 623.253 Selected Topics in Knowledge and Data Engineering: Data Mining (2.0h VC / 4.0 ECTS)

Equivalent courses for the purpose of counting examination attempts

This course is not assigned to any chain of equivalent courses