623.253 (17S) Selected Topics in Knowledge and Data Engineering: Data Mining

Sommersemester 2017

Registration deadline has expired.

First course session
15.05.2017 09:00 - 15:00 Z.1.08 On Campus
No further dates known

Overview

Lecturer
Course title german Selected Topics in Knowledge and Data Engineering: Data Mining
Type Lecture - Course (continuous assessment course)
Hours per Week 2.0
ECTS credits 4.0
Registrations 8 (30 max.)
Organisational unit
Language of instruction English
Course begins on 15.05.2017
eLearning Moodle course

Time and place


Course Information

Intended learning outcomes

Course description

Recent advances in data capture and data storage technologies have produced massive data sets in many application domains. These advances have also opened a growing gap between the technological possibilities of gathering and storing huge volumes of data and users' ability to understand and efficiently analyze the collected data. Many companies and organizations gather petabytes and exabytes of business transactions, pictures, scientific data, text reports, and documents, which are simply too large and too complex to support decision-making processes efficiently. At the same time, traditional data analysis techniques and querying models are not suited to extracting the useful knowledge hidden in massive data sets that could support decision making.

Data mining is a relatively new research area of data analysis that aims at the nontrivial extraction of implicit, previously unknown, and potentially useful information from massive data repositories. Data mining can help answer business questions that were too time-consuming to resolve with traditional data analysis techniques. Today, the intensive development of data mining techniques is driven mainly by the rapid growth of new types of data generated and collected in different application domains: pictures, films, music, maps, time series, DNA sequences, chemical structures, documents, social networks, etc.

The course is intended to provide a comprehensive introduction to basic data mining methods and algorithms from an algorithmic point of view. It covers basic data mining methods such as association analysis, sequential pattern discovery, classification, and clustering. For each method, representative basic algorithms are presented and discussed, and their presentation is illustrated with a set of examples.

Course content

1. Introduction

2. Association analysis

  • Problem definition
  • Naive algorithm
  • Apriori algorithm
  • FP-Growth algorithm
  • Multilevel association rules
  • Multidimensional association rules
  • Association rule evaluation
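The level-wise idea behind the Apriori algorithm (join frequent (k-1)-itemsets into candidate k-itemsets, prune any candidate with an infrequent subset, then count support in one pass over the data) can be sketched in Python as below. This is a minimal illustration, not the course's reference implementation: it assumes transactions are given as sets of items and that `min_support` is an absolute count, and the names `apriori`, `confidence`, and `lift` as well as the small market-basket data set are hypothetical. Confidence and lift are included as two common rule evaluation measures.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            # Prune step (Apriori property): every (k-1)-subset must be frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One pass over the data to count the support of each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

def confidence(support, antecedent, consequent):
    """conf(X -> Y) = support(X ∪ Y) / support(X)."""
    x = frozenset(antecedent)
    return support[x | frozenset(consequent)] / support[x]

def lift(support, antecedent, consequent, n_transactions):
    """lift(X -> Y) = conf(X -> Y) / relative support of Y."""
    y = frozenset(consequent)
    return confidence(support, antecedent, consequent) / (support[y] / n_transactions)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
itemsets = apriori(transactions, min_support=3)
print(len(itemsets))                                   # 8
print(confidence(itemsets, {"diapers"}, {"beer"}))     # 0.75
```

With `min_support=3`, the candidate {bread, milk, diapers} is generated and counted but turns out infrequent (support 2), while {bread, diapers, beer} is pruned without any counting because its subset {bread, beer} is already infrequent.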

3. Classification

  • Problem definition
  • Decision tree induction
  • Rule-based classifiers
  • Naive Bayes classifiers
  • Nearest-Neighbor classifier
  • Ensemble classifiers
  • Evaluation of classifiers
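A nearest-neighbor classifier has no real training phase: it labels a query record by a majority vote among the k closest training records. The sketch below assumes numeric feature vectors, Euclidean distance, and a toy two-class data set invented for illustration; `knn_predict` and `leave_one_out_accuracy` are hypothetical names. Leave-one-out accuracy is shown as one simple way of evaluating a classifier, not as the evaluation procedure used in the course.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # Counter keeps first-encountered order on ties, so the class of the
    # nearest tied neighbor wins.
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k=3):
    """Fraction of records classified correctly when each is held out in turn."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) == y:
            correct += 1
    return correct / len(data)

# Toy data set (hypothetical): two well-separated classes.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 7.0), "B"), ((6.5, 6.0), "B"),
]
print(knn_predict(train, (2.0, 2.0)))   # A
print(knn_predict(train, (6.0, 6.0)))   # B
print(leave_one_out_accuracy(train))    # 1.0
```

An odd k avoids ties in a two-class problem; with more classes, the tie rule in the comment decides.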

4. Clustering

  • Problem formulation
  • Distance measures
  • Hierarchical clustering
  • Partitioning clustering
  • Overview of clustering methods
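Partitioning methods such as k-means alternate between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, until the centroids stop changing. A minimal sketch follows. It assumes 2-D points, Euclidean distance via `math.dist` (Python 3.8+), and, to keep the example deterministic, the first k points as initial centroids; practical implementations usually pick initial centroids randomly or with k-means++. The function name `kmeans` and the data are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Return (centroids, clusters) after Lloyd-style k-means iterations."""
    # Assumption: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: centroid = mean of its members (unchanged if empty).
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(1.33, 1.33), (8.33, 8.33)]
```

Here the poor initial choice of two neighboring points is corrected after two update steps, which illustrates why k-means converges to a local optimum that depends on the initialization.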

5. Sequential pattern discovery

  • Problem formulation
  • Apriori-like approach
  • PrefixSpan approach
  • Sequential pattern discovery with constraints
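PrefixSpan avoids Apriori-style candidate generation: it grows a pattern one item at a time and recurses on a projected database, i.e. the suffixes of the sequences that follow the first occurrence of the current prefix. The sketch below is a simplified version that assumes every sequence element is a single item (full PrefixSpan also handles elements that are itemsets) and that `min_support` is an absolute count; `prefixspan` and the toy database of strings are hypothetical.

```python
from collections import Counter

def prefixspan(database, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Simplified: each sequence element is a single item.
    """
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = Counter()
        for seq in projected:
            counts.update(set(seq))
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep the suffix after the first occurrence of `item`.
            new_projected = []
            for seq in projected:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:
                        new_projected.append(suffix)
            mine(pattern, new_projected)

    mine((), [tuple(s) for s in database])
    return results

db = ["abc", "acb", "ab", "bc"]  # each character is one item
patterns = prefixspan(db, min_support=2)
print(sorted(patterns.items()))
# [(('a',), 3), (('a', 'b'), 3), (('a', 'c'), 2), (('b',), 4), (('b', 'c'), 2), (('c',), 3)]
```

The pattern ('c', 'b') is not reported because only "acb" contains c followed by b, so its support of 1 is below the threshold.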

6. Road map of mining methods


Literature

Basic textbooks:

  • P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Pearson, 2nd ed., 2012
  • J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 3rd ed., 2012

Additional literature:

  • I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 4th ed., 2014
  • Ch. Aggarwal, Data Mining: The Textbook, Springer, 2015
  • B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer, 2nd ed., 2011
  • T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2nd ed., 2008

Examination information

For examinations conducted online, the standards that students' technical devices must meet in order to take part in these examinations must be observed.

Examination methodology

The final exam is a written exam consisting of a set of exercises. It is an open-book, open-notes exam, which means that students may use notes and textbooks during the exam. Students are allowed to use laptops, computers, and calculators. During the exam it is forbidden to communicate with other persons taking the exam. Using mobile phones or connecting to the internet during the exam is also forbidden.

A sample exam is available in Moodle.

Grading scheme

Grade / Grade grading scheme

Position in the curriculum

  • Master's degree programme Applied Informatics (SKZ: 911, Version: 13W.1)
    • Subject: Knowledge and Data Engineering (Compulsory elective)
      • Selected Topics in Information Systems Engineering (2.0h VC / 4.0 ECTS)
        • 623.253 Selected Topics in Knowledge and Data Engineering: Data Mining (2.0h VC / 4.0 ECTS)

Equivalent courses for counting the examination attempts

This course is not assigned to a sequence of equivalent courses