623.714 (22W) Selected Topics in Distributed Multimedia Systems: Hands-on Reinforcement Learning

Winter semester 2022/23

Registration deadline has passed.

First course session
09.11.2022 08:00 - 10:00 V.1.07 On Campus
... no further dates known


Due to the COVID-19 pandemic, short-notice changes to courses and examinations (e.g. cancellation of on-campus courses and a switch to online examinations) may be necessary.

Further information on on-campus teaching is available at: https://www.aau.at/corona.
Course title (English) Selected Topics in Distributed Multimedia Systems: Hands-on Reinforcement Learning
Course type Lecture course (continuous-assessment course)
Course model On-campus course
Semester hours 2.0
ECTS credits 4.0
Registrations 17 (30 max.)
Language of instruction English
Possible language(s) of assessment English
Course start 09.11.2022
eLearning To the Moodle course

Time and Place

Please note that the dates currently displayed may still change due to COVID-19 measures.


Intended Learning Outcomes

This course introduces Artificial Intelligence (AI) and optimization in a fun, easy, interesting, immersive, and hands-on way. Reinforcement Learning (RL) is part of a decades-long trend within AI and machine learning back toward simple general principles. RL is the closest to the kind of learning that humans and other animals do, and many of the core algorithms of RL were originally inspired by biological learning systems. RL has also given back, both through a psychological model of animal learning that better matches some of the empirical data, and through an influential model of parts of the brain's reward system. A good way to understand reinforcement learning is to consider some of the examples and possible applications that have guided its development:

  • A master chess player makes a move. The choice is informed both by planning - anticipating possible replies and counterreplies - and by immediate, intuitive judgments of the desirability of particular positions and moves.
  • An adaptive controller adjusts the parameters of a petroleum refinery's operations in real-time. The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs without sticking strictly to the setpoints originally suggested by engineers.
  • A gazelle calf struggles to its feet minutes after being born. Half an hour later it is running at 32 kilometers per hour.
  • A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decisions based on the current charge level of its battery and how quickly and easily it has been able to find the recharger in the past.
  • Phil prepares his breakfast. Closely examined, even this apparently mundane activity reveals a complex web of conditional behavior and interlocking goal-subgoal relationships: walking to the cupboard, opening it, selecting a cereal box, then reaching for, grasping, and retrieving the box. Other complex, tuned, interactive sequences of behavior are required to obtain a bowl, spoon, and milk carton. Each step involves a series of movements to obtain information and to guide reaching and locomotion. Rapid judgments are continually made about how to carry the objects or whether it is better to ferry some of them to the dining table before obtaining others. Each step is guided by goals, such as grasping a spoon or getting to the refrigerator, and is in service of other goals, such as having the spoon to eat with once the cereal is prepared and ultimately obtaining nourishment. Whether he is aware of it or not, Phil is accessing information about the state of his body that determines his nutritional needs, level of hunger, and food preferences.
  • All of these examples involve interaction between an agent and its environment. The agent observes the environment and decides to perform an action; based on that action, the environment returns a reward together with an updated state.
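The agent-environment interaction described in the last example can be sketched as a simple loop in Python. The environment and agent below are hypothetical toy constructs for illustration only, not part of the course materials: a one-dimensional world where the agent walks left or right until it reaches a goal position.

```python
import random

class GridEnv:
    """Toy 1-D environment: the agent starts at position 0 and tries to reach 4."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); positions are clamped to [0, 4]
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4            # episode ends at the goal
        reward = 1.0 if done else 0.0     # reward only on reaching the goal
        return self.state, reward, done

env = GridEnv()
state, done, total_reward = 0, False, 0.0
while not done:
    action = random.choice([-1, 1])          # the agent decides on an action
    state, reward, done = env.step(action)   # the environment returns reward + new state
    total_reward += reward
```

The observe-act-receive-reward cycle shown here is the structure shared by every example above, from the chess player to the trash-collecting robot; only the state, action, and reward definitions change.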


The lectures will primarily be delivered through in-class sessions. Additionally, a Discord channel will be available for a live broadcast of the in-class sessions and for discussions/Q&A.


The course consists of lectures covering concepts from the following topics. A lecture may cover material from one or more topics. The lectures will be held at regular intervals.

  • Topic 1: Reinforcement Learning - an introduction
  • Topic 2: Course Materials, Supplementary Resources, and Development Environment
  • Topic 3: Tabular Methods
  • Topic 4: Dynamic Programming
  • Topic 5: Monte-Carlo & Temporal Difference and Q-Learning
  • Topic 6: Policy Gradients
  • Topic 7: The Actor-Critic Method
  • Topic 8: Deep Q-Network - an Overview
  • Topic 9: Further Exploration
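Topics 3-5 center on tabular value estimation, whose core is the Q-learning update rule. The sketch below is an illustrative example under assumed hyperparameter values (ALPHA, GAMMA, EPSILON) and a hypothetical two-action problem; it is not taken from the course materials.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate
ACTIONS = [0, 1]
Q = defaultdict(float)                   # Q[(state, action)] -> estimated return

def choose_action(state):
    """Epsilon-greedy selection: mostly exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: reward 1.0 for taking action 1 in state 0, then reaching a terminal state
q_update(0, 1, 1.0, "terminal")
```

Monte-Carlo and temporal-difference methods (Topic 5) differ mainly in what replaces the bracketed target term, and the deep variants (Topics 6-8) replace the table Q with a neural network.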

Expected Prior Knowledge

  • Python Programming
  • Statistics


The course closely follows Deep Reinforcement Learning in Action from Manning Publications: Zai, Alexander, and Brandon Brown. Deep Reinforcement Learning in Action. Manning Publications, 2020.

Additionally, we will refer to Reinforcement Learning: An Introduction from MIT Press: Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. [OPTIONAL]


In the case of online examinations, the standards that students' technical equipment must meet in order to participate in these examinations must be observed.


Project Work and Oral Exam


The exam covers the complete curriculum taught throughout the course. Students will be given a project and will have up to 3 weeks to solve it. The project is evaluated in an oral exam that requires demonstrating the knowledge acquired in the course and the skills applied in solving the project.


The final assessment is based on the oral exam. The following criteria are evaluated:

  • Project code, installation guide, documentation
  • Trained models
  • Understanding of concepts taught in the course
  • Ability to use and explain the concepts applied in the project


Grading scheme: Grade

Position in the Curriculum

  • Master's programme Applied Informatics (SKZ: 911, Version: 13W.1)
    • Subject: Distributed Multimedia Systems (elective)
      • Selected Topics in Distributed Multimedia Systems (2.0h VK / 4.0 ECTS)
        • 623.714 Selected Topics in Distributed Multimedia Systems: Hands-on Reinforcement Learning (2.0h VC / 4.0 ECTS)
  • Master's programme Informatics (SKZ: 911, Version: 19W.2)
    • Subject: Distributed Systems (elective)
      • Further courses from the chosen specialization subject (0.0h XX / 12.0 ECTS)
        • 623.714 Selected Topics in Distributed Multimedia Systems: Hands-on Reinforcement Learning (2.0h VC / 4.0 ECTS)
          Completion recommended in semester 1 or 2
  • Doctoral programme in Technical Sciences (SKZ: 786, Version: 12W.4)
    • Subject: Coursework according to § 3 (2a) of the curriculum (compulsory subject)
      • Coursework according to § 3 (2a) of the curriculum (16.0h XX / 32.0 ECTS)
        • 623.714 Selected Topics in Distributed Multimedia Systems: Hands-on Reinforcement Learning (2.0h VC / 4.0 ECTS)

Equivalent courses for the counting of examination attempts

This course is not assigned to any chain