14038 - Computing at Scale in Machine Learning: Distributed Computing and Algorithmic Approaches – Module Overview

Module Number: 14038
Module Title: Computing at Scale in Machine Learning: Distributed Computing and Algorithmic Approaches
  German Title: Computing-at-Scale im Maschinellen Lernen: Verteiltes Rechnen und Algorithmische Ansätze
Department: Faculty GW - Faculty of Health Sciences Brandenburg
Responsible Staff Member:
  • Prof. Dr. rer. nat. Schliep, Alexander
Language of Teaching / Examination: English
Duration: 1 semester
Frequency of Offer: Every winter semester
Credits: 6
Learning Outcome: After successfully completing the module, students have an overview of how to solve large-scale computational problems in data science and machine learning. They know parallel approaches ranging from multi-threaded computation on individual machines to implicit-parallelism frameworks on compute clusters. They are familiar with algorithms and data structures supporting efficient exact or approximate (e.g. sketching) computation with massive data sets, both in and out of core. They are able to implement these algorithms and to assess which methods are appropriate in a given situation.
Contents: The focus will be on the following areas:
  • A review of memory-compute co-location and its impact on big data computations.
  • Solving Machine Learning (ML) workloads using explicit parallelism, specifically multi-threaded computation on an individual machine.
  • Introduction to implicit-parallelism programming models, as implemented for example in MapReduce, Spark, and Ray, and their application in ML.
  • Sketching algorithms (e.g. Count-Min sketch, HyperLogLog) and Bloom filters.
  • Implementing ML methods using index data structures such as suffix trees or k-d trees.
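As an illustration of the sketching data structures listed above (not part of the official module materials), a minimal Bloom filter can be sketched in Python. A Bloom filter answers approximate set-membership queries with possible false positives but no false negatives; the bit-array size and hash construction below are arbitrary illustrative choices:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: approximate set-membership with no false negatives.

    The parameters m (bits) and k (hash functions) are illustrative defaults,
    not tuned values; real deployments size them from the expected item count
    and target false-positive rate.
    """

    def __init__(self, m=1024, k=3):
        self.m = m                  # number of bits in the filter
        self.k = k                  # number of hash functions
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # All k positions set => "probably present"; any unset => definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for word in ["spark", "ray", "mapreduce"]:
    bf.add(word)
```

Every inserted item is always reported as present; an item that was never inserted may occasionally be reported as present too, which is the price paid for the constant, small memory footprint.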
Recommended Prerequisites:
  • Knowledge of probability, algorithms and data structures at the undergraduate level 
  • Introduction to machine learning at Master’s level
  • Advanced knowledge of programming in Python and the Linux command line
Mandatory Prerequisites: None
Forms of Teaching and Proportion:
  • Lecture / 2 Hours per Week per Semester
  • Exercise / 2 Hours per Week per Semester
  • Study project / 30 Hours
  • Self organised studies / 90 Hours
Teaching Materials and Literature:
  • The Data Science Design Manual. S. Skiena. Springer (Excerpts)
  • Parallel Programming for Multicore and Cluster Systems. T. Rauber and G. Rünger. Springer (Excerpts)
  • Review and Original Research Articles
Module Examination: Prerequisite + Final Module Examination (MAP)
Assessment Mode for Module Examination: Prerequisite:
  • passed exercises including project (75%)
Final Module Examination:
  • Written examination, 120 min. OR
  • Oral examination, 30-45 min.
In the first lecture it will be announced whether the examination will be organized in written or oral form.
Evaluation of Module Examination: Performance Verification – graded
Limited Number of Participants: None
Part of the Study Programme:
  • Master (research-oriented) / Artificial Intelligence / PO 2022
  • Master (research-oriented) / Informatik / PO 2008
Remarks:
Module Components:
  • Lecture: Computing at Scale in Machine Learning: Distributed computing and algorithmic approaches
  • Accompanying exercise
  • Related examination


Components to be offered in the Current Semester:
  • no assignment