Module Number: | 14038 |
Module Title: | Computing at Scale in Machine Learning: Distributed Computing and Algorithmic Approaches (German title: Computing-at-Scale im Maschinellen Lernen: Verteiltes Rechnen und Algorithmische Ansätze) |
Department: | Faculty GW - Faculty of Health Sciences Brandenburg |
Responsible Staff Member: | Prof. Dr. rer. nat. Schliep, Alexander |
Language of Teaching / Examination: | English |
Duration: | 1 semester |
Frequency of Offer: | Every winter semester |
Credits: | 6 |
Learning Outcome: | After successfully completing the module, students have an overview of how to solve large-scale computational problems in data science and machine learning. They know parallel approaches ranging from multi-threaded computation on individual machines to implicit-parallelism frameworks on compute clusters. They are familiar with algorithms and data structures that support efficient exact or approximate (e.g. sketching) computation on massive data sets, both in and out of core. They are able to implement these algorithms and can assess which methods are appropriate in a given situation. |
Contents: | The focus will be on the following areas:
- A review of memory-compute co-location and its impact on big data computations.
- Solving Machine Learning (ML) workloads using explicit parallelism, specifically multi-threaded computation on an individual machine.
- Introduction to implicit-parallelism programming models, as implemented for example in MapReduce, Spark and Ray, and their application in ML.
- Sketching algorithms (e.g. CountMinSketch, HyperLogLog) and Bloom filters (a minimal CountMinSketch example follows this list).
- Implementing ML methods using index data structures such as suffix trees or kd-trees.
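As an illustration of the sketching topic above, the following is a minimal, self-contained Python sketch of a Count-Min Sketch; the class layout, the width/depth defaults and the salted SHA-256 hashing are illustrative assumptions, not the reference implementation used in the course.

import hashlib


class CountMinSketch:
    """Approximate frequency counts in sublinear space (illustrative sketch)."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # columns per row: wider means fewer collisions
        self.depth = depth    # independent rows: more rows tighten the estimate
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Hypothetical hashing scheme: salt the item with the row number
        # so each row behaves like an independent hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment one counter per row for this item.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # With non-negative counts, collisions can only inflate a row's counter,
        # so the minimum across rows is the tightest (never too small) estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch()
    for word in ["spark", "ray", "spark", "mapreduce", "spark"]:
        cms.add(word)
    print(cms.estimate("spark"))   # 3 here; in general a slight over-estimate is possible

Widening the table or adding rows trades memory for accuracy: with non-negative updates the estimate never under-counts, and hash collisions only inflate it.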
|
Recommended Prerequisites: | - Knowledge of probability, algorithms and data structures at the undergraduate level
- Introduction to machine learning at Master’s level
- Advanced knowledge of programming in Python and the Linux command line
|
Mandatory Prerequisites: | None |
Forms of Teaching and Proportion: | - Lecture / 2 Hours per Week per Semester
- Exercise / 2 Hours per Week per Semester
- Study project / 30 Hours
- Self-organised studies / 90 Hours
|
Teaching Materials and Literature: | - The Data Science Design Manual. S. Skiena. Springer (Excerpts)
- Parallel Programming for Multicore and Cluster Systems. T. Rauber and G. Rünger. Springer (Excerpts)
- Review and Original Research Articles
|
Module Examination: | Prerequisite + Final Module Examination (MAP) |
Assessment Mode for Module Examination: | Prerequisite:
- Passed exercises including the project (75%)
Final Module Examination:
- Written examination, 120 min. OR
- Oral examination, 30-45 min.
In the first lecture it will be announced whether the examination will be organized in written or oral form. |
Evaluation of Module Examination: | Performance Verification – graded |
Limited Number of Participants: | None |
Part of the Study Programme: | - Degree abroad (Abschluss im Ausland) / Artificial Intelligence / no PO
- Master (research-oriented) / Artificial Intelligence / PO 2022
- Master (research-oriented) / Informatik / PO 2008 - 2. SÄ 2017
|
Remarks: | |
Module Components: | - Lecture: Computing at Scale in Machine Learning: Distributed computing and algorithmic approaches
- Accompanying exercise
- Related examination
|
Components to be offered in the Current Semester: | |