Winter Semester 2024-25

Computing at Scale in Machine Learning: Distributed computing and algorithmic approaches (Module: 14038)


Learning Outcomes:
Students will obtain an overview of how to solve large-scale computational problems in data science and machine learning using (a) parallel approaches, ranging from multi-threaded computation on individual machines to implicit-parallelism frameworks on compute clusters, and (b) algorithms and data structures that support efficient exact or approximate computation on massive data sets, both in and out of core. In particular, they will learn how to analyze relevant probabilistic data structures and algorithms, and how to select and implement appropriate computational approaches for large-scale problems.

Contents:
The focus will be on the following areas:

  • A review of memory-compute co-location and its impact on big data computations.
  • Solving Machine Learning (ML) workloads using explicit parallelism, specifically multi-threaded computation on an individual machine.
  • An introduction to implicit-parallelism programming models, as implemented for example in MapReduce, Spark and Ray, and their application in ML.
  • Probabilistic algorithms and data structures, such as sketching algorithms (incl. Count-Min Sketch, HyperLogLog) and Bloom filters.
  • Implementing ML methods using index data structures such as suffix trees or k-d trees.
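
As an illustration of the probabilistic data structures named above, the following is a minimal sketch of a Bloom filter in Python. It is not course material: the class name `BloomFilter`, the default sizes, and the choice of salted SHA-256 as the hash family are assumptions made for this example only.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array probed by k hash functions.

    All names and defaults here are hypothetical, chosen for readability.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive k positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("ACGT")
print("ACGT" in seen)  # an added item is always reported as present
```

The structure never produces false negatives, but it may report an item as present that was never added; the false-positive rate depends on the number of bits, the number of hash functions, and the number of stored items.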

Recommended Prerequisites:
Introduction to machine learning at Master’s level. Advanced knowledge of programming in Python and the Linux command line. 

Lecture
Thursday (2024-10-17 - 2025-02-06), 15:30 - 17:00
Central Campus, LG 1A, HS 1

Exercise
Thursday (2024-10-17 - 2025-02-06), 17:30 - 19:00
Central Campus, LG 1A, HS 1

Introduction to Bioinformatics (Module: )

Learning Outcomes:
After successfully completing the module, students will have acquired an overview of the fundamentals of bioinformatics. This includes an introduction to relevant molecular processes, the scientific instruments used to investigate these processes, and the data generated by them. For central computational problems, students will be able to discuss the advantages and disadvantages of statistical and basic algorithmic approaches and adapt them to specific biological questions. Students will be able to analyze specific biological data using appropriate software libraries for Python.

Contents:
The focus will be on the basics of the following areas:

  • An introduction to molecular biology, including relevant scientific instruments and the omics data generated by them
  • Pairwise and multiple sequence alignments, seed-and-extend approaches, and genome indexes
  • Evolutionary models and phylogenetic trees
  • Signals in sequences: identification of motifs
  • Assembly of genomes and transcriptomes
  • Gene expression analysis
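
To make the pairwise alignment topic above concrete, the following is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence) in Python. It is an illustration only, not the course's implementation: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions made for this example.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Hypothetical helper: only two rows of the dynamic programming table
    are kept, so memory use is linear in len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # three matches and one gap: 2
```

Recovering the alignment itself, rather than only its score, would require keeping the full table or traceback pointers.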

Recommended Prerequisites:
Good knowledge of discrete probability, algorithms and data structures at the undergraduate level. Advanced knowledge of programming in Python and the Linux command line. 

Exercise
Friday (2024-10-18 - 2025-02-07), 9:15 - 10:45
Sachsendorf Campus, LG 9, HS 9.122

Lecture
Friday (2024-10-18 - 2025-02-07), 11:30 - 13:00
Sachsendorf Campus, LG 9, HS 9.122

Bioinformatics (Module: 13866)

Seminar
Friday (2024-10-18 - 2025-02-07), 13:45 - 15:15
Sachsendorf Campus, LG 9, HS 9.122

Research Module in Artificial Intelligence (Module: 14060)

Appointment by arrangement in Sachsendorf


Oberseminar Medizinische Bioinformatik (Module: 13600)

Appointment by arrangement in Sachsendorf