Winter semester 2025-26

Oberseminar Medizinische Bioinformatik
meetings: Tuesdays, 13:45 - 15:15, SD, LG 9, 9.317

Research Module in Artificial Intelligence
meetings: Tuesdays, 16:00 - 17:30, SD, LG 9, 9.317

Computing at Scale in Machine Learning: Distributed computing and algorithmic approaches
lecture: Thursdays, 15:30 - 17:00, ZC, LG 1A, HS 1
exercise: Thursdays, 17:30 - 19:00, ZC, ZHG, SR 1 and ZC, ZHG, SR 2
examination: Friday (2026-03-27), 11:00 - 13:00, ZC, ZHG, AM 2

Introduction to Bioinformatics
exercise: Fridays, 9:15 - 10:45, SD, LG 9, SR 9.219
lecture: Fridays, 11:30 - 13:00, SD, LG 9, SR 9.219
examination: Tuesday (2026-03-24), 11:00 - 13:00, SD, LG 15, HS 15 V 110

Bioinformatics
seminar: Fridays, 13:45 - 15:15, SD, LG 9, SR 9.219

Oberseminar Medizinische Bioinformatik (Modul: 13600)

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=15470.

Meetings
Tuesdays, 13:45 - 15:15
Sachsendorf Campus, LG 9, room 9.317

Research Module in Artificial Intelligence (Modul: 14060)

The research module helps students prepare for a Master's thesis and focuses on soft skills including literature search, writing, planning research activities and architecting software implementations. The topical focus is on medical bioinformatics, particularly on using machine learning and AI methods in the life sciences, but the research module is not constrained to this field. The format is weekly meetings in which students develop a research idea. To do so, they

  • read an original research paper,
  • prepare a project proposal,
  • perform some preliminary analysis (e.g. implementing a baseline approach) to familiarize themselves with tools and data, and
  • write a project plan for a possible Master's thesis project.

The writing process, the formulation of scientific questions, and the design of approaches to address them are important initial steps in the Master's thesis process.

About the field:

Molecular biology and biomedicine are scientific fields which have undergone dramatic changes over the last four decades, driven by novel scientific instruments such as sequencing machines, which allow us to study genomes as well as gene activity. This has enabled a much deeper understanding of the molecular mechanisms of life, the evolution of species, and, very importantly, human disease.
A large part of the change was fueled by methods from Computer Science, most famously algorithms for comparing biological sequences (also known as approximate string matching) and for combining or assembling short sequences into complete genomes. As a matter of fact, for large parts of bioinformatics the only representation of biology needed is a string over the alphabet {A, C, G, T}.

Consequently, because the biological background can be kept quite abstract, research on state-of-the-art methods in bioinformatics is also easily accessible to students with a limited background in the natural sciences or biology.
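
To illustrate how little biology is needed for sequence comparison, the following minimal sketch computes the edit (Levenshtein) distance between two strings over {A, C, G, T} using dynamic programming. It is one common way to formalize approximate string matching, not a method prescribed by the module; the function name and the two example sequences are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings via dynamic programming."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Toy sequences (hypothetical, not real data): they differ in one position.
print(edit_distance("ACGTTGCA", "ACGATGCA"))
```

The table is kept one row at a time, so memory grows with the length of the shorter dimension only, while run time remains proportional to the product of the two lengths.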

Scientific Area:

The projects in this version of the module should focus on research relevant to the department for Medical Bioinformatics, with an emphasis on patient-data acquisition. This includes topics such as the analysis of (multi-)omics data, in particular from genomics and transcriptomics studies, the analysis of clinical and medical sensor time-series data, and AI methods for computational drug design. A main methodological focus is on modality-specific processing of information (e.g., predicting the safety or efficacy of drugs, or identifying changes in a cell’s state from gene expression levels) and on knowledge acquisition (such as dynamic event classification in patient data, or data fusion of omics data to infer disease mechanisms). Due to the large amounts of data available from experimental collaborators and public sources, machine learning approaches often have non-trivial algorithmic aspects and/or use parallel computation (for example, federated learning including privacy guarantees).

 

Example Project Proposals (related topics applying methods from Computing at Scale are also possible):

1. Design of Oligonucleotide Drugs — Deep Learning for predicting statistics of noise in kinetic simulations.

Kinetic simulations of the action of oligonucleotide drugs make it possible to estimate differences between drug candidates. While the steady-state average effect of an oligonucleotide drug can be solved for explicitly, the variation, or noise, cannot. When one screens a large number of oligonucleotide drug candidates, running stochastic simulations to estimate the noise becomes expensive. The goal of the project is to develop a deep learning model which can predict the noise distribution based on the kinetic parameters of oligonucleotide drug candidates.

2. Determinantal Point Process (DPP) for selecting a diverse set of k-mers.

A k-mer is a short DNA sequence of length k. Genomes contain roughly as many different k-mers as the length of the genome. The goal of the project is to develop and implement an approach to select a diverse set of k-mers from a genome, both in an unsupervised setting (i.e., 'give me 10000 k-mers which are maximally diverse') and in a semi-supervised setting (i.e., 'in addition to these 1000 k-mers I chose, give me another 9000 so that the combined set is maximally diverse'). DPPs have been used for selecting diverse queries in search results; the specific project is useful for genomic and drug design studies, e.g. for designing experiments and for accelerating computational pipelines.
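
As a small illustration of the objects involved, the sketch below enumerates the distinct k-mers of a sequence with a sliding window. It covers only k-mer extraction and not the DPP-based selection itself; the function name and the toy sequence are hypothetical.

```python
def distinct_kmers(sequence: str, k: int) -> set[str]:
    """Return the set of distinct substrings of length k in the sequence."""
    # A sequence of length n has n - k + 1 windows of length k.
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


genome = "ACGTACGTGACG"  # toy sequence, not real data
k = 4
kmers = distinct_kmers(genome, k)

# Compare the number of windows with the number of distinct k-mers.
print(len(genome) - k + 1, len(kmers))
```

For a sequence of length n there are n - k + 1 windows, which is why the number of different k-mers can be close to the genome length when few k-mers repeat.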

3. Design of Oligonucleotide Drugs — data needs in transfer learning of binding energies for chemically modified Oligonucleotide Drugs.

Existing deep learning approaches can learn to predict binding energies for oligonucleotides, which are a main determinant of the efficacy of the drug. Sufficiently large data sets are only available for natural DNA molecules, but not for the chemically modified oligonucleotide drug molecules. The natural approach to deal with a lack of data in such settings is to use transfer learning: training a deep learning model on the natural DNA molecules and adapting it to the chemically modified oligonucleotide drug molecules with further training. The goal of the project is to simulate the changes in binding energies and determine how much data is needed from the chemically modified oligonucleotide drug molecules.

4. Determinantal Point Process (DPP) for initial model selection of Hidden Markov Models (HMM) for clustering time-series data.

Mixtures of HMM have been used successfully for modeling time-series in clinical (e.g. drug response) or molecular (e.g. gene expression) settings, where the data is noisy and heavily undersampled, and where a qualitative look at the data is beneficial. One difficulty is identifying an appropriate initialization. DPP make it possible to select a diverse set of time-series which can serve as such an initial set of qualitative behaviors. Additional uses are the acceleration of computational pipelines and, in combination with k-nearest neighbor classifiers, ML methods for outlier detection (one-class classification).

5. Variable Length Markov Chains (VLMC) for single-cell RNAseq Data.

VLMC have been used for classifying pathogen genomes and for a range of other tasks from genomics and natural-language processing. Common to these applications is that the robustness of VLMC to noise and their concentration on the robust statistical properties of sequences have a desirable effect. With RNA sequencing, the activity of all genes can be measured; single-cell RNAseq (scRNAseq) resolves gene expression at the level of individual cells. An important task is to determine the number of different cell types present by clustering. The goal of the proposal is to evaluate whether VLMC are suitable for analyzing scRNAseq data.

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=15475.

Meetings
Tuesdays, 16:00 - 17:30
Sachsendorf Campus, LG 9, room 9.317

Computing at Scale in Machine Learning: Distributed computing and algorithmic approaches (Modul: 14038)


Learning Outcomes:
Students will obtain an overview of how to solve large-scale computational problems in data science and machine learning using a) parallel approaches, from multi-threaded computation on individual machines to implicit parallelism frameworks on compute clusters, and b) algorithms and data structures supporting efficient exact or approximate computation with massive data sets in and out of core. In particular, they will learn how to analyze relevant probabilistic data structures and algorithms, and how to select and implement appropriate computational approaches for large-scale problems.

Contents:
The focus will be on the following areas:

  • A review of memory-compute co-location and its impact on big data computations.
  • Solving Machine Learning (ML) workloads using explicit parallelism, specifically multi-threaded computation on an individual machine.
  • Introduction of implicit parallelism programming models as implemented for example in MapReduce, Spark and Ray and their application in ML.
  • Probabilistic algorithms such as sketching algorithms (incl. CountMinSketch, HyperLogLog) or Bloom filters.
  • Implementing ML methods using index data structures such as suffix trees or kd-trees.
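
To give a flavor of the probabilistic data structures listed above, here is a minimal Bloom filter sketch. It is an illustration under simplifying assumptions rather than course material: the class name, the default sizes, and the way hash functions are derived from seeded SHA-256 digests are all choices made for this example.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (a real filter packs bits).
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True only if every position for the item has been set.
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for kmer in ["ACGT", "CGTA", "GTAC"]:  # toy items
    seen.add(kmer)

print("ACGT" in seen)  # inserted items are always reported as present
```

An item that was never inserted may still be reported as present if all of its positions were set by other items; the false-positive rate depends on the number of bits, the number of hash functions, and the number of inserted items.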

Recommended Prerequisites:
Introduction to machine learning at Master’s level. Advanced knowledge of programming in Python and the Linux command line. 

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=15473.

Lecture
Thursdays, 15:30 - 17:00
Central Campus, LG 1A, HS 1

Exercise
Thursdays, 17:30 - 19:00
  • Group 1: Room ZHG/SR1 - for Assignment Groups A[A-Z]
  • Group 2: Room ZHG/SR2 - for Assignment Groups A-Z

Introduction to Bioinformatics (Modul: 14336)

Learning Outcomes:
After successfully completing the module, students will have acquired an overview of the fundamentals of bioinformatics. This includes an introduction to relevant molecular processes, the scientific instruments used to investigate these processes, and the data generated by them. For central computational problems, students will be able to discuss the advantages and disadvantages of statistical and basic algorithmic approaches, and to adapt them to specific biological questions. Students will be able to analyze specific biological data using appropriate software libraries for Python.

Contents:
The focus will be on the basics of the following areas:

  • An introduction to molecular biology, including relevant scientific instruments and the omics data generated by them
  • Pairwise and multiple sequence alignments, seed-and-extend approaches, and genome indexes
  • Evolutionary models and phylogenetic trees
  • Signals in sequences: identification of motifs
  • Assembly of genomes and transcriptomes
  • Gene expression analysis

Recommended Prerequisites:
Good knowledge of discrete probability, algorithms and data structures at the undergraduate level. Advanced knowledge of programming in Python and the Linux command line. 

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=15472.

Exercise
Fridays, 9:15 - 10:45
Sachsendorf Campus, LG 9, SR 9.219

Lecture
Fridays, 11:30 - 13:00
Sachsendorf Campus, LG 9, SR 9.219

Bioinformatics (Modul: 14810)

After successfully completing the module, students will be familiar with state-of-the-art problems and methodological approaches used in medical bioinformatics. They will be able to familiarize themselves with current research in medical bioinformatics from original research literature, to participate in a technical discussion within the context of international science, and to present scientific content in written and oral form.

Students will learn about specific state-of-the-art problems and methodological approaches used in medical bioinformatics. The applications will range from diagnostics and patient monitoring with sensor, clinical and omics data (to detect clinically relevant states) to understanding cellular processes relevant to diagnosis and disease, as well as mechanisms for treating diseases. Methods will include both algorithmic and machine learning approaches.

Workflow for presenting:
  1. Work through the article, reviewing theory as needed.
  2. Schedule a consultation roughly halfway through your preparation to discuss topical questions and the outline of the presentation. The expectation is that you have understood the paper, can answer questions, and have selected material from the paper for presentation. In particular, you should have an outline in writing (e.g. a slide stack with titles). The purpose of the consultation is not to explain the paper to you, and it is not a substitute for doing your part in step 1. The purpose is to give you an opportunity to work through the difficult parts of the paper with our help.
  3. Hold a trial presentation for 1-2 peers. Peers don't have to be in the seminar.
  4. Schedule a meeting for approval of the finished presentation, to be held two weeks before the seminar presentation.

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=15474

Seminar
Fridays, 13:45 - 15:15
Sachsendorf Campus, LG 9, SR 9.219