AI for the design of oligonucleotide therapeutics

For antisense oligonucleotides (ASOs) that act through RNase H1-mediated knockdown, the binding energies and kinetics of ASO-mRNA duplexes are critical for predicting efficacy and safety. We predict binding energies from sequences, study the kinetics of ASO action to make ASO drug design more predictable, and combine molecular dynamics with artificial intelligence in collaborative projects to extend predictive models to a wider range of nucleotide modifications. Our federated, privacy-preserving learning approach enables competitors to pool their data for training predictors of binding energies without disclosing it to one another.
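The federated idea can be illustrated with a minimal sketch. It uses plain federated averaging (FedAvg) on a toy linear model, which is an assumption made for illustration: the text does not say which federated method or model the group uses. The function names, the single numeric feature, and the synthetic data are all hypothetical, and the sketch offers no privacy guarantee beyond keeping each party's raw data local.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Gradient descent on one party's private (feature, energy) pairs."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error of the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_average(party_datasets, rounds=30):
    """Each round, parties train locally and only their weights are averaged."""
    weights = (0.0, 0.0)
    total = sum(len(d) for d in party_datasets)
    for _ in range(rounds):
        updates = [local_update(weights, d) for d in party_datasets]
        # Size-weighted average of the local models; raw data never leaves a party.
        weights = tuple(
            sum(len(d) * u[i] for d, u in zip(party_datasets, updates)) / total
            for i in range(2)
        )
    return weights


# Synthetic toy data (not real measurements): energy = 2 * feature + 1.
party_a = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0)]
party_b = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 1.0, 2.0)]
w, b = federated_average([party_a, party_b])
```

On this noise-free toy data the averaged model should end up close to slope 2 and intercept 1, although neither party ever sees the other's data points.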

ML for pan-genomic graphs

Pan-genomic graphs provide a principled approach for dealing with structural variants and the high degree of diversity between genomes. ML on pan-genomic graphs will make it possible to tackle prediction and regression tasks across different populations, for targets that include quantities relevant to oligonucleotide therapeutics, such as transcription rate or accessibility, as well as clinically relevant variables.
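As a rough illustration of the data structure, the sketch below encodes a tiny pan-genomic graph in which two genomes share flanking segments but differ in the middle, and derives one simple per-genome feature vector that an ML model could consume. The node sequences, sample names, and the node-presence encoding are hypothetical choices made for this example, not a description of a specific method.

```python
# Nodes hold sequence segments; each genome is a path through the graph.
nodes = {1: "ACGT", 2: "G", 3: "TTAGGA", 4: "CCA"}
edges = {(1, 2), (1, 3), (2, 4), (3, 4)}
paths = {"sample_A": [1, 2, 4], "sample_B": [1, 3, 4]}


def spell(path):
    """Reconstruct the linear sequence of a genome from its path."""
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges:
            raise ValueError(f"path uses missing edge {a}->{b}")
    return "".join(nodes[n] for n in path)


def node_presence(path):
    """Binary feature vector: 1 if the genome traverses the node, else 0."""
    visited = set(path)
    return [1 if n in visited else 0 for n in sorted(nodes)]


sequences = {name: spell(p) for name, p in paths.items()}
features = {name: node_presence(p) for name, p in paths.items()}
```

Nodes 2 and 3 form a "bubble": the two alternative routes between the shared nodes 1 and 4 represent a variant, here a short segment versus a longer one, without forcing either genome to serve as the reference.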

ML and algorithmics for sequencing data

Data generated by high-throughput experimental platforms such as high-throughput sequencing (HTS) pose computational challenges, in particular when advanced statistical approaches such as Bayesian methods are used for analysis. In the past, we have developed a compressive genomics approach funded by the NIH Big Data to Knowledge Program (BD2K), used wavelet compression in Bayesian HMMs for copy number variant detection, and significantly improved the utility of variable-length Markov chains, statistical ML models that represent genomes, through faster learning algorithms. This enables, for example, alignment-free genome comparisons from raw data.
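To give a flavor of alignment-free comparison, the sketch below compares two sequences through their k-mer frequency profiles. This is a deliberately simplified stand-in: it uses fixed-length k-mers rather than variable-length Markov chains, and the function names, the choice of k, and the cosine distance are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Relative frequencies of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def cosine_distance(p, q):
    """1 minus the cosine similarity of two sparse frequency profiles."""
    dot = sum(p.get(kmer, 0.0) * q.get(kmer, 0.0) for kmer in set(p) | set(q))
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


# Toy sequences, not real genomic data.
seq_a = "ACGTACGTACGTTTGACA"
seq_b = "GGGGGGGGCCCCCCCC"
distance = cosine_distance(kmer_profile(seq_a), kmer_profile(seq_b))
```

No alignment is computed at any point: each sequence is summarized independently, so in principle the same comparison can be run on profiles built directly from raw reads.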

Education

Computational thinking is a basic requirement for all disciplines, and the teaching of computational and algorithmic ideas can benefit greatly from software tools. We develop animation systems for graph algorithms that are available on the desktop, as a web app, and soon as an iOS app; our Gato system is used in the Springer textbook CATBox. Our Hidden Markov Model library lets learners focus on solving exciting bioinformatics problems.