Causal Data Science (13715)

Learning Outcome

Students have a basic understanding of data science in the context of identifying causal relationships. They are familiar with a verbal and graphical language for communicating about causality, and with key concepts such as counterfactuals, outcome equivalence, and confounding effects. They know typical classes of problems that do not allow causal interpretations of observed associations, as well as typical solutions to these problems by means of data analytic and data collection methods. Moreover, students understand the tight interdependency of data analytics and the design of data collection in generating high-quality evidence and high-quality predictions.

Contents

  1. Counterfactuals, Potential Outcomes, Causal Graphs, and typical problems (i.e., omitted relevant variables, measurement error, reverse causality, endogenous selection, endogenous treatment)
  2. Data analytic solutions: control variables, matching, weighting
  3. Data analytic solutions: instrumental variables, selection instruments
  4. Data collection solutions: real experiments
  5. Assumed experiments as mixed solutions: natural experiments, quasi experiments, regression discontinuity
  6. Time series data as a mixed solution: diff-in-diff and related methods
  7. Reflections on moderation and mediation analyses and on structural equation modeling

The module focuses on applications in business and economics, but the underlying theories and methods generalize beyond these fields. The course complements more traditional data science modules that have a stronger focus on the implementation of data science algorithms. Tutorials apply these methods to real-world problems and to the analysis of simulated and real datasets. Currently, the freely available software R is used in the practical parts of the tutorials.

You can find the complete module description here.