
Probabilistic Causal Effect Estimation

Why This Matters

Estimating how treatments or interventions influence outcomes over time is at the heart of causal inference, but real-world systems often respond differently at different quantiles of the outcome distribution (e.g. the worst-affected vs. median cases). In my master’s thesis, I introduce a unified “global” framework that combines causal analysis with modern predictive algorithms, allowing us to uncover not just whether an intervention worked, but how its impact varies across the full outcome distribution.


The Big Question

How would treated units have evolved, at each forecast horizon, if they’d never received the treatment?

To answer this, we:

  1. Define causal mechanisms via Directed Acyclic Graphs (DAGs).
Figure 1: DAG for Synthetic Control Method in the Thesis
  2. State identification assumptions (e.g. no hidden back-doors).
    • Assumption 1 (Consistency).
    • Assumption 2 (Generalized fixed-effects model).
    • Assumption 3 (Existence of a synthetic control, as in classical synthetic control methods).
  3. Run placebo tests to validate the model’s ability to reproduce a “null effect.”
Figure 2: Quantile Distribution for the Simulated Treated Units
  4. Estimate probabilistic causal effects across quantiles using forecasts.
Figure 3: Counterfactual Results for Real-world Data from DeepProbCP
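
To make the last two steps concrete, here is a minimal sketch in Python with NumPy. It is not the thesis code. It assumes a forecasting model has already produced quantile forecasts of the untreated (counterfactual) outcome, and that the effect at each quantile is read as "observed minus forecast quantile", averaged over treated units and horizons. The function names `quantile_att` and `interval_coverage`, the array layout, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def quantile_att(observed, counterfactual_q):
    """Average gap between observed outcomes and each counterfactual quantile.

    observed:         shape (n_units, horizon), post-treatment outcomes
    counterfactual_q: shape (n_quantiles, n_units, horizon), forecast
                      quantiles of the outcome without treatment
    Returns an array of shape (n_quantiles,).
    """
    observed = np.asarray(observed, dtype=float)
    counterfactual_q = np.asarray(counterfactual_q, dtype=float)
    gaps = observed[None, :, :] - counterfactual_q
    return gaps.mean(axis=(1, 2))


def interval_coverage(observed, lower_q, upper_q):
    """Share of observations that fall inside [lower_q, upper_q]."""
    observed = np.asarray(observed, dtype=float)
    inside = (observed >= lower_q) & (observed <= upper_q)
    return inside.mean()


# Toy data (simulated here, not taken from the thesis).
rng = np.random.default_rng(0)
untreated = rng.normal(loc=100.0, scale=10.0, size=(5, 12))

# Assumed "forecasts": the true 10%, 50% and 90% quantiles of N(100, 10^2).
z = np.array([-1.2816, 0.0, 1.2816])
forecast_q = np.broadcast_to((100.0 + 10.0 * z)[:, None, None], (3, 5, 12))

# Placebo-style check: with no treatment, the 10%-90% band should cover
# roughly 80% of the observations and the median gap should sit near zero.
placebo_cov = interval_coverage(untreated, forecast_q[0], forecast_q[2])
placebo_att = quantile_att(untreated, forecast_q)

# Inject a known effect of -20 and recover it per quantile.
treated = untreated - 20.0
att = quantile_att(treated, forecast_q)

print("placebo coverage:", placebo_cov)
print("placebo gaps:", placebo_att)
print("gaps under treatment:", att)
```

In this toy setup the gap against the median forecast should land close to the injected effect, while the gaps against the 10% and 90% forecasts bracket it. A real analysis would replace `forecast_q` with the output of a trained model such as DeepProbCP or TFT.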

From Classical Tools to a Global Paradigm

Figure 4: Dataset Structure for 911 Emergency Call Dataset

Models Compared

  1. CausalImpact (local, parametric Bayesian)
  2. TSMixer (MLP-based global)
  3. DeepProbCP (LSTM-based global with quantile forecasting)
  4. Temporal Fusion Transformer (TFT) (attention-based global)

How We Measure Success


Key Findings

  1. Synthetic Data Experiments
    • For small, linear series: CausalImpact wins on point metrics.
    • As the series grow in number or complexity: TFT consistently outperforms the other models, capturing nonlinear dynamics more faithfully.
    • DeepProbCP and TSMixer show mixed results—DeepProbCP yields useful probabilistic intervals but occasionally lags in raw accuracy.
  2. Real-World Case: 911 Emergency Calls & COVID-19
    • Lockdown measures induced a negative treatment effect on call volume—people simply called less.
    • DeepProbCP and TFT both pass the placebo test and achieve lower CRPS on the control series, with TFT slightly ahead on point accuracy.
    • Heterogeneous effects across quantiles (e.g. 10th vs. 90th percentile) reveal that the strongest impact fell on the highest-demand counties.
    Figure 5: Estimation of the Average Treatment Effect on the Treated per Quantile
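
For readers unfamiliar with CRPS (the Continuous Ranked Probability Score), here is a minimal sketch of how it can be approximated from quantile forecasts. It relies on the standard identity that CRPS equals twice the pinball (quantile) loss integrated over all quantile levels, so averaging over a grid of levels gives an approximation. This is an illustration only: the thesis may compute CRPS differently, and the helper names `pinball_loss` and `crps_from_quantiles` and the toy numbers are hypothetical.

```python
import numpy as np


def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss of forecast q_pred at quantile level tau."""
    diff = y - q_pred
    return np.maximum(tau * diff, (tau - 1.0) * diff)


def crps_from_quantiles(y, q_preds, taus):
    """Approximate mean CRPS as 2 x the average pinball loss.

    y:       shape (n,), observed values
    q_preds: shape (n_taus, n), forecast quantiles per level
    taus:    shape (n_taus,), quantile levels in (0, 1), ideally evenly spaced
    """
    y = np.asarray(y, dtype=float)
    q_preds = np.asarray(q_preds, dtype=float)
    taus = np.asarray(taus, dtype=float)
    losses = pinball_loss(y[None, :], q_preds, taus[:, None])
    return 2.0 * losses.mean()


# Toy comparison (made-up numbers): both forecasts are centred on the
# observed value, but one has a much narrower band than the other.
taus = [0.1, 0.5, 0.9]
y_obs = [10.0]
sharp = crps_from_quantiles(y_obs, [[8.0], [10.0], [12.0]], taus)
wide = crps_from_quantiles(y_obs, [[0.0], [10.0], [20.0]], taus)
print("sharp:", sharp, "wide:", wide)
```

Lower is better: a forecast that is both well centred and narrow scores lower than an equally well-centred but wider one. With only three quantile levels the approximation is coarse, and a denser grid of levels gets closer to the exact score.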

Take-Home Messages


What’s Next?

By blending deep-learning forecasts with rigorous causal checks, we open a path toward fine-grained, distributional causal insights—vital for policy, medicine, and any domain where how an intervention moves the needle matters as much as whether it does.


Happy modeling—and may your counterfactuals be ever informative!

© 2025 Dingyi Lai