Causal Forests for Heterogeneous Treatment Effects
MLA Citation
Summary
This influential paper by Athey and Imbens introduces "causal forests," a machine learning method for estimating heterogeneous treatment effects. The authors address a fundamental question in policy evaluation and personalized medicine: not just whether a treatment works on average, but for whom it works best.
Traditional methods like linear regression often assume constant treatment effects or rely on pre-specified interaction terms. Causal forests, in contrast, automatically discover subgroups with different treatment effects by recursively partitioning the covariate space. The method builds on random forests but modifies the splitting criterion to maximize heterogeneity in treatment effects rather than prediction accuracy.
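To make the splitting idea concrete, here is a minimal sketch of a single split chosen to maximize treatment effect heterogeneity. It is a simplified stand-in, not the paper's exact criterion. It assumes one numeric covariate, a binary treatment, and a difference-in-means effect estimate in each child. The helper names (`diff_in_means`, `best_split`) and the toy data are hypothetical.

```python
from statistics import mean

def diff_in_means(rows):
    """Estimated effect: mean treated outcome minus mean control outcome.

    Each row is (x, w, y): covariate, treatment indicator (0 or 1), outcome.
    Returns None when either arm is empty.
    """
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)

def best_split(rows):
    """Return (score, threshold, tau_left, tau_right) for the best split on x.

    The score is n_L * n_R / n * (tau_L - tau_R)^2, the size-weighted spread
    of the two child effects. It rewards children whose estimated effects
    differ, not children whose outcomes are easy to predict.
    """
    n = len(rows)
    best = None
    thresholds = sorted({x for x, _, _ in rows})
    for t in thresholds[:-1]:  # the largest value would leave the right child empty
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        tau_l, tau_r = diff_in_means(left), diff_in_means(right)
        if tau_l is None or tau_r is None:
            continue  # skip splits where a child lacks treated or control units
        score = len(left) * len(right) / n * (tau_l - tau_r) ** 2
        if best is None or score > best[0]:
            best = (score, t, tau_l, tau_r)
    return best

# Toy data (assumed): treatment adds 2.0 to the outcome only when x > 4.
rows = [(x, w, 1.0 + (2.0 * w if x > 4 else 0.0))
        for x in range(1, 9) for w in (0, 1)]
print(best_split(rows))
```

A conventional regression tree would instead score splits by how much they reduce outcome prediction error, which can miss subgroups whose outcomes look similar but whose responses to treatment differ.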
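The key innovation is "honest" estimation: the algorithm uses one subsample to choose the tree's splits and a separate subsample to estimate treatment effects within the resulting leaves. Because the data that determine the partition are not reused for estimation, the leaf estimates are not biased by the adaptive search over splits, which makes valid confidence intervals possible. This is a crucial advance over black-box machine learning methods, which typically provide point predictions without such inferential guarantees.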
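The sample-splitting idea can be sketched in a few lines. This is an illustrative sketch under assumptions, not the authors' implementation: it handles a single split on one covariate, and the names `honest_leaf_effects` and `choose_threshold` are hypothetical. The caller supplies `choose_threshold`, which may be any rule that picks a split point from data.

```python
import random
from statistics import mean

def diff_in_means(rows):
    """Mean treated outcome minus mean control outcome; None if an arm is empty."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)

def honest_leaf_effects(rows, choose_threshold, seed=0):
    """Estimate leaf effects honestly for a single split.

    rows: list of (x, w, y) tuples (covariate, treatment 0/1, outcome).
    choose_threshold: hypothetical callable that maps rows to a split point.
    Returns (threshold, tau_left, tau_right).
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # random split into two halves
    half = len(shuffled) // 2
    split_half, estimate_half = shuffled[:half], shuffled[half:]

    # Structure: the split point is chosen from split_half only.
    threshold = choose_threshold(split_half)

    # Estimation: leaf effects come from estimate_half only, so the data
    # that selected the split never enter the effect estimates.
    left = [r for r in estimate_half if r[0] <= threshold]
    right = [r for r in estimate_half if r[0] > threshold]
    return threshold, diff_in_means(left), diff_in_means(right)

# Toy data (assumed): treatment adds 2.0 to the outcome only when x > 4.
rows = [(x, w, 1.0 + (2.0 * w if x > 4 else 0.0))
        for x in range(1, 9) for w in (0, 1) for _ in range(10)]
print(honest_leaf_effects(rows, choose_threshold=lambda rs: 4))
```

The cost of honesty is that each step sees only part of the sample. In a forest, different trees can draw different subsamples and splits, so every observation can still contribute across the ensemble.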
The authors demonstrate their method through simulations and an application to a job training program (the National Supported Work Demonstration). They show that causal forests can identify subgroups with substantially different responses to treatment, such as younger participants benefiting more from the program than older ones. The method also outperforms alternatives like regression with interactions or conventional random forests.
This paper bridges the gap between causal inference and machine learning, providing a rigorous framework for data-driven discovery of treatment effect heterogeneity. It has spawned a rich literature on machine learning methods for causal inference and influenced practice in economics, healthcare, and marketing.
Key Contributions
- Introduces causal forests for estimating heterogeneous treatment effects using machine learning
- Modifies the random forest splitting criterion to maximize treatment effect heterogeneity rather than prediction accuracy
- Develops an "honest" estimation approach using separate data subsets for tree construction and effect estimation
- Provides valid confidence intervals for treatment effect estimates, addressing inference in machine learning
- Demonstrates an application to job training program data, identifying subgroups with different treatment responses
- Bridges causal inference and machine learning with a rigorous statistical framework