I work on discovering causal relationships automatically in complex environments, with theoretical guarantees. In particular, my collaborators and I are exploring causal discovery in the presence of distribution shifts, selection bias, and confounders. We aim to develop methods that learn causal relations automatically, without hard restrictions on data distributions, causal mechanisms, or data types.
Besides theoretical work in causal discovery, I am also actively exploring its practical use. We have applied our algorithms to neuroscience, biology, finance, and archeology. For example, we have developed methods to identify atypical brain information flow in individuals with autism, which can then be used to distinguish them from typical controls.
In addition, we try to answer: how does causal understanding contribute to artificial general intelligence? The causal view provides a clear picture for understanding advanced learning problems and makes it possible to go beyond the data in a principled, interpretable manner. More specifically, I have been investigating how causal understanding helps solve fundamental machine learning problems, including classification, clustering, forecasting in nonstationary environments, reinforcement learning, transfer learning, and representation learning, especially when distribution shifts or hidden variables lead to spurious associations.
Causal Discovery from Nonstationary/Heterogeneous Data
Biwei Huang, Kun Zhang, Jiji Zhang, Joseph Ramsey, Bernhard Schölkopf, Clark Glymour
It is commonplace to encounter heterogeneous or nonstationary data, whose underlying generating process changes across domains or over time. Such distribution shifts present both challenges and opportunities for causal discovery. In this work, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find the causal skeleton and directions and to estimate the properties of mechanism changes. [paper1][paper2][paper3][code][poster]
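As a rough illustration of how a constraint-based approach can exploit distribution shift, the sketch below appends a domain index `c` as an extra variable and checks which variables remain dependent on it. Everything here is an assumption made for illustration: the two-variable simulation, the linear-Gaussian mechanisms, the helper name `fisher_z_pvalue`, and in particular the Fisher-z partial-correlation test, which is swapped in for simplicity and is not claimed to be the test used in the papers.

```python
import math
import numpy as np


def fisher_z_pvalue(data, i, j, cond=()):
    """Two-sided p-value for zero partial correlation between columns i and j,
    given the columns listed in cond (hypothetical helper, linear-Gaussian only)."""
    n = data.shape[0]
    # Regress both variables on an intercept plus the conditioning columns.
    Z = np.column_stack([np.ones(n)] + [data[:, k] for k in cond])
    res_i = data[:, i] - Z @ np.linalg.lstsq(Z, data[:, i], rcond=None)[0]
    res_j = data[:, j] - Z @ np.linalg.lstsq(Z, data[:, j], rcond=None)[0]
    r = float(np.clip(np.corrcoef(res_i, res_j)[0, 1], -0.999999, 0.999999))
    # Fisher z-transform; the statistic is approximately standard normal under the null.
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Simulated two-domain data (assumed for illustration): X -> Y,
# where only the distribution of X shifts between domains.
rng = np.random.default_rng(0)
n_per_domain = 2000
c = np.repeat([0.0, 1.0], n_per_domain)            # domain index
x = 2.0 * c + rng.normal(size=2 * n_per_domain)    # mechanism of X changes with c
y = 1.5 * x + rng.normal(size=2 * n_per_domain)    # mechanism of Y given X is fixed
data = np.column_stack([c, x, y])

p_c_x = fisher_z_pvalue(data, 0, 1)                    # c vs X
p_c_y = fisher_z_pvalue(data, 0, 2)                    # c vs Y
p_c_y_given_x = fisher_z_pvalue(data, 0, 2, cond=(1,)) # c vs Y given X

print(f"c vs X:     p = {p_c_x:.3g}")
print(f"c vs Y:     p = {p_c_y:.3g}")
print(f"c vs Y | X: p = {p_c_y_given_x:.3g}")
```

In this toy setting, `c` stays dependent on `x` but becomes independent of `y` once `x` is conditioned on, which suggests that the mechanism generating `x` changes across domains while the mechanism generating `y` from `x` does not.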
Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models
Biwei Huang, Kun Zhang, Mingming Gong, Clark Glymour
In this work, we study causal discovery and forecasting for nonstationary time series. We provide a principled investigation of how causal discovery benefits from nonstationarity and how the learned causal knowledge facilitates forecasting. In particular, we formalize causal discovery and forecasting under the framework of nonlinear state-space models. [paper][code][poster]
Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering
Biwei Huang, Kun Zhang, Pengtao Xie, Mingming Gong, Eric Xing, Clark Glymour
In this work, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which accounts for the variability of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group membership of each individual. [paper][code][poster]
Generalized Score Functions for Causal Discovery
Biwei Huang, Kun Zhang, Yizhu Lin, Bernhard Schölkopf, Clark Glymour
Current score-based methods usually need strong assumptions on the functional forms of causal mechanisms, as well as on data distributions. In this work, we introduce generalized score functions for causal discovery based on the characterization of general (conditional) independence relationships between random variables. The resulting causal discovery approach produces asymptotically correct results in rather general cases, which may involve nonlinear causal mechanisms, a wide class of data distributions, mixed continuous and discrete data, and multidimensional variables. [paper][code][poster]
Multi-Domain Causal Structure Learning in Linear Systems
AmirEmad Ghassami, Negar Kiyavash, Biwei Huang, Kun Zhang
In this work, we study the problem of causal structure learning in linear systems from observational data given in multiple domains, across which the causal coefficients and/or the distribution of the exogenous noises may vary. The main tool used in our approach is the principle that in a causally sufficient system, the causal modules, as well as their included parameters, change independently across domains. [paper]
Identification of Time-Dependent Causal Model: A Gaussian Process Treatment
Biwei Huang, Kun Zhang, Bernhard Schölkopf
In this work, we present a novel approach to modeling time-dependent causal influences. We show that by introducing time information as a common cause for the observed processes, we can model the time-varying causal influences between the observed processes, as well as the influence from a certain type of unobserved confounders. We propose a principled estimation procedure that extends Gaussian process regression and automatically learns how the causal model changes over time. [paper][code][poster]
On the Identifiability and Estimation of Functional Causal Models in the Presence of Outcome-Dependent Selection
Kun Zhang, Jiji Zhang, Biwei Huang, Bernhard Schölkopf, Clark Glymour
In this work, we study the identifiability and estimation of functional causal models under selection bias, with a focus on the situation where the selection depends solely on the effect variable. We address two questions of identifiability: the identifiability of the causal direction between two variables in the presence of selection bias, and, given the causal direction, the identifiability of the model with outcome-dependent selection. We also propose two methods for estimating an additive noise model from data that are generated with outcome-dependent selection. [paper]
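To make the problem setting concrete, the sketch below simulates outcome-dependent selection and shows how it distorts a naive fit. It only illustrates the bias, not the estimation methods proposed in the paper. The linear-Gaussian model (a simple special case of an additive noise model), the selection rule `y > 0`, and the helper name `ols_slope` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Assumed additive noise model: Y = 2 * X + E, with X and E independent.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)


def ols_slope(x, y):
    """Slope of the least-squares line fitted to y as a function of x."""
    return np.polyfit(x, y, 1)[0]


# Outcome-dependent selection: a sample is observed only if its effect value is positive.
keep = y > 0.0

slope_full = ols_slope(x, y)
slope_selected = ols_slope(x[keep], y[keep])

print(f"slope on full data:     {slope_full:.3f}")
print(f"slope on selected data: {slope_selected:.3f}")
```

On the selected sample the fitted slope is noticeably smaller than the true coefficient, because selecting on the effect makes the cause and the noise dependent. This is the kind of distortion that motivates estimation methods designed specifically for outcome-dependent selection.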