This work is concerned with generalizing learned policies to new environments in an efficient and effective way. Distribution shifts are usually due to changes in only a few mechanisms of the generative process, so it suffices to adapt the distributions corresponding to the generating processes of a small subset of the variables. To this end, we leverage a graphical representation that characterizes structural relationships over the variables in the RL system. Such a graphical representation provides a compact way to encode where the changes occur and informs us of the minimal set of changes one needs to consider for policy adaptation, so that the policy can be adapted to the target domain efficiently, with only a few samples and without further policy optimization. A toy sketch of this idea is given below. [paper] [code]
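
The following is a minimal, hypothetical sketch of the idea (not the paper's implementation): across domains only a small vector of change factors varies, while the shared dynamics model and policy stay frozen, so adapting to a new domain reduces to estimating those few factors from a handful of transitions. All names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, change_dim = 4, 2, 2

# Pretend these were learned on the source domains and are now frozen.
A = rng.normal(size=(state_dim, state_dim))
B = rng.normal(size=(state_dim, action_dim))
C = rng.normal(size=(state_dim, change_dim))   # how the change factors enter the dynamics

def predict_next_state(s, a, theta):
    return A @ s + B @ a + C @ theta

# A few transitions collected in the target domain (simulated here with a
# ground-truth theta that we then try to recover).
theta_true = np.array([0.7, -1.2])
transitions = []
for _ in range(10):
    s, a = rng.normal(size=state_dim), rng.normal(size=action_dim)
    transitions.append((s, a, predict_next_state(s, a, theta_true)))

# Estimating the change factors is a small least-squares problem:
#   s' - A s - B a = C theta
residuals = np.concatenate([s_next - A @ s - B @ a for s, a, s_next in transitions])
design = np.vstack([C] * len(transitions))
theta_hat, *_ = np.linalg.lstsq(design, residuals, rcond=None)
print(theta_hat)   # close to theta_true; the frozen policy can now be run with theta_hat
```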

Dealing with non-stationarity in environments (e.g., in the transition dynamics) and objectives (e.g., in the reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that jointly learns the causal structure in terms of a factored MDP and a factored representation of the individual time-varying change factors. We prove that under standard assumptions, we can completely recover the causal graph representing the factored transition and reward functions, as well as a partial structure between the individual change factors and the state components. Our general framework covers non-stationary scenarios with different function types and frequencies of change, including changes across episodes and within episodes; a small illustrative sketch follows. [pdf]
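
Below is an illustrative sketch of a factored transition with individual change factors: each state component depends only on its causal parents among the state and action components and on its own latent change factor, which may drift within an episode. The graph, functional forms, and drift schedule are assumptions made for illustration, not the FANS-RL implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Factored structure: parents[i] lists which state components feed into s_i.
parents = {0: [0], 1: [0, 1], 2: [2]}

def change_factor(i, t):
    """Hypothetical within-episode drift of the i-th latent change factor."""
    return 0.5 * np.sin(0.1 * t + i)

def step(state, action, t):
    next_state = np.empty_like(state)
    for i, pa in parents.items():
        # Factored transition: s_i' = f_i(parents of s_i, action, theta_i(t)) + noise
        next_state[i] = (np.tanh(state[pa].sum() + action)
                         + change_factor(i, t)
                         + 0.01 * rng.normal())
    return next_state

s = np.zeros(3)
for t in range(5):
    s = step(s, action=0.1, t=t)
    print(t, np.round(s, 3))
```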

This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., which factors or modules of the data distribution remain invariant and which change. To develop an automated way of doing domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical model. Such a graphical model distinguishes between constant and changing modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable Y in the target domain. This provides an end-to-end framework for domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella; a toy instance is sketched below. [pdf]
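
As a toy instance of this view (not the paper's algorithm): suppose the learned graphical model indicates that the class-conditional p(X | Y) is invariant while the class prior p(Y) is the changing module. Then the target-domain prior can be estimated from unlabeled target data with EM, and the posterior of Y in the target domain follows from Bayes' rule. Discrete X and Y are used here for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

p_x_given_y = np.array([[0.7, 0.2, 0.1],    # p(X | Y=0), invariant module
                        [0.1, 0.3, 0.6]])   # p(X | Y=1)
p_y_source = np.array([0.5, 0.5])
p_y_target_true = np.array([0.2, 0.8])       # unknown changing module

# Unlabeled target-domain observations of X.
y_t = rng.choice(2, size=5000, p=p_y_target_true)
x_t = np.array([rng.choice(3, p=p_x_given_y[y]) for y in y_t])

# EM over the changing module p(Y) only, keeping the invariant p(X | Y) fixed.
p_y = p_y_source.copy()
for _ in range(100):
    post = p_y[:, None] * p_x_given_y[:, x_t]          # unnormalized p(Y | x_n)
    post /= post.sum(axis=0, keepdims=True)
    p_y = post.mean(axis=1)

print(np.round(p_y, 3))                                 # close to (0.2, 0.8)
posterior_y_given_x = (p_y[:, None] * p_x_given_y) / (p_y @ p_x_given_y)
print(np.round(posterior_y_given_x, 3))                 # target-domain p(Y | X)
```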

Given nonstationary processes where the causal relations may change over time, how can we discover the time-varying causal relationships and, at the same time, predict future values of the variables of interest in a principled way? In this work, we show that causal discovery and forecasting for nonstationary processes can be put under the same umbrella. Specifically, by exploiting a particular type of state-space model to represent the processes, we find that nonstationarity actually helps to identify the causal structure and that forecasting naturally benefits from the learned causal representation. Moreover, given the causal model, we can directly treat forecasting as a problem of Bayesian inference that exploits the time-varying property of the data and adapts to new observations in a principled manner, as in the sketch below. [paper] [code] [poster]
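
The following is a minimal sketch of the state-space view (illustrative, not the paper's exact model): the time-varying causal coefficient from X1 to X2 is treated as a latent state following a random walk, observations give X2_t = b_t * X1_t + noise, and a Kalman filter tracks b_t online. Forecasting X2 then simply plugs in the current posterior over b_t, so prediction adapts to new observations in a Bayesian way. All variances and the random-walk drift are assumed values.

```python
import numpy as np

rng = np.random.default_rng(3)
T, q, r = 300, 1e-4, 0.05           # steps, drift variance, observation noise variance

b_true = np.cumsum(rng.normal(0, np.sqrt(q), size=T)) + 1.0   # slowly drifting coefficient
x1 = rng.normal(size=T)
x2 = b_true * x1 + rng.normal(0, np.sqrt(r), size=T)

b_hat, P = 0.0, 1.0                  # posterior mean / variance of b_t
forecasts = []
for t in range(T):
    P += q                                  # predict: the coefficient may have drifted
    forecasts.append(b_hat * x1[t])         # one-step forecast of X2 before seeing it
    K = P * x1[t] / (x1[t] ** 2 * P + r)    # Kalman gain
    b_hat += K * (x2[t] - b_hat * x1[t])    # update with the new observation
    P *= (1 - K * x1[t])

print("estimated coefficient:", round(b_hat, 3), "true:", round(b_true[-1], 3))
```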