As machine learning (ML) models become more complex, there has been growing interest in using decentrally generated data (e.g., from smartphones) and in pooling data from many actors. At the same time, however, privacy concerns about organizations collecting data have grown. An additional challenge is that decentrally generated data is often highly heterogeneous, which violates assumptions that standard ML models rely on, such as the data being independent and identically distributed. Here, we propose to “kill two birds with one stone” by developing Invariant Federated Learning, a framework for training ML models without directly collecting data that is not only robust to heterogeneous data but even benefits from it. For learning from distributed data, the Federated Learning (FL) framework has been proposed: instead of sharing raw data, clients share model updates that help train an ML model on a central server. We combine this idea with the recently proposed Invariant Risk Minimization (IRM) approach to causal learning. IRM aims to build models that are robust to changes in the data distribution and that generalize better out of distribution (OOD) by using data from different environments during training. This integrates naturally with FL, where each client can be seen as its own environment. Compared to FL methods based on standard empirical risk minimization (ERM), we seek to gain robustness to distributional changes and better OOD generalization. Previous work has further shown that causal models have better privacy properties than associational models. We will turn these theoretical insights into practical algorithms that, for example, provide differential privacy guarantees for FL.
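To make the combination of FL and IRM more concrete, here is a minimal sketch of one way it could look. It is not the method this proposal will develop. Each client is treated as one IRM environment. It computes the gradient of its local squared-error risk plus the IRMv1 penalty, which is the squared gradient of the risk with respect to a fixed dummy scale w = 1.0, and sends only that gradient to the server. The server averages the gradients with equal weight per client and takes one step, in the style of FedSGD. Several choices here are illustrative assumptions rather than part of the proposal: the linear model, the squared loss, the equal per-client weighting, the synthetic data, and all function names (`local_objective`, `client_update`, `server_round`, `make_client`).

```python
import numpy as np


def local_objective(theta, X, y, lam):
    """IRMv1-style objective of one client: local risk + lam * penalty."""
    p = X @ theta                  # predictions of the linear model
    r = p - y                      # residuals
    g = 2.0 * np.mean(r * p)       # dR_e(w * f)/dw evaluated at w = 1.0
    return np.mean(r ** 2) + lam * g ** 2


def client_update(theta, X, y, lam):
    """Runs on one client; returns only a gradient, never X or y."""
    n = len(y)
    p = X @ theta
    r = p - y
    grad_risk = (2.0 / n) * X.T @ r            # gradient of mean(r**2)
    g = (2.0 / n) * r @ p                      # same dummy-scale gradient as above
    grad_g = (2.0 / n) * X.T @ (p + r)         # d g / d theta
    return grad_risk + lam * 2.0 * g * grad_g  # chain rule for lam * g**2


def server_round(theta, clients, lam, lr):
    """One FedSGD-style round; each client is treated as one IRM environment."""
    grads = [client_update(theta, X, y, lam) for X, y in clients]
    return theta - lr * np.mean(grads, axis=0)  # equal weight per environment


def make_client(rng, n, spurious_strength):
    """Hypothetical client data: feature 0 is invariant, feature 1 is spurious."""
    x_inv = rng.normal(size=n)
    y = x_inv + 0.1 * rng.normal(size=n)
    # The spurious feature is driven by y with a client-specific strength.
    x_spu = spurious_strength * y + rng.normal(size=n)
    return np.column_stack([x_inv, x_spu]), y


rng = np.random.default_rng(0)
clients = [make_client(rng, 500, 1.0), make_client(rng, 500, 3.0)]
theta = np.zeros(2)
for _ in range(300):
    theta = server_round(theta, clients, lam=1.0, lr=0.005)
```

Running the same loop with `lam=0.0` gives plain federated ERM on the same data, which serves as a baseline for how much weight `theta` places on the spurious second feature. Sharing gradients alone does not provide formal privacy. Differential privacy guarantees would require additional mechanisms such as gradient clipping and added noise.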
The proposed project integrates naturally with ideas pursued in the Microsoft Turing Academic Program (MS-TAP). Through this program, the PI’s lab already collaborates with Microsoft (including Emre Kıcıman, a co-author of this proposal) to make language models more robust via IRM.