High-dimensional statistical learning: New methods to advance economic and sustainability policy (2019-2023)

ZK35 (2019-2023)

Recent years have seen a tremendous surge in the availability of socioeconomic data characterized by vast complexity and high dimensionality. However, the methods most commonly used to inform practitioners and policy makers still focus on small- to medium-scale datasets. Consequently, crucial transmission channels are easily overlooked, and the corresponding inference often suffers from omitted-variable bias. This calls for novel methods that enable researchers to fully exploit the ever-increasing amount of data.

In this project, we aim to investigate how the largely separate research streams of Bayesian econometrics, statistical model checking, and machine learning can be combined and integrated to create innovative and powerful tools for the analysis of big data in the social sciences. In doing so, we pay special attention to properly incorporating relevant sources of uncertainty. Although crucial for thorough empirical analyses, this aspect is often overlooked in traditional machine learning techniques, which have mainly centered on producing point forecasts of key quantities of interest. In contrast, Bayesian statistics and econometrics are built around algorithms that carry out exact posterior inference, which in turn allows for density forecasts.
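To illustrate the contrast between a point forecast and a density forecast, the following minimal sketch (not taken from the project itself) uses a conjugate Normal model with known noise variance; all data, prior settings, and variable names are illustrative assumptions.

```python
# Minimal sketch: point forecast vs. Bayesian density forecast in a
# conjugate Normal model with known variance (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# Simulated "observed" data: y_t ~ N(mu, sigma^2) with sigma known
sigma = 1.0
y = rng.normal(loc=2.0, scale=sigma, size=50)

# Prior on mu: N(m0, s0^2)
m0, s0 = 0.0, 10.0

# Conjugate posterior for mu: N(m_n, s_n^2)
n = y.size
s_n2 = 1.0 / (1.0 / s0**2 + n / sigma**2)
m_n = s_n2 * (m0 / s0**2 + y.sum() / sigma**2)

# Point forecast: a single number (posterior mean of the next observation)
point_forecast = m_n

# Density forecast: the full posterior predictive N(m_n, s_n^2 + sigma^2),
# summarized here by simulation and a 90% predictive interval
draws = rng.normal(loc=m_n, scale=np.sqrt(s_n2 + sigma**2), size=10_000)
lo, hi = np.percentile(draws, [5, 95])

print(f"point forecast: {point_forecast:.2f}")
print(f"90% predictive interval: [{lo:.2f}, {hi:.2f}]")
```

The density forecast carries the full predictive distribution, so interval statements and risk assessments come for free, whereas the point forecast discards all uncertainty information.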

Our contributions are twofold. From a methodological perspective, we develop cutting-edge methods that enable fully probabilistic inference for dynamic models in vast dimensions. In terms of empirical advances, we apply these methods to highly complex datasets in which the number of observations, the number of time series, and/or the number of variables included is large. More specifically, the empirical applications center on four topical issues in the realm of sustainable development and socioeconomic policy: income inequality, economic growth and climate change, cryptocurrencies, and urban mobility.

In these applications, we focus on probabilistic forecasting with real-time data to perform model validation in an efficient way. Moreover, we address structural inference. As policy makers are typically interested in evaluating their policies quantitatively, robust econometric tools are crucial for counterfactual simulations. In light of the increasing complexity of the economy, however, large information sets need to be exploited to appropriately recover the underlying causal structures and to provide a rich picture of potential transmission channels of policy interventions.

The team constitutes a genuinely collaborative partnership of five young, high-potential researchers, comprising statisticians, machine learning experts, macro-, ecological, and regional economists, as well as social and computer scientists. Together, the group has the methodological, empirical, and theoretical expertise required for this project.
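As a stylized illustration of fully probabilistic inference when the number of variables exceeds the number of observations, the sketch below fits a regression with a conjugate Gaussian shrinkage (ridge-type) prior and known noise variance. It is not the project's actual methodology; dimensions, prior scale, and coefficient values are assumptions chosen purely for illustration.

```python
# Minimal sketch: Bayesian shrinkage regression with more predictors than
# observations, returning posterior uncertainty rather than a point estimate.
import numpy as np

rng = np.random.default_rng(1)

n, p = 200, 500                               # fewer observations than predictors
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]   # only a few active coefficients
sigma = 1.0
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Gaussian shrinkage prior beta ~ N(0, tau^2 I); the posterior is Gaussian:
#   cov  = sigma^2 (X'X + (sigma^2 / tau^2) I)^{-1}
#   mean = cov X'y / sigma^2
tau = 0.5
K = X.T @ X + (sigma**2 / tau**2) * np.eye(p)
post_cov = sigma**2 * np.linalg.inv(K)
post_mean = post_cov @ X.T @ y / sigma**2

# Posterior uncertainty for each coefficient, not just a point estimate
post_sd = np.sqrt(np.diag(post_cov))
print("first 5 posterior means:", np.round(post_mean[:5], 2))
print("first 5 posterior sds:  ", np.round(post_sd[:5], 2))
```

Even in this toy setting, the prior regularizes an otherwise ill-posed problem and the posterior quantifies how confident the model is about each coefficient, which is the kind of uncertainty information that counterfactual policy analysis in high dimensions requires.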