Dissertation (Ph.D.) - Indiana University, School of Informatics, Computing, and Engineering, 2019
Bayesian analysis exhibits two kinds of modularity. First, it is composed of conceptually separate steps: modeling and inference. Second, inference is itself composed of two separate steps: conditioning and querying. Probabilistic programming systems help automate Bayesian inference: they realize the first kind of modularity by providing a modeling language decoupled from a suite of general-purpose inference tools. Most probabilistic programming systems, however, fail to implement inference in a modular fashion. Their inference tools usually fuse the conditioning step with querying algorithms such as Markov chain Monte Carlo (MCMC) sampling. Such tools are difficult to specify formally, which makes their correctness hard to verify, and a small change to the modeling language can require a disproportionately large change across the inference suite.
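To make the two kinds of modularity concrete, here is a minimal sketch in Python. It is not the dissertation's system; the model, condition, and query functions and the normal-normal example are hypothetical illustrations. The point is only that the modeling, conditioning, and querying steps are separate, and that the generic Metropolis-Hastings querying step depends solely on the output of conditioning.

    import random
    import math

    def model():
        # Modeling step: joint distribution p(x, y) with x ~ Normal(0, 1)
        # and y | x ~ Normal(x, 1).  (Illustrative example only.)
        x = random.gauss(0.0, 1.0)
        y = random.gauss(x, 1.0)
        return x, y

    def condition(y_obs):
        # Conditioning step: produce the unnormalized posterior density
        # p(x, y_obs) as a function of x alone.  A conditioning tool such as
        # a disintegrator would derive this transformation from the model
        # automatically; here it is written out by hand.
        def density(x):
            prior = math.exp(-x * x / 2.0)
            likelihood = math.exp(-(y_obs - x) ** 2 / 2.0)
            return prior * likelihood
        return density

    def query(density, n_steps=10_000, step=0.5):
        # Querying step: a generic Metropolis-Hastings sampler that needs
        # only an unnormalized density, independent of how it was obtained.
        x, samples = 0.0, []
        for _ in range(n_steps):
            proposal = x + random.gauss(0.0, step)
            if random.random() < density(proposal) / density(x):
                x = proposal
            samples.append(x)
        return samples

    posterior = condition(y_obs=1.5)
    samples = query(posterior)
    print(sum(samples) / len(samples))  # posterior mean of x; analytically 0.75 here

Because query consumes only a density function, it can be reused unchanged with any model once conditioning has been performed; conversely, the conditioning step can feed querying tools other than MCMC.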
In this work we specify, implement, and verify a tool that transforms a large class of probabilistic programs to perform the conditioning step alone. We give a formal specification for this tool, based on the measure-theoretic notion of disintegration, and verify the correctness of the implementation with respect to that specification. The hard part lies in designing the tool to be both verifiable and reusable across different applications. This work alleviates some of that difficulty by addressing two use cases: scaling conditioning to large arrays of observations, and conditioning mixtures of continuous and discrete distributions. Furthermore, we demonstrate that querying tools based on MCMC samplers can themselves be defined using this conditioning tool, thereby increasing its reusability.
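For reference, the following is a minimal sketch of the standard measure-theoretic definition of disintegration on which such a specification rests; the notation (a joint measure \xi on A \times B, its marginal \mu, and a kernel \kappa) is generic and not necessarily the dissertation's. A disintegration of \xi along the first projection consists of the marginal \mu on A together with a kernel \kappa from A to measures on B such that, for every measurable f,

    \int_{A \times B} f(a, b)\, \xi(\mathrm{d}a, \mathrm{d}b)
      = \int_{A} \int_{B} f(a, b)\, \kappa(a, \mathrm{d}b)\, \mu(\mathrm{d}a).

Conditioning on an observation a = a_0 then amounts to returning the (unnormalized) posterior measure \kappa(a_0, \cdot) on B.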