Variational Bayesian Learning and its Applications
This dissertation studies a fast, analytic approximation method known as the variational Bayesian (VB) method. It aims to give insight into the general applicability and usefulness of VB and to explore its applications to various real-world problems. The work has three main foci: 1) general applicability and properties; 2) diagnostics for VB approximations; 3) variational applications. Variational inference has generally been developed within the exponential family, and this framework leaves room for further development. First, it usually considers only the conjugate exponential family. Second, variational inference is developed only with respect to natural parameters, which are often not the parameters of immediate interest. Moreover, full factorization, which assumes that all terms are independent of one another, is the most commonly used scheme in variational applications. We show that VB inference can be extended to more general situations. We propose a special parameterization for a parametric family, and also a factorization scheme with a more general dependency structure than is traditional in VB. Based on these new frameworks, we develop a variational formalism in which VB has a fast implementation and is not limited to the conjugate exponential setting. We also investigate its local convergence properties and the effects of choosing different priors and different factorization schemes.
The essence of the VB method lies in making simplifying assumptions about the posterior dependence of a problem, so by construction the full posterior dependence structure is distorted. In addition, across various applications we observe that posterior variances are often underestimated. We aim to develop diagnostic tests to assess VB approximations; these methods are intended to be quick and easy to use and to require no sophisticated tuning expertise.
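The variance underestimation noted above can be seen in the simplest possible case. The sketch below is a toy example, not taken from this dissertation: it applies a fully factorized, coordinate-ascent VB approximation to a bivariate Gaussian target N(mu, Sigma) with precision Lambda = Sigma^{-1}. Each factor q(x_i) is Gaussian with variance 1/Lambda_ii, which never exceeds the true marginal variance Sigma_ii. The function name `mean_field_gaussian` is hypothetical.

```python
import numpy as np


def mean_field_gaussian(mu, Sigma, n_sweeps=200):
    """Fully factorized (mean-field) VB for a Gaussian target N(mu, Sigma).

    Hypothetical helper for illustration: each factor q(x_i) is Gaussian and
    is updated by coordinate ascent; returns the VB means and VB variances.
    """
    mu = np.asarray(mu, dtype=float)
    Lam = np.linalg.inv(np.asarray(Sigma, dtype=float))  # precision matrix
    d = len(mu)
    m = np.zeros(d)  # initial VB means
    for _ in range(n_sweeps):
        for i in range(d):
            others = [j for j in range(d) if j != i]
            # Optimal mean of q(x_i) given the current means of the other factors
            m[i] = mu[i] - Lam[i, others] @ (m[others] - mu[others]) / Lam[i, i]
    vb_var = 1.0 / np.diag(Lam)  # VB variance of each factor is 1 / Lambda_ii
    return m, vb_var


m, vb_var = mean_field_gaussian([1.0, -1.0], [[1.0, 0.9], [0.9, 1.0]])
print(m, vb_var)  # VB variances are 1 - 0.9**2 = 0.19; true marginal variances are 1.0
```

With correlation 0.9 and unit marginal variances, the VB means converge to the true means, but each VB variance is 0.19, roughly a fifth of the true value. This is the kind of distortion a diagnostic for VB approximations has to detect.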
We propose three methods to compute the actual posterior covariance matrix using only the information obtained from VB approximations: 1) examining the joint posterior distribution and seeking an optimal affine transformation that links the VB and true posteriors; 2) using a marginal posterior density approximation in specific low-dimensional directions to estimate true posterior variances and correlations; 3) using a stepwise conditional approach to construct and solve a system of equations that yields estimates of the true posterior variances and correlations. A key computation in these methods is the calculation of a univariate marginal or conditional variance. We propose a novel way to compute these quantities, called the VB Adjusted Independent Metropolis-Hastings (VBAIMH) method. It uses an independent Metropolis-Hastings (IMH) algorithm with proposal distributions configured by VB approximations, and obtains the variance of the target distribution by monitoring the acceptance rate of the generated chain.
One major question associated with the VB method is how well its approximations work. We study mean structure approximations in particular, and show how VB approximations can be used to approach model selection tasks such as determining the dimensionality of a model or variable selection.
We also consider variational applications in Bayesian nonparametric modeling, especially for the Dirichlet process (DP). Posterior inference for the DP has been studied extensively in the context of MCMC methods. This work presents a full variational solution for the DP in non-conjugate settings, using a truncated stick-breaking representation. We propose an empirical method to determine the number of distinct components in a finite-dimensional DP. Since the posterior predictive distribution for the DP is often not available in closed form, we show how to use variational techniques to approximate it.
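The truncated stick-breaking representation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard construction v_k ~ Beta(1, alpha) with v_K = 1 at truncation level K, and Beta variational factors q(v_k) = Beta(g1_k, g2_k) as in the common mean-field treatment of DP mixtures. The dissertation's non-conjugate solution may parameterize these factors differently, and the function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma


def stick_breaking_weights(alpha, K, rng):
    """Draw K mixture weights from a DP prior truncated at level K (hypothetical helper)."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last stick takes all of the remaining mass
    # remaining[k] = prod_{j<k} (1 - v_j), the length of stick left before break k
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def expected_log_weights(g1, g2):
    """E_q[log pi_k], k = 1..K, assuming q(v_k) = Beta(g1[k], g2[k]) for k < K and v_K = 1."""
    g1, g2 = np.asarray(g1, dtype=float), np.asarray(g2, dtype=float)
    e_log_v = digamma(g1) - digamma(g1 + g2)      # E[log v_k]
    e_log_1mv = digamma(g2) - digamma(g1 + g2)    # E[log(1 - v_k)]
    e_log_v = np.append(e_log_v, 0.0)             # E[log v_K] = log 1 = 0
    cum = np.concatenate(([0.0], np.cumsum(e_log_1mv)))
    return e_log_v + cum


rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, K=20, rng=rng)
print(w.sum())  # equals 1 up to floating-point rounding
print(expected_log_weights([1.0, 1.0], [1.0, 1.0]))
```

Setting v_K = 1 guarantees that the K weights sum to one, which is what turns the truncated model into a finite mixture. The expected log weights are the quantities that typically enter the VB update for the component-assignment factors.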
As a concrete application study, we work through the VB method for regime-switching lognormal models and present solutions that quantify the uncertainty in both the parameters and the model specification. Through a series of numerical comparison studies against likelihood-based methods and MCMC methods on simulated and real data sets, we show that the VB method can recover the model structure exactly, gives reasonable point estimates, and is very computationally efficient.
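Simulated data for comparison studies of this kind can be generated as in the sketch below. It assumes the usual regime-switching lognormal (RSLN) form: a hidden Markov chain s_t with transition matrix P, and log returns distributed as N(mu_{s_t}, sigma_{s_t}^2) given the regime. The function name `simulate_rsln` and the parameter values are illustrative assumptions, not estimates from the dissertation.

```python
import numpy as np


def simulate_rsln(n, mu, sigma, P, seed=0):
    """Simulate n periods from a K-regime RSLN model (hypothetical helper).

    Returns the hidden regime path and the lognormal gross returns exp(y_t),
    where y_t | s_t ~ N(mu[s_t], sigma[s_t]**2).
    """
    rng = np.random.default_rng(seed)
    mu, sigma, P = (np.asarray(a, dtype=float) for a in (mu, sigma, P))
    K = len(mu)
    # Start the chain from the stationary distribution of P (left eigenvector for eigenvalue 1)
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()
    states = np.empty(n, dtype=int)
    states[0] = rng.choice(K, p=pi)
    for t in range(1, n):
        states[t] = rng.choice(K, p=P[states[t - 1]])
    log_returns = rng.normal(mu[states], sigma[states])
    return states, np.exp(log_returns)


# Illustrative two-regime setting: a calm regime 0 and a volatile regime 1
states, gross = simulate_rsln(600, mu=[0.01, -0.02], sigma=[0.03, 0.08],
                              P=[[0.96, 0.04], [0.2, 0.8]])
```

Because the true regime path is returned alongside the returns, a fitted model (VB, likelihood-based, or MCMC) can be checked against the known structure, which is the point of using simulated data before moving to real series.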