Directly Learning Tractable Models for Sequential Inference and DecisionMaking
MetadataShow full item record
Probabilistic graphical models such as Bayesian networks and Markov networks provide a general framework to represent multivariate distributions while exploiting conditional independence. Over the years, many approaches have been proposed to learn the structure of those networks. However, even if the resulting network is small, inference may be intractable (e.g., exponential in the size of the network) and practitioners must often resort to approximate inference techniques. Recent work has focused on the development of alternative graphical models such as arithmetic circuits (ACs) and sum-product networks (SPNs) for which inference is guaranteed to be tractable (e.g., linear in the size of the network for SPNs and ACs). This means that the networks learned from data can be directly used for inference without any further approximation. So far, previous work has focused on learning models with only random variables and for a fixed number of variables based on fixed-length data. In this thesis, I present two new probabilistic graphical models: Dynamic Sum-Product Networks (DynamicSPNs) and Decision Sum-Product-Max Networks (DecisionSPMNs), where the former is suitable for problems with sequence data of varying length and the latter is for problems with random, decision, and utility variables. Similar to SPNs and ACs, DynamicSPNs and DecisionSPMNs can be learned directly from data with guaranteed tractable exact inference and decision making in the resulting models. I also present a new online Bayesian discriminative learning algorithm for Selective Sum-Product Networks (SSPNs), which are a special class of SPNs with no latent variables. This new learning algorithm achieves tractability by utilizing a novel idea of mode matching, where the algorithm chooses a tractable distribution that matches the mode of the exact posterior after processing each training instance. This approach lends itself naturally to distributed learning since the data can be divided into subsets based on which partial posteriors are computed by different machines and combined into a single posterior.