Topic Modeling Bibliography
"My peronal bibliography for topic modeling, with some notes about how the paper is useful to me."
@bayes 
may 2015 by tdhopper
Hierarchical Dirichlet Process (with Split-Merge Operations)
C++ implementation of Chong Wang's "Hierarchical Dirichlet Process (with Split-Merge)" algorithm.
@bayes 
may 2015 by tdhopper
On Markov chain Monte Carlo methods for tall data
Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number n of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have been recently proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of these limitations, we propose an original subsampling-based approach which samples from a distribution provably close to the posterior distribution of interest, yet can require less than O(n) data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we have only been able so far to propose subsampling-based methods which display good performance in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.
@bayes 
may 2015 by tdhopper
Collapsed Gibbs
Helpful, intuitive explanation of what collapsing a Gibbs sampler does.
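A minimal sketch (my own toy example, not from the linked post) of what collapsing buys in the simplest conjugate case: for a Beta-Bernoulli model the parameter theta can be integrated out, so the conditional for any one observation depends only on the counts of the others.

```python
import numpy as np

def collapsed_predictive(x_minus_i, a=1.0, b=1.0):
    """p(x_i = 1 | x_{-i}) for a Beta(a, b)-Bernoulli model with the
    Bernoulli parameter theta integrated out (the "collapsed" conditional)."""
    n = len(x_minus_i)
    return (a + np.sum(x_minus_i)) / (a + b + n)

x = np.array([1, 0, 1, 1, 0, 1])
print(collapsed_predictive(x[1:]))  # conditional for x[0] given the rest
```

In a collapsed Gibbs sampler for a mixture or for LDA, the same trick applies to the latent indicators: their conditionals reduce to count-based expressions like this one.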
@bayes 
may 2015 by tdhopper
A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process
The hierarchical Dirichlet process (HDP) has become an important Bayesian nonparametric model for grouped data, such as document collections. The HDP is used to construct a flexible mixed-membership model where the number of components is determined by the data. As for most Bayesian nonparametric models, exact posterior inference is intractable—practitioners use Markov chain Monte Carlo (MCMC) or variational inference. Inspired by the split-merge MCMC algorithm for the Dirichlet process (DP) mixture model, we describe a novel split-merge MCMC sampling algorithm for posterior inference in the HDP. We study its properties on both synthetic data and text corpora. We find that split-merge MCMC for the HDP can provide significant improvements over traditional Gibbs sampling, and we give some understanding of the data properties that give rise to larger improvements.
@bayes 
may 2015 by tdhopper
Markov Chain Sampling Methods for Dirichlet Process Mixture Models
This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods. One new approach is to make Metropolis-Hastings updates of the indicators specifying which mixture component is associated with each observation, perhaps supplemented with a partial form of Gibbs sampling. The other new approach extends Gibbs sampling for these indicators by using a set of auxiliary parameters. These methods are simple to implement and are more efficient than previous ways of handling general Dirichlet process mixture models with non-conjugate priors.
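Not from the paper itself, just my sketch of the conditional prior over component indicators that these indicator updates build on: under a DP(alpha) mixture, an observation joins an existing component with probability proportional to its current size, or a new component with probability proportional to alpha.

```python
import numpy as np

def crp_indicator_prior(counts, alpha):
    """Conditional prior p(c_i = c | c_{-i}) under a DP(alpha) mixture:
    proportional to n_{-i,c} for occupied components, alpha for a new one.
    `counts` holds the component sizes with observation i removed."""
    counts = np.asarray(counts, dtype=float)
    weights = np.append(counts, alpha)  # last entry = brand-new component
    return weights / weights.sum()

print(crp_indicator_prior([3, 1, 5], alpha=0.5))
```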
@bayes 
may 2015 by tdhopper
Integrating Out Multinomial Parameters in Latent Dirichlet Allocation and Naive Bayes for Collapsed Gibbs Sampling
This note shows how to integrate out the multinomial parameters for latent Dirichlet allocation (LDA) and naive Bayes (NB) models. This allows us to perform Gibbs sampling without taking multinomial parameter samples. Although the conjugacy of the Dirichlet priors makes sampling the multinomial parameters relatively straightforward, sampling on a topic-by-topic basis provides two advantages. First, it means that all samples are drawn from simple discrete distributions with easily calculated parameters. Second, and more importantly, collapsing supports fully stochastic Gibbs sampling where the model is updated after each word (in LDA) or document (in NB) is assigned a topic. Typically, more stochastic sampling leads to quicker convergence to the stationary state of the Markov chain made up of the Gibbs samples.

Provides joint probability for LDA.
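A rough sketch of one sweep of collapsed Gibbs sampling for LDA with symmetric priors (my own code, assuming the standard count-ratio form of the collapsed conditional, not taken from the note):

```python
import numpy as np

def collapsed_gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, V, rng):
    """One sweep of collapsed Gibbs sampling for LDA (symmetric priors).
    docs[d] is a list of word ids; z[d][i] is the current topic of token i."""
    K = n_k.shape[0]
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k = z[d][i]
            # remove token i from the count arrays
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # collapsed conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())
            # add the token back under its new topic
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

# toy usage
rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [2, 3, 3, 0]]
V, K, alpha, beta = 4, 2, 0.1, 0.01
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        n_dk[d, z[d][i]] += 1; n_kw[z[d][i], w] += 1; n_k[z[d][i]] += 1
for _ in range(50):
    collapsed_gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, V, rng)
```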
@bayes 
may 2015 by tdhopper
HDP-LDA updates
Derivation of necessary equations for applying HDP to LDA case.
@bayes 
may 2015 by tdhopper
Finding scientific topics
Original paper describing Gibbs sampler for LDA.
@bayes 
april 2015 by tdhopper
bnpy : Bayesian nonparametric machine learning for python.
This python module provides code for training popular clustering models on large datasets. We focus on Bayesian nonparametric models based on the Dirichlet process, but also provide parametric counterparts.
bnpy supports the latest online learning algorithms as well as standard offline methods. Our aim is to provide an inference platform that makes it easy for researchers and practitioners to compare models and algorithms.
@bayes 
april 2015 by tdhopper
Tom Griffiths' Bayesian Reading List
This list is intended to introduce some of the tools of Bayesian statistics and machine learning that can be useful to computational research in cognitive science. The first section mentions several useful general references, and the others provide supplementary readings on specific topics.
@bayes 
april 2015 by tdhopper
Estimating a Dirichlet distribution
The Dirichlet distribution and its compound variant, the Dirichlet-multinomial, are two of the most basic models for proportional data, such as the mix of vocabulary words in a text document. Yet the maximum-likelihood estimate of these distributions is not available in closed-form. This paper describes simple and efficient iterative schemes for obtaining parameter estimates in these models. In each case, a fixed-point iteration and a Newton-Raphson (or generalized Newton-Raphson) iteration is provided.
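My transcription of the fixed-point update for the Dirichlet-multinomial case (hedged; check the paper for the exact derivation and for the Newton variants):

```python
import numpy as np
from scipy.special import digamma

def dirichlet_multinomial_fixed_point(counts, alpha, n_iter=200):
    """Fixed-point iteration for Dirichlet-multinomial (Polya) parameter
    estimation. counts: (D, K) per-document category counts;
    alpha: (K,) initial guess."""
    counts = np.asarray(counts, dtype=float)
    n_d = counts.sum(axis=1)
    alpha = np.asarray(alpha, dtype=float).copy()
    for _ in range(n_iter):
        a0 = alpha.sum()
        num = (digamma(counts + alpha) - digamma(alpha)).sum(axis=0)
        den = (digamma(n_d + a0) - digamma(a0)).sum()
        alpha = alpha * num / den
    return alpha

counts = np.array([[8, 1, 1], [6, 2, 2], [7, 2, 1]])
print(dirichlet_multinomial_fixed_point(counts, alpha=np.ones(3)))
```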
@bayes 
april 2015 by tdhopper
Understanding the “Antoniak equation”
Explanation of the distribution over the number of clusters used after $n$ draws from a Dirichlet process.
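A quick sanity check I keep around (my own sketch, not from the post): the expected number of occupied clusters after $n$ draws is $\sum_{i=0}^{n-1} \alpha/(\alpha+i)$, which a direct CRP simulation reproduces.

```python
import numpy as np

def expected_num_clusters(n, alpha):
    """E[K_n] for n sequential draws from a CRP with concentration alpha."""
    return np.sum(alpha / (alpha + np.arange(n)))

def simulate_crp_clusters(n, alpha, rng):
    """Number of occupied clusters after n sequential CRP draws."""
    counts = []
    for i in range(n):
        probs = np.append(counts, alpha) / (i + alpha)
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)       # new cluster
        else:
            counts[table] += 1     # join an existing cluster
    return len(counts)

rng = np.random.default_rng(0)
n, alpha = 500, 2.0
sims = [simulate_crp_clusters(n, alpha, rng) for _ in range(200)]
print(expected_num_clusters(n, alpha), np.mean(sims))
```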
@bayes 
april 2015 by tdhopper
Yee Whye Teh Code Implementations
Original Hierarchical Dirichlet Process Gibbs sampler in Matlab.
@bayes 
april 2015 by tdhopper
“Infinite LDA” – Implementing the HDP with minimum code complexity
Shows how the hierarchical Dirichlet process (HDP) may be implemented in a simple way, following the idea that the HDP is an extension to its parametric counterpart, latent Dirichlet allocation (LDA).
@bayes 
april 2015 by tdhopper
COMPSCI 3016: Computational Cognitive Science
If you do go on to read any of the literature on the CRP, you’ll very quickly find that people talk about the CRP as if it were an “infinite model”, and they seem to use the following terms as if they mean the same thing:
• Chinese restaurant process (CRP)
• Dirichlet process (DP)
• Pólya urn scheme
• Stick-breaking process
Beware! These are not the same thing, but they are very similar. As a consequence, a lot of academic papers are actually very unclear about which of these four things they mean, and often use the wrong terms. In this section, we’ll try to clear up what each of these terms means, and explain what the “infinite models” terminology is all about.
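To keep the distinctions straight for myself, a sketch of just one of the four constructions (truncated stick-breaking, which generates the DP weights directly; the CRP instead generates an exchangeable partition with the same marginals). My own code, standard GEM construction assumed.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking draw: beta_k ~ Beta(1, alpha),
    pi_k = beta_k * prod_{j<k} (1 - beta_j)."""
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)
print(pi[:5], pi.sum())  # sums to just under 1; the rest is the unbroken stick
```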
@bayes 
march 2015 by tdhopper
Bayesian Nonparametrics 1
This is Yee Whye Teh's first talk on Bayesian Nonparametrics, given at the Machine Learning Summer School 2013, held at the Max Planck Institute for Intelligent Systems, in Tübingen, Germany, from 26 August to 6 September 2013.
@bayes 
march 2015 by tdhopper
Advanced Statistical Computing at Vanderbilt University's Department of Biostatistics
Course covers numerical optimization, Markov Chain Monte Carlo (MCMC), Metropolis-Hastings, Gibbs sampling, expectation-maximization (EM) algorithms, data augmentation algorithms with applications for model fitting and techniques for dealing with missing data.
@bayes 
march 2015 by tdhopper
Conjugate prior diagram
The following diagram summarizes conjugate prior relationships for a number of common sampling distributions. Arrows point from a sampling distribution to its conjugate prior distribution. The symbol near the arrow indicates which parameter of the sampling distribution is treated as unknown and given the prior.
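The simplest arrow on the diagram, worked through as a reminder (a sketch under the standard assumptions): a binomial likelihood with a beta prior on the success probability stays beta after conditioning.

```python
from scipy import stats

a, b = 2.0, 2.0          # Beta(a, b) prior on the success probability
k, n = 7, 10             # observed successes and trials
posterior = stats.beta(a + k, b + (n - k))   # conjugacy: posterior is still beta
print(posterior.mean(), posterior.interval(0.95))
```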
@bayes 
march 2015 by tdhopper
Markov Chain Sampling Methods for Dirichlet Process Mixture Models
Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model are reviewed, and two new classes of methods are presented. One new approach is to make Metropolis-Hastings updates of the indicators specifying which mixture component is associated with each observation, perhaps supplemented with a partial form of Gibbs sampling. The other new approach extends Gibbs sampling for these indicators by using a set of auxiliary parameters. These methods are simple to implement and are more efficient than previous ways of handling general Dirichlet process mixture models with non-conjugate priors.
@bayes 
march 2015 by tdhopper
Combinatorial Stochastic Processes
The main theme of this course is the study of various combinatorial models of random partitions and random trees, and the asymptotics of these models related to continuous parameter stochastic processes. A basic feature of models for random partitions is that the sum of the parts is usually constant. So the sizes of the parts cannot be independent. But the structure of many natural models for random partitions can be reduced by suitable conditioning or scaling to classical probabilistic results involving sums of independent random variables. Limit models for combinatorially defined random partitions are consequently related to the two fundamental limit processes of classical probability theory: Brownian motion and Poisson processes. The theory of Brownian motion and related stochastic processes has been greatly enriched by the recognition that some fundamental properties of these processes are best understood in terms of how various random partitions and random trees are embedded in their paths.

Includes discussion of Chinese restaurant process and other sequential constructions of random partitions.
@bayes 
march 2015 by tdhopper
Learning & Inference in Probabilistic Graphical Models
Erik Sudderth's slides and other readings for a class at Brown.
@bayes 
march 2015 by tdhopper
Applied Bayesian Nonparametrics
Erik Sudderth's slides and other readings for his nonparametrics class at Brown.
@bayes 
march 2015 by tdhopper
Probabilistic Graphical Models
Erik Sudderth's slides and other readings for his PGM class at Brown.
@bayes 
march 2015 by tdhopper
Graphical Models for Visual Object Recognition and Tracking
We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks.
@bayes 
february 2015 by tdhopper
Nonparametric Bayes Tutorial
This page collects references and tutorials on Bayesian nonparametrics.
@bayes 
february 2015 by tdhopper
Bayesian Mixtures of Bernoulli Distributions
Mixtures of Bernoulli distributions are typically trained using an expectation-maximization (EM) algorithm, i.e. by performing maximum likelihood estimation. In this report, we develop a Gibbs sampler for a fully Bayesian variant of the Bernoulli mixture, in which (conjugate) priors are introduced over both the mixing proportions and over the parameters of the Bernoulli distributions. We develop both a finite Bayesian Bernoulli mixture (using a Dirichlet prior over the latent class assignment variables) and an infinite Bernoulli mixture (using a Dirichlet Process prior). We perform experiments in which we compare the performance of the Bayesian Bernoulli mixtures with that of a standard Bernoulli mixture and a Restricted Boltzmann Machine on a task in which the (unobserved) bottom half of a handwritten digit needs to be predicted from the (observed) top half of that digit.
@bayes 
february 2015 by tdhopper
Bayesian Modelling and Inference on Mixtures of Distributions
This chapter aims to introduce the reader to the construction, prior modelling, estimation and evaluation of mixture distributions in a Bayesian paradigm. We will show that mixture distributions provide a flexible, parametric framework for statistical modelling and analysis. Focus is on methods rather than advanced examples, in the hope that an understanding of the practical aspects of such modelling can be carried into many disciplines. It also stresses implementation via specific MCMC algorithms that can be easily reproduced by the reader. In Section 1.2, we detail some basic properties of mixtures, along with two different motivations. Section 1.3 points out the fundamental difficulty in doing inference with such objects, along with a discussion about prior modelling, which is more restrictive than usual, and the constructions of estimators, which also is more involved than the standard posterior mean solution. Section 1.4 describes the completion and non-completion MCMC algorithms that can be used for the approximation to the posterior distribution on mixture parameters, followed by an extension of this analysis in Section 1.5 to the case in which the number of components is unknown and may be estimated by Green’s (1995) reversible jump algorithm and Stephens’ 2000 birth-and-death procedure. Section 1.6 gives some pointers to related models and problems like mixtures of regressions (or conditional mixtures) and hidden Markov models (or dependent mixtures), as well as Dirichlet priors.
@bayes 
february 2015 by tdhopper
A Compendium of Conjugate Priors
This report reviews conjugate priors and priors closed under sampling for a variety of data generating processes where the prior distributions are univariate, bivariate, and multivariate. The effects of transformations on conjugate prior relationships are considered and cases where conjugate prior relationships can be applied under transformations are identified. Univariate and bivariate prior relationships are verified using Monte Carlo methods.
@bayes 
february 2015 by tdhopper
Conjugate Bayesian analysis of the Gaussian distribution
The Gaussian or normal distribution is one of the most widely used in statistics. Estimating its parameters using Bayesian inference and conjugate priors is also widely used. The use of conjugate priors allows all the results to be derived in closed form. Unfortunately, different books use different conventions on how to parameterize the various distributions (e.g., put the prior on the precision or the variance, use an inverse gamma or inverse chi-squared, etc), which can be very confusing for the student. In this report, we summarize all of the most commonly used forms. We provide detailed derivations for some of these results; the rest can be obtained by simple reparameterization. See the appendix for the definitions of the distributions that are used.
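A minimal sketch of the easiest case covered there (known variance, normal prior on the mean); the other parameterizations in the report follow the same pattern. My own code, not from the report.

```python
import numpy as np

def normal_known_variance_posterior(x, sigma2, mu0, tau0_sq):
    """Posterior over the mean of a Gaussian with known variance sigma2,
    given a N(mu0, tau0_sq) prior on the mean. Returns (mean, variance)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    post_prec = 1.0 / tau0_sq + n / sigma2
    post_mean = (mu0 / tau0_sq + x.sum() / sigma2) / post_prec
    return post_mean, 1.0 / post_prec

print(normal_known_variance_posterior([2.1, 1.9, 2.4], sigma2=1.0,
                                       mu0=0.0, tau0_sq=10.0))
```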
@bayes 
february 2015 by tdhopper
The Dirichlet-Multinomial and Dirichlet-Categorical models for Bayesian inference
This document collects in one place various results for both the Dirichlet-multinomial and Dirichlet-categorical likelihood model. Both models, while simple, are actually a source of confusion because the terminology has been very sloppily overloaded on the internet. We’ll clear the confusion in this writeup.
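The punchline in one snippet (my own summary, under the usual conjugacy assumptions): a Dirichlet prior plus observed category counts gives a Dirichlet posterior, and the posterior predictive is the normalized pseudo-count vector.

```python
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])   # Dir(alpha) prior over 3 categories
counts = np.array([5, 2, 0])        # observed category counts

posterior_alpha = alpha + counts    # posterior is Dir(alpha + counts)
predictive = posterior_alpha / posterior_alpha.sum()
print(posterior_alpha, predictive)  # next-draw probabilities
```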
@bayes 
february 2015 by tdhopper
Slides on Conjugate Models
Detailed derivation of a normal-normal conjugate model
@bayes 
february 2015 by tdhopper
Slides of MCMC for Finite Mixture Models
Gibbs sampler for normal-normal mixture model
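A compact version of the kind of sampler the slides walk through (my own toy sketch: K components, known common variance, normal priors on the means, Dirichlet prior on the weights; the slides may differ in parameterization).

```python
import numpy as np
from scipy import stats

def gibbs_gaussian_mixture(x, K=2, sigma2=1.0, mu0=0.0, tau0_sq=25.0,
                           gamma=1.0, n_iter=500, seed=0):
    """Gibbs sampler for a K-component Gaussian mixture with known common
    variance sigma2, N(mu0, tau0_sq) priors on the component means, and a
    symmetric Dirichlet(gamma) prior on the mixing weights."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = rng.integers(K, size=n)
    mus = rng.normal(mu0, np.sqrt(tau0_sq), size=K)
    for _ in range(n_iter):
        # 1. mixing weights | indicators
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(gamma + counts)
        # 2. component means | indicators, data (normal-normal update per component)
        for k in range(K):
            xk = x[z == k]
            prec = 1.0 / tau0_sq + len(xk) / sigma2
            mean = (mu0 / tau0_sq + xk.sum() / sigma2) / prec
            mus[k] = rng.normal(mean, np.sqrt(1.0 / prec))
        # 3. indicators | weights, means, data
        logp = np.log(pi) + stats.norm.logpdf(x[:, None], mus, np.sqrt(sigma2))
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=row) for row in p])
    return mus, pi, z

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])
mus, pi, z = gibbs_gaussian_mixture(x)
print(np.sort(mus), pi)
```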
@bayes 
february 2015 by tdhopper
A tutorial on Bayesian nonparametric models
A key problem in statistical modeling is model selection, that is, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial, we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application.
@bayes 
february 2015 by tdhopper
COS597C: Bayesian Nonparametrics
Syllabus (with links to learning materials) on Bayesian Nonparametrics
@bayes 
february 2015 by tdhopper
