Topic Modeling Bibliography

may 2015 by tdhopper

"My personal bibliography for topic modeling, with some notes about how the paper is useful to me."

@bayes
may 2015 by tdhopper

Hierarchical Dirichlet Process (with Split-Merge Operations)

may 2015 by tdhopper

C++ implementation of Chong Wang's "Hierarchical Dirichlet Process (with Split-Merge)" algorithm.

@bayes
may 2015 by tdhopper

On Markov chain Monte Carlo methods for tall data

may 2015 by tdhopper

Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number n of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have been recently proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of these limitations, we propose an original subsampling-based approach which samples from a distribution provably close to the posterior distribution of interest, yet can require less than O(n) data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we have only been able so far to propose subsampling-based methods which display good performance in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.

@bayes
may 2015 by tdhopper

Collapsed Gibbs

may 2015 by tdhopper

Helpful, intuitive explanation of what collapsing a Gibbs sampler does.

@bayes
may 2015 by tdhopper

A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process

may 2015 by tdhopper

The hierarchical Dirichlet process (HDP) has become an important Bayesian nonparametric model for grouped data, such as document collections. The HDP is used to construct a flexible mixed-membership model where the number of components is determined by the data. As for most Bayesian nonparametric models, exact posterior inference is intractable—practitioners use Markov chain Monte Carlo (MCMC) or variational inference. Inspired by the split-merge MCMC algorithm for the Dirichlet process (DP) mixture model, we describe a novel split-merge MCMC sampling algorithm for posterior inference in the HDP. We study its properties on both synthetic data and text corpora. We find that split-merge MCMC for the HDP can provide significant improvements over traditional Gibbs sampling, and we give some understanding of the data properties that give rise to larger improvements.

@bayes
may 2015 by tdhopper

Markov Chain Sampling Methods for Dirichlet Process Mixture Models

may 2015 by tdhopper

This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods. One new approach is to make Metropolis-Hastings updates of the indicators specifying which mixture component is associated with each observation, perhaps supplemented with a partial form of Gibbs sampling. The other new approach extends Gibbs sampling for these indicators by using a set of auxiliary parameters. These methods are simple to implement and are more efficient than previous ways of handling general Dirichlet process mixture models with non-conjugate priors.

@bayes
may 2015 by tdhopper

Integrating Out Multinomial Parameters in Latent Dirichlet Allocation and Naive Bayes for Collapsed Gibbs Sampling

may 2015 by tdhopper

This note shows how to integrate out the multinomial parameters for latent Dirichlet allocation (LDA) and naive Bayes (NB) models. This allows us to perform Gibbs sampling without taking multinomial parameter samples. Although the conjugacy of the Dirichlet priors makes sampling the multinomial parameters relatively straightforward, sampling on a topic-by-topic basis provides two advantages. First, it means that all samples are drawn from simple discrete distributions with easily calculated parameters. Second, and more importantly, collapsing supports fully stochastic Gibbs sampling where the model is updated after each word (in LDA) or document (in NB) is assigned a topic. Typically, more stochastic sampling leads to quicker convergence to the stationary state of the Markov chain made up of the Gibbs samples.

Provides joint probability for LDA.
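The collapsed update the note derives can be sketched in a few lines. This is a minimal illustration, not the note's own code: the toy corpus, K, V, and hyperparameter values are made up, and the sampler resamples each topic assignment from p(z=k) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ).

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.1, iters=50, seed=0):
    """Minimal collapsed Gibbs sampler for LDA; docs is a list of word-id lists."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))      # doc-topic counts
    nkw = np.zeros((K, V))              # topic-word counts
    nk = np.zeros(K)                    # topic totals
    z = [[0] * len(d) for d in docs]
    for d, doc in enumerate(docs):      # random initialization
        for i, w in enumerate(doc):
            k = rng.integers(K)
            z[d][i] = k; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]             # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # collapsed conditional: multinomial parameters integrated out
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, ndk, nkw
```

Because the multinomials are integrated out, the count arrays are the entire state: the model is updated the moment each word is reassigned, which is the fully stochastic sampling the abstract refers to.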

@bayes

may 2015 by tdhopper

HDP-LDA updates

may 2015 by tdhopper

Derivation of necessary equations for applying HDP to LDA case.

@bayes
may 2015 by tdhopper

Finding scientific topics

april 2015 by tdhopper

Original paper describing Gibbs sampler for LDA.

@bayes
april 2015 by tdhopper

bnpy : Bayesian nonparametric machine learning for python.

april 2015 by tdhopper

This python module provides code for training popular clustering models on large datasets. We focus on Bayesian nonparametric models based on the Dirichlet process, but also provide parametric counterparts.

bnpy supports the latest online learning algorithms as well as standard offline methods. Our aim is to provide an inference platform that makes it easy for researchers and practitioners to compare models and algorithms.

@bayes

april 2015 by tdhopper

Tom Griffiths' Bayesian Reading List

april 2015 by tdhopper

This list is intended to introduce some of the tools of Bayesian statistics and machine learning that can be useful to computational research in cognitive science. The first section mentions several useful general references, and the others provide supplementary readings on specific topics.

@bayes
april 2015 by tdhopper

Estimating a Dirichlet distribution

april 2015 by tdhopper

The Dirichlet distribution and its compound variant, the Dirichlet-multinomial, are two of the most basic models for proportional data, such as the mix of vocabulary words in a text document. Yet the maximum-likelihood estimate of these distributions is not available in closed-form. This paper describes simple and efficient iterative schemes for obtaining parameter estimates in these models. In each case, a fixed-point iteration and a Newton-Raphson (or generalized Newton-Raphson) iteration is provided.

@bayes
april 2015 by tdhopper

Understanding the “Antoniak equation”

april 2015 by tdhopper

Explanation of the distribution over the number of clusters used after $n$ draws from a Dirichlet process.
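The distribution in question can be computed exactly from unsigned Stirling numbers of the first kind: P(K=k) = |s(n,k)| αᵏ Γ(α)/Γ(α+n). A small sketch (function name is mine):

```python
from math import lgamma, exp, log

def cluster_count_pmf(n, alpha):
    """P(K = k) for the number of clusters after n draws from DP(alpha)."""
    # Unsigned Stirling numbers of the first kind via the standard recurrence
    s = [[0] * (n + 1) for _ in range(n + 1)]
    s[0][0] = 1
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            s[m][k] = s[m - 1][k - 1] + (m - 1) * s[m - 1][k]
    norm = lgamma(alpha) - lgamma(alpha + n)   # Gamma(alpha) / Gamma(alpha + n)
    return [exp(log(s[n][k]) + k * log(alpha) + norm) if s[n][k] else 0.0
            for k in range(n + 1)]
```

A quick sanity check is that the mean of this pmf equals the familiar sum E[K] = Σᵢ α/(α+i−1).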

@bayes
april 2015 by tdhopper

Yee Whye Teh Code Implementations

april 2015 by tdhopper

Original Hierarchical Dirichlet Process Gibbs sampler in Matlab.

@bayes
april 2015 by tdhopper

“Infinite LDA” – Implementing the HDP with minimum code complexity

april 2015 by tdhopper

Shows how the hierarchical Dirichlet process (HDP) may be implemented in a simple way, following the idea that the HDP is an extension to its parametric counterpart, latent Dirichlet allocation (LDA).

@bayes
april 2015 by tdhopper

COMPSCI 3016: Computational Cognitive Science

march 2015 by tdhopper

If you do go on to read any of the literature on the CRP, you’ll very quickly find that people talk about the CRP as if it were an “infinite model”, and they seem to use the following terms as if they mean the same thing:

• Chinese restaurant process (CRP)
• Dirichlet process (DP)
• Pólya urn scheme
• Stick-breaking process

Beware! These are not the same thing, but they are very similar. As a consequence, a lot of academic papers are actually very unclear about which of these four things they mean, and often use the wrong terms. In this section, we’ll try to clear up what each of these terms means, and explain what the “infinite models” terminology is all about.
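The CRP itself is easy to simulate, which helps keep it distinct from the other three constructions: customer i+1 joins table t with probability proportional to its occupancy, or starts a new table with probability proportional to α. A minimal sketch (not course code; the interface is assumed):

```python
import random

def crp(n, alpha, seed=0):
    """Simulate table assignments for n customers in a CRP(alpha)."""
    rng = random.Random(seed)
    counts = []           # counts[t] = customers seated at table t
    assignments = []
    for i in range(n):    # customer i sees i customers already seated
        # table t with prob counts[t]/(i+alpha); new table with prob alpha/(i+alpha)
        r = rng.uniform(0, i + alpha)
        acc, table = 0.0, len(counts)
        for t, c in enumerate(counts):
            acc += c
            if r < acc:
                table = t
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        assignments.append(table)
    return assignments
```

Note that this generates a random partition directly, with no mention of the underlying random measure, which is exactly why the CRP and the DP are related but not identical objects.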

@bayes
march 2015 by tdhopper

Bayesian Nonparametrics 1

march 2015 by tdhopper

This is Yee Whye Teh's first talk on Bayesian Nonparametrics, given at the Machine Learning Summer School 2013, held at the Max Planck Institute for Intelligent Systems, in Tübingen, Germany, from 26 August to 6 September 2013.

@bayes
march 2015 by tdhopper

Advanced Statistical Computing at Vanderbilt University's Department of Biostatistics

march 2015 by tdhopper

Course covers numerical optimization, Markov Chain Monte Carlo (MCMC), Metropolis-Hastings, Gibbs sampling, expectation-maximization (EM) algorithms, data augmentation algorithms with applications for model fitting and techniques for dealing with missing data.

@bayes
march 2015 by tdhopper

Conjugate prior diagram

march 2015 by tdhopper

The following diagram summarizes conjugate prior relationships for a number of common sampling distributions. Arrows point from a sampling distribution to its conjugate prior distribution. The symbol near the arrow indicates which parameter of the sampling distribution is unknown.
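The simplest arrow in such a diagram is Bernoulli → Beta, where the unknown parameter is the success probability. A one-line sketch of that conjugate update (illustrative; function name is mine, not from the diagram's page):

```python
def beta_bernoulli_update(a, b, data):
    """Conjugate update: Beta(a, b) prior on the Bernoulli success probability;
    the posterior is Beta(a + #successes, b + #failures)."""
    s = sum(data)                      # number of 1s observed
    return a + s, b + len(data) - s
```

The same bookkeeping pattern (prior hyperparameters plus sufficient statistics) is what every arrow in the diagram encodes.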

@bayes
march 2015 by tdhopper

Markov Chain Sampling Methods for Dirichlet Process Mixture Models

march 2015 by tdhopper

Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model are reviewed, and two new classes of methods are presented. One new approach is to make Metropolis-Hastings updates of the indicators specifying which mixture component is associated with each observation, perhaps supplemented with a partial form of Gibbs sampling. The other new approach extends Gibbs sampling for these indicators by using a set of auxiliary parameters. These methods are simple to implement and are more efficient than previous ways of handling general Dirichlet process mixture models with non-conjugate priors.

@bayes
march 2015 by tdhopper

Combinatorial Stochastic Processes

march 2015 by tdhopper

The main theme of this course is the study of various combinatorial models of random partitions and random trees, and the asymptotics of these models related to continuous parameter stochastic processes. A basic feature of models for random partitions is that the sum of the parts is usually constant. So the sizes of the parts cannot be independent. But the structure of many natural models for random partitions can be reduced by suitable conditioning or scaling to classical probabilistic results involving sums of independent random variables. Limit models for combinatorially defined random partitions are consequently related to the two fundamental limit processes of classical probability theory: Brownian motion and Poisson processes. The theory of Brownian motion and related stochastic processes has been greatly enriched by the recognition that some fundamental properties of these processes are best understood in terms of how various random partitions and random trees are embedded in their paths.

Includes discussion of Chinese restaurant process and other sequential constructions of random partitions.

@bayes

march 2015 by tdhopper

Bayesian Modeling in the Social Sciences: an introduction to Markov-Chain Monte Carlo

march 2015 by tdhopper

Lengthy lecture notes from Jackman on MCMC.

@bayes
march 2015 by tdhopper

Learning & Inference in Probabilistic Graphical Models

march 2015 by tdhopper

Erik Sudderth's slides and other readings for a class at Brown.

@bayes
march 2015 by tdhopper

Applied Bayesian Nonparametrics

march 2015 by tdhopper

Erik Sudderth's slides and other readings for his nonparametrics class at Brown.

@bayes
march 2015 by tdhopper

Probabilistic Graphical Models

march 2015 by tdhopper

Erik Sudderth's slides and other readings for his PGM class at Brown.

@bayes
march 2015 by tdhopper

Graphical Models for Visual Object Recognition and Tracking

february 2015 by tdhopper

We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks.

@bayes
february 2015 by tdhopper

Nonparametric Bayes Tutorial

february 2015 by tdhopper

This page collects references and tutorials on Bayesian nonparametrics.

@bayes
february 2015 by tdhopper

Bayesian Mixtures of Bernoulli Distributions

february 2015 by tdhopper

Mixtures of Bernoulli distributions are typically trained using an expectation-maximization (EM) algorithm, i.e. by performing maximum likelihood estimation. In this report, we develop a Gibbs sampler for a fully Bayesian variant of the Bernoulli mixture, in which (conjugate) priors are introduced over both the mixing proportions and over the parameters of the Bernoulli distributions. We develop both a finite Bayesian Bernoulli mixture (using a Dirichlet prior over the latent class assignment variables) and an infinite Bernoulli mixture (using a Dirichlet Process prior). We perform experiments in which we compare the performance of the Bayesian Bernoulli mixtures with that of a standard Bernoulli mixture and a Restricted Boltzmann Machine on a task in which the (unobserved) bottom half of a handwritten digit needs to be predicted from the (observed) top half of that digit.

@bayes
february 2015 by tdhopper

Bayesian Modelling and Inference on Mixtures of Distributions

february 2015 by tdhopper

This chapter aims to introduce the reader to the construction, prior modelling, estimation and evaluation of mixture distributions in a Bayesian paradigm. We will show that mixture distributions provide a flexible, parametric framework for statistical modelling and analysis. Focus is on methods rather than advanced examples, in the hope that an understanding of the practical aspects of such modelling can be carried into many disciplines. It also stresses implementation via specific MCMC algorithms that can be easily reproduced by the reader. In Section 1.2, we detail some basic properties of mixtures, along with two different motivations. Section 1.3 points out the fundamental difficulty in doing inference with such objects, along with a discussion about prior modelling, which is more restrictive than usual, and the constructions of estimators, which also is more involved than the standard posterior mean solution. Section 1.4 describes the completion and non-completion MCMC algorithms that can be used for the approximation to the posterior distribution on mixture parameters, followed by an extension of this analysis in Section 1.5 to the case in which the number of components is unknown and may be estimated by Green’s (1995) reversible jump algorithm and Stephens’ (2000) birth-and-death procedure. Section 1.6 gives some pointers to related models and problems like mixtures of regressions (or conditional mixtures) and hidden Markov models (or dependent mixtures), as well as Dirichlet priors.

@bayes
february 2015 by tdhopper

A Compendium of Conjugate Priors

february 2015 by tdhopper

This report reviews conjugate priors and priors closed under sampling for a variety of data generating processes where the prior distributions are univariate, bivariate, and multivariate. The effects of transformations on conjugate prior relationships are considered and cases where conjugate prior relationships can be applied under transformations are identified. Univariate and bivariate prior relationships are verified using Monte Carlo methods.

@bayes

february 2015 by tdhopper

Conjugate Bayesian analysis of the Gaussian distribution

february 2015 by tdhopper

The Gaussian or normal distribution is one of the most widely used in statistics. Estimating its parameters using Bayesian inference and conjugate priors is also widely used. The use of conjugate priors allows all the results to be derived in closed form. Unfortunately, different books use different conventions on how to parameterize the various distributions (e.g., put the prior on the precision or the variance, use an inverse gamma or inverse chi-squared, etc), which can be very confusing for the student. In this report, we summarize all of the most commonly used forms. We provide detailed derivations for some of these results; the rest can be obtained by simple reparameterization. See the appendix for the definitions of the distributions that are used.
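As one example of the closed forms such a report summarizes: the posterior over the mean of N(μ, σ²) when σ² is known, under a N(μ₀, τ₀²) prior on μ. This sketch uses the mean/variance parameterization (one of the several conventions the report compares; putting the prior on the precision instead gives an equivalent form):

```python
def normal_known_var_posterior(mu0, tau0_sq, sigma_sq, xs):
    """Posterior N(mu_n, tau_n^2) over the mean, with known variance sigma_sq.
    Precisions add; the posterior mean is a precision-weighted average."""
    n = len(xs)
    xbar = sum(xs) / n
    post_var = 1.0 / (1.0 / tau0_sq + n / sigma_sq)
    post_mean = post_var * (mu0 / tau0_sq + n * xbar / sigma_sq)
    return post_mean, post_var
```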

@bayes
february 2015 by tdhopper

The Dirichlet-Multinomial and Dirichlet-Categorical models for Bayesian inference

february 2015 by tdhopper

This document collects in one place various results for both the Dirichlet-multinomial and Dirichlet-categorical likelihood model. Both models, while simple, are actually a source of confusion because the terminology has been very sloppily overloaded on the internet. We’ll clear the confusion in this writeup.
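One of the basic results such a writeup collects is the posterior predictive for the Dirichlet-categorical model: P(next = k) = (αₖ + nₖ) / Σⱼ(αⱼ + nⱼ). A small sketch (function name is mine):

```python
def dirichlet_categorical_predictive(alpha, counts):
    """Posterior predictive over the next category, given Dirichlet(alpha)
    prior pseudo-counts and observed counts n_k (multinomial parameters
    integrated out)."""
    total = sum(alpha) + sum(counts)
    return [(a + c) / total for a, c in zip(alpha, counts)]
```

The smoothing role of the prior is visible directly: with α = (1, …, 1) this is Laplace's rule of succession for categories.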

@bayes
february 2015 by tdhopper

Slides on Conjugate Models

february 2015 by tdhopper

Detailed derivation of a normal-normal conjugate model

@bayes
february 2015 by tdhopper

Slides of MCMC for Finite Mixture Models

february 2015 by tdhopper

Gibbs sampler for normal-normal mixture model

@bayes
february 2015 by tdhopper

A tutorial on Bayesian nonparametric models

february 2015 by tdhopper

A key problem in statistical modeling is model selection, that is, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial, we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application.

@bayes
february 2015 by tdhopper

COS597C: Bayesian Nonparametrics

february 2015 by tdhopper

Syllabus (with links to learning materials) on Bayesian Nonparametrics

@bayes
february 2015 by tdhopper