cshalizi + chatterjee.sourav   5

[1511.01437] The sample size required in importance sampling
"The goal of importance sampling is to estimate the expected value of a given function with respect to a probability measure ν using a random sample of size n drawn from a different probability measure μ. If the two measures μ and ν are nearly singular with respect to each other, which is often the case in practice, the sample size required for accurate estimation is large. In this article it is shown that in a fairly general setting, a sample of size approximately exp(D(ν||μ)) is necessary and sufficient for accurate estimation by importance sampling, where D(ν||μ) is the Kullback--Leibler divergence of μ from ν. In particular, the required sample size exhibits a kind of cut-off in the logarithmic scale. The theory is applied to obtain a fairly general formula for the sample size required in importance sampling for exponential families (Gibbs measures). We also show that the standard variance-based diagnostic for convergence of importance sampling is fundamentally problematic. An alternative diagnostic that provably works in certain situations is suggested."
to:NB  to_read  statistics  monte_carlo  probability  information_theory  to_teach:statcomp  chatterjee.sourav  diaconis.persi  via:ded-maxim  re:fitness_sampling 
october 2016 by cshalizi
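To make the exp(KL) cutoff concrete, here is a minimal self-normalized importance-sampling sketch; the Gaussian target/proposal pair and all numerical choices are my assumptions for illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target nu = N(theta, 1), proposal mu = N(0, 1), for which
# D(nu || mu) = theta^2 / 2 in closed form.
theta = 2.0
kl = theta**2 / 2
n = int(1000 * np.exp(kl))        # comfortably above the exp(KL) cutoff

x = rng.standard_normal(n)        # draws from the proposal mu
log_w = theta * x - theta**2 / 2  # log of the density ratio dnu/dmu
w = np.exp(log_w)
est = np.sum(w * x) / np.sum(w)   # self-normalized estimate of E_nu[X] = theta
```

Rerunning with n well below exp(kl) — say a few dozen samples — makes the estimate erratic, which is the cutoff phenomenon the abstract describes.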
[1212.1247] Matrix estimation by Universal Singular Value Thresholding
"Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candes and collaborators. Typically, it is assumed that the underlying matrix has low rank. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has `a little bit of structure'. In particular, the matrix need not be of low rank. The procedure is very simple and fast, works under minimal assumptions, and is applicable for very large matrices. Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to give simple solutions to difficult questions in low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, problems related to graph limits, and generalized Bradley-Terry models for pairwise comparison."
in_NB  estimation  statistics  spectral_clustering  graph_limits  chatterjee.sourav  to_read 
january 2013 by cshalizi
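A minimal sketch of the USVT recipe as I read the abstract — zero-fill missing entries, threshold singular values at roughly (2 + eta)·sqrt(n·p̂), rescale by the observed fraction, and clip; the NaN-for-missing convention and the [-1, 1] range assumption are mine:

```python
import numpy as np

def usvt(y, eta=0.01):
    """Minimal sketch of Universal Singular Value Thresholding,
    assuming the true matrix has entries in [-1, 1] and that
    unobserved entries of y are marked NaN."""
    n = max(y.shape)
    mask = ~np.isnan(y)
    p_hat = mask.mean()                        # observed fraction of entries
    filled = np.where(mask, y, 0.0)            # zero-fill missing entries
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    keep = s > (2 + eta) * np.sqrt(n * p_hat)  # drop small singular values
    est = (u[:, keep] * s[keep]) @ vt[keep] / p_hat
    return np.clip(est, -1.0, 1.0)             # project entries back to [-1, 1]
```

Note there is no rank parameter to tune: the threshold alone decides how much spectrum survives, which is what makes the procedure "universal".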
[1102.2650] Estimating and Understanding Exponential Random Graph Models
Rigorous results on conditions under which mean-field approximations become exact for large ERGMs, so that the graphs end up looking like Erdős–Rényi graphs (at some effective density); these come from clever large-deviations arguments.  To over-simplify some pretty theorems, the cases are (1) all interactions between edges are positive, so introducing any positive density of edges at all tends to produce a blow-up towards a nearly complete graph (where edges are independent); or (2) all interactions are extremely weak, and consequently negligible in the large-scale limit.
exponential_family_random_graphs  network_data_analysis  large_deviations  mean-field_theory  statistics  stochastic_processes  re:almost_none  graph_limits  statistical_mechanics  have_read  re:smoothing_adjacency_matrices  re:your_favorite_ergm_sucks  in_NB  chatterjee.sourav  diaconis.persi 
february 2011 by cshalizi
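A rough sketch of the setup, taking the edge–triangle model as the running example (normalization conventions vary across papers, so treat the constants as assumptions):

```latex
% Edge--triangle ERGM, with t(H,G) the homomorphism density of H in G:
\[
  p_\beta(G) \;\propto\; \exp\!\Big(n^2\big[\beta_1\, t(K_2,G) + \beta_2\, t(K_3,G)\big]\Big).
\]
% In the mean-field regimes above, the model behaves like an
% Erd\H{o}s--R\'enyi graph $G(n, u^*)$, with $u^*$ maximizing
\[
  \beta_1 u + \beta_2 u^3
  - \tfrac{1}{2}\big[u\log u + (1-u)\log(1-u)\big],
  \qquad u \in [0,1].
\]
```

The entropy term counts the graphs with edge density near u; the degeneracy in case (1) corresponds to this variational problem jumping to a maximizer near u = 1.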
[1005.1136] Random graphs with a given degree sequence
Per Newman, Strogatz & Watts (2001; cond-mat/0007235), these graphs are very natural null models for networks, so this may be useful for getting analytic theories for tests against such null models.
random_graphs  network_data_analysis  via:arinaldo  re:smoothing_adjacency_matrices  graph_limits  have_read  diaconis.persi  chatterjee.sourav 
july 2010 by cshalizi
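For reference, the usual way such a null model is sampled in practice is the configuration model, which pairs up "half-edges" uniformly at random; a minimal stdlib sketch of that standard construction (not the specific model analyzed in the paper):

```python
import random

def configuration_model(degrees, seed=0):
    """Build a multigraph with the given degree sequence by pairing
    up 'half-edges' (stubs) uniformly at random."""
    assert sum(degrees) % 2 == 0, "degree sum must be even"
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive pairs of stubs become edges; self-loops and
    # multi-edges are possible, but rare for sparse graphs.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

edges = configuration_model([3, 3, 2, 2, 1, 1])
```

Conditioning on the result being simple gives the uniform model over simple graphs with that degree sequence, which is the object the paper studies.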
