**cshalizi + linear_regression**

[1509.09169] Lecture notes on ridge regression

24 days ago by cshalizi

"The linear regression model cannot be fitted to high-dimensional data, as the high-dimensionality brings about empirical non-identifiability. Penalized regression overcomes this non-identifiability by augmentation of the loss function by a penalty (i.e., a function of the regression coefficients). The ridge penalty is the sum of squared regression coefficients, giving rise to ridge regression. Here many aspects of ridge regression are reviewed, e.g., moments, mean squared error, its equivalence to constrained estimation, and its relation to Bayesian regression. Finally, its behaviour and use are illustrated in simulation and on omics data. Subsequently, ridge regression is generalized to allow for a more general penalty. The ridge penalization framework is then translated to logistic regression and its properties are shown to carry over. To contrast ridge penalized estimation, the final chapter introduces its lasso counterpart."

to:NB
regression
linear_regression
ridge_regression
statistics
24 days ago by cshalizi
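
--- A minimal sketch of the ridge estimator (my own illustration, not code from the notes): the penalized least-squares problem has the closed form beta-hat = (X'X + lambda*I)^{-1} X'y, shown here for a two-predictor design with an explicit 2x2 solve.

```python
def ridge_2d(X, y, lam):
    """Ridge estimate for an n x 2 design matrix X (given as a list of rows)."""
    # Accumulate the entries of X'X + lam*I and of X'y.
    a = sum(r[0] * r[0] for r in X) + lam      # (X'X + lam I)[0][0]
    b = sum(r[0] * r[1] for r in X)            # off-diagonal entry of X'X
    d = sum(r[1] * r[1] for r in X) + lam      # (X'X + lam I)[1][1]
    u = sum(r[0] * yi for r, yi in zip(X, y))  # (X'y)[0]
    v = sum(r[1] * yi for r, yi in zip(X, y))  # (X'y)[1]
    det = a * d - b * b
    return [(d * u - b * v) / det, (a * v - b * u) / det]

# As lam -> 0 this recovers ordinary least squares; as lam grows, the
# coefficients shrink toward zero (the ridge penalty at work).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]          # exactly y = x1 + 2*x2
print(ridge_2d(X, y, 0.0))        # the OLS solution, [1.0, 2.0]
print(ridge_2d(X, y, 10.0))       # visibly shrunk coefficients
```

Note the penalty also restores identifiability when X'X is singular (the high-dimensional case the abstract starts from), since X'X + lambda*I is invertible for any lambda > 0.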

[1907.01954] An Econometric View of Algorithmic Subsampling

4 weeks ago by cshalizi

"Datasets that are terabytes in size are increasingly common, but computer bottlenecks often frustrate a complete analysis of the data. While more data are better than less, diminishing returns suggest that we may not need terabytes of data to estimate a parameter or test a hypothesis. But which rows of data should we analyze, and might an arbitrary subset of rows preserve the features of the original data? This paper reviews a line of work that is grounded in theoretical computer science and numerical linear algebra, and which finds that an algorithmically desirable {\em sketch} of the data must have a {\em subspace embedding} property. Building on this work, we study how prediction and inference are affected by data sketching within a linear regression setup. The sketching error is small compared to the sample size effect, which is within the control of the researcher. As a sketch size that is algorithmically optimal may not be suitable for prediction and inference, we use statistical arguments to provide `inference conscious' guides to the sketch size. When appropriately implemented, an estimator that pools over different sketches can be nearly as efficient as the infeasible one using the full sample."

to:NB
computational_statistics
statistics
linear_regression
random_projections
4 weeks ago by cshalizi

[1807.11408] Local Linear Forests

7 weeks ago by cshalizi

"Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure, local linear forests, enables us to improve on asymptotic rates of convergence for random forests with smooth signals, and provides substantial gains in accuracy on both real and simulated data. We prove a central limit theorem valid under regularity conditions on the forest and smoothness constraints, and propose a computationally efficient construction for confidence intervals. Moving to a causal inference application, we discuss the merits of local regression adjustments for heterogeneous treatment effect estimation, and give an example on a dataset exploring the effect word choice has on attitudes to the social safety net. Last, we include simulation results on real and generated data."

to:NB
linear_regression
ensemble_methods
decision_trees
athey.susan
statistics
7 weeks ago by cshalizi

Fast Generalized Linear Models by Database Sampling and One-Step Polishing: Journal of Computational and Graphical Statistics

7 weeks ago by cshalizi

"In this article, I show how to fit a generalized linear model to N observations on p variables stored in a relational database, using one sampling query and one aggregation query, as long as N^{1/2+δ} observations can be stored in memory, for some δ>0. The resulting estimator is fully efficient and asymptotically equivalent to the maximum likelihood estimator, and so its variance can be estimated from the Fisher information in the usual way. A proof-of-concept implementation uses R with MonetDB and with SQLite, and could easily be adapted to other popular databases. I illustrate the approach with examples of taxi-trip data in New York City and factors related to car color in New Zealand. "

to:NB
computational_statistics
linear_regression
regression
databases
lumley.thomas
to_teach:statcomp
7 weeks ago by cshalizi
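
--- The "one sampling query, one aggregation query" recipe can be sketched as follows (a toy one-parameter logistic regression of my own, not Lumley's implementation): compute the MLE on a modest subsample, then take a single Fisher-scoring step over the full data, which makes the estimator asymptotically equivalent to the full-sample MLE.

```python
import math
import random

def score_and_info(beta, xs, ys):
    """Score and Fisher information for one-parameter logistic regression."""
    score = info = 0.0
    for x, yv in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        score += x * (yv - p)
        info += x * x * p * (1.0 - p)
    return score, info

def newton_mle(xs, ys, beta=0.0, steps=25):
    """Newton's method for the logistic MLE (used only on the subsample)."""
    for _ in range(steps):
        s, i = score_and_info(beta, xs, ys)
        beta += s / i
    return beta

random.seed(0)
true_beta = 1.5
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-true_beta * x)) else 0
      for x in xs]

# "Sampling query": MLE on a small random subsample.
idx = random.sample(range(len(xs)), 500)
beta_sub = newton_mle([xs[i] for i in idx], [ys[i] for i in idx])

# "Aggregation query": one Fisher-scoring polish over the full data.
s, i = score_and_info(beta_sub, xs, ys)
beta_polished = beta_sub + s / i
```

Its variance can then be estimated from the full-data Fisher information `i` in the usual way, as the abstract notes.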

[1906.01990] A Model-free Approach to Linear Least Squares Regression with Exact Probabilities and Applications to Covariate Selection

10 weeks ago by cshalizi

"The classical model for linear regression is $Y = X\beta + \sigma\varepsilon$ with i.i.d. standard Gaussian errors. Much of the resulting statistical inference is based on Fisher's F-distribution. In this paper we give two approaches to least squares regression which are model-free. The results hold for all data $(y, X)$. The derived probabilities are not only exact, they agree with those using the F-distribution based on the classical model. This is achieved by replacing questions about the size of $\beta_j$, for example $\beta_j = 0$, by questions about the degree to which the covariate $x_j$ is better than Gaussian white noise or, alternatively, a random orthogonal rotation of $x_j$. The idea can be extended to the choice of covariates, post-selection inference (PoSI), step-wise choice of covariates, the determination of dependency graphs, and to robust regression and non-linear regression. In the latter two cases the probabilities are no longer exact but are based on the chi-squared distribution. The step-wise choice of covariates is of particular interest: it is very simple, very fast, and very powerful; it controls the number of false positives and does not overfit even when the number of covariates far exceeds the sample size."

in_NB
linear_regression
regression
statistics
to_be_shot_after_a_fair_trial
variable_selection
10 weeks ago by cshalizi

Robust Regression on Stationary Time Series: A Self‐Normalized Resampling Approach - Akashi - 2018 - Journal of Time Series Analysis

may 2018 by cshalizi

"This article extends the self‐normalized subsampling method of Bai et al. (2016) to the M‐estimation of linear regression models, where the covariate and the noise are stationary time series which may have long‐range dependence or heavy tails. The method yields an asymptotic confidence region for the unknown coefficients of the linear regression. The determination of these regions does not involve unknown parameters such as the intensity of the dependence or the heaviness of the distributional tail of the time series. Additional simulations can be found in a supplement. The computer codes are available from the authors."

to:NB
time_series
statistics
linear_regression
heavy_tails
long-range_dependence
may 2018 by cshalizi

[1611.05401] Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free Inference

april 2018 by cshalizi

"Several new methods have been proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and part for inference. In this paper we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-free approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We show that an alternative, called the image bootstrap, has higher coverage accuracy at the cost of more computation. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. There is an inference-prediction tradeoff: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions."

to:NB
heard_the_talk
linear_regression
model_selection
bootstrap
kith_and_kin
wasserman.larry
rinaldo.alessandro
g'sell.max
lei.jing
high-dimensional_statistics
statistics
to_teach:linear_models
post-selection_inference
april 2018 by cshalizi
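
--- A toy version of the split-then-infer recipe (my own illustration; the paper's constructions are more general): select a covariate on one half of the data, then build a Normal-approximation confidence interval for its coefficient on the held-out half, so the selection step cannot invalidate the inference.

```python
import math
import random

random.seed(1)
n, p = 2000, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [row[2] * 2.0 + random.gauss(0.0, 1.0) for row in X]  # only covariate 2 matters

half = n // 2

def cov_xy(j, rows, ys):
    return sum(r[j] * yv for r, yv in zip(rows, ys)) / len(ys)

# Selection half: pick the covariate with the largest |cov(x_j, y)|.
j_hat = max(range(p), key=lambda j: abs(cov_xy(j, X[:half], y[:half])))

# Inference half: no-intercept simple regression of y on the selected x_j,
# with a sandwich-style standard error and a Normal-approximation interval.
xs = [r[j_hat] for r in X[half:]]
ys = y[half:]
sxx = sum(x * x for x in xs)
beta = sum(x * yv for x, yv in zip(xs, ys)) / sxx
resid = [yv - beta * x for x, yv in zip(xs, ys)]
se = math.sqrt(sum((x * e) ** 2 for x, e in zip(xs, resid))) / sxx
ci = (beta - 1.96 * se, beta + 1.96 * se)
```

Because `j_hat` was chosen on the first half, the second-half interval is an ordinary fixed-selection interval; this is the robustness the abstract trades some prediction accuracy for.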

Sufficient Dimension Reduction via Direct Estimation of the Gradients of Logarithmic Conditional Densities | Neural Computation

january 2018 by cshalizi

"Sufficient dimension reduction (SDR) is aimed at obtaining the low-rank projection matrix in the input space such that information about output data is maximally preserved. Among various approaches to SDR, a promising method is based on the eigendecomposition of the outer product of the gradient of the conditional density of output given input. In this letter, we propose a novel estimator of the gradient of the logarithmic conditional density that directly fits a linear-in-parameter model to the true gradient under the squared loss. Thanks to this simple least-squares formulation, its solution can be computed efficiently in a closed form. Then we develop a new SDR method based on the proposed gradient estimator. We theoretically prove that the proposed gradient estimator, as well as the SDR solution obtained from it, achieves the optimal parametric convergence rate. Finally, we experimentally demonstrate that our SDR method compares favorably with existing approaches in both accuracy and computational efficiency on a variety of artificial and benchmark data sets."

to:NB
dimension_reduction
sufficiency
density_estimation
linear_regression
statistics
january 2018 by cshalizi

The importance of the normality assumption in large public health data sets

july 2017 by cshalizi

"It is widely but incorrectly believed that the t-test and linear regression are valid only for Normally distributed outcomes. The t-test and linear regression compare the mean of an outcome variable for different subjects. While these are valid even in very small samples if the outcome variable is Normally distributed, their major usefulness comes from the fact that in large samples they are valid for any distribution. We demonstrate this validity by simulation in extremely non-Normal data. We discuss situations in which other methods such as the Wilcoxon rank sum test and ordinal logistic regression (proportional odds model) have been recommended, and conclude that the t-test and linear regression often provide a convenient and practical alternative. The major limitation on the t-test and linear regression for inference about associations is not a distributional one, but whether detecting and estimating a difference in the mean of the outcome answers the scientific question at hand."

to:NB
linear_regression
statistics
to_teach:linear_models
lumley.thomas
july 2017 by cshalizi
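
--- The claim is easy to check by simulation (my own quick version, not the paper's): even with strongly skewed (exponential) data, the usual t-interval for a mean has close to nominal coverage at moderate n.

```python
import math
import random

random.seed(2)
n, reps, hits = 100, 2000, 0
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]  # true mean = 1, heavily skewed
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    half_width = 1.984 * s / math.sqrt(n)            # t quantile for df = 99
    hits += (m - half_width <= 1.0 <= m + half_width)

coverage = hits / reps
print(coverage)  # close to (typically slightly below) the nominal 0.95
```

The residual undercoverage shrinks as n grows, which is the large-sample validity the abstract emphasizes.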

[1611.05923] "Influence Sketching": Finding Influential Samples In Large-Scale Regressions

december 2016 by cshalizi

"There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the "needle in the haystack" problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical statistical technique for identifying samples which unusually strongly impact the fit of a regression model (and its downstream predictions). In order to scale this technique up to very large and high-dimensional datasets, we introduce a new algorithm which we call "influence sketching." Influence sketching embeds random projections within the influence computation; in particular, the influence score is calculated using the randomly projected pseudo-dataset from the post-convergence General Linear Model (GLM). We validate that influence sketching can reliably and successfully discover influential samples by applying the technique to a malware detection dataset of over 2 million executable files, each represented with almost 100,000 features. For example, we find that randomly deleting approximately 10% of training samples reduces predictive accuracy only slightly from 99.47% to 99.45%, whereas deleting the same number of samples with high influence sketch scores reduces predictive accuracy all the way down to 90.24%. Moreover, we find that influential samples are especially likely to be mislabeled. In the case study, we manually inspect the most influential samples, and find that influence sketching pointed us to new, previously unidentified pieces of malware."

to:NB
regression
linear_regression
computational_statistics
random_projections
via:vaguery
december 2016 by cshalizi
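
--- For reference, the classical statistic that "influence sketching" scales up: Cook's distance D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), computed here for simple linear regression on a small example of my own. A gross outlier dominates the distances.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point in a simple linear regression y ~ a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * yv for x, yv in zip(xs, ys)) / sxx
    b0 = sum(ys) / n - b1 * xbar
    resid = [yv - (b0 + b1 * x) for x, yv in zip(xs, ys)]
    p = 2                                         # fitted parameters: intercept, slope
    s2 = sum(e * e for e in resid) / (n - p)
    out = []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - xbar) ** 2 / sxx       # leverage of this point
        out.append(e * e * h / (p * s2 * (1.0 - h) ** 2))
    return out

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]  # last point is way off the line
d = cooks_distances(xs, ys)
# d[-1] dwarfs the other distances, flagging the aberrant point.
```

The paper's contribution is computing (a randomly projected version of) exactly this kind of score when n and p are far too large for the classical formula.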

[1209.1508] Confidence sets in sparse regression

november 2016 by cshalizi

"The problem of constructing confidence sets in the high-dimensional linear model with n response variables and p parameters, possibly p≥n, is considered. Full honest adaptive inference is possible if the rate of sparse estimation does not exceed n^{−1/4}; otherwise sparse adaptive confidence sets exist only over strict subsets of the parameter spaces for which sparse estimators exist. Necessary and sufficient conditions for the existence of confidence sets that adapt to a fixed sparsity level of the parameter vector are given in terms of minimal ℓ_2-separation conditions on the parameter space. The design conditions cover common coherence assumptions used in models for sparsity, including (possibly correlated) sub-Gaussian designs."

to:NB
confidence_sets
regression
high-dimensional_statistics
linear_regression
statistics
sparsity
van_de_geer.sara
nickl.richard
november 2016 by cshalizi

Vector Generalized Linear and Additive Models - With an | Thomas W. Yee | Springer

october 2015 by cshalizi

"This book presents a greatly enlarged statistical framework compared to generalized linear models (GLMs) with which to approach regression modelling. Comprising about half a dozen major classes of statistical models, and fortified with necessary infrastructure to make the models more fully operable, the framework allows analyses based on many semi-traditional applied statistics models to be performed as a coherent whole.

"Since their advent in 1972, GLMs have unified important distributions under a single umbrella with enormous implications. However, GLMs are not flexible enough to cope with the demands of practical data analysis. And data-driven GLMs, in the form of generalized additive models (GAMs), are also largely confined to the exponential family. The methodology here and accompanying software (the extensive VGAM R package) are directed at these limitations and are described comprehensively for the first time in one volume. This book treats distributions and classical models as generalized regression models, and the result is a much broader application base for GLMs and GAMs.

"The book can be used in senior undergraduate or first-year postgraduate courses on GLMs or categorical data analysis and as a methodology resource for VGAM users. In the second part of the book, the R package VGAM allows readers to grasp immediately applications of the methodology. R code is integrated in the text, and datasets are used throughout. Potential applications include ecology, finance, biostatistics, and social sciences. The methodological contribution of this book stands alone and does not require use of the VGAM package."

--- Hopefully this means the VGAM package is less user-hostile than it was...

to:NB
books:noted
additive_models
linear_regression
regression
statistics
re:ADAfaEPoV
in_wishlist

october 2015 by cshalizi

Bhatia, R.: Positive Definite Matrices

september 2015 by cshalizi

"This book represents the first synthesis of the considerable body of new research into positive definite matrices. These matrices play the same role in noncommutative analysis as positive real numbers do in classical analysis. They have theoretical and computational uses across a broad spectrum of disciplines, including calculus, electrical engineering, statistics, physics, numerical analysis, quantum information theory, and geometry. Through detailed explanations and an authoritative and inspiring writing style, Rajendra Bhatia carefully develops general techniques that have wide applications in the study of such matrices.

"Bhatia introduces several key topics in functional analysis, operator theory, harmonic analysis, and differential geometry--all built around the central theme of positive definite matrices. He discusses positive and completely positive linear maps, and presents major theorems with simple and direct proofs. He examines matrix means and their applications, and shows how to use positive definite functions to derive operator inequalities that he and others proved in recent years. He guides the reader through the differential geometry of the manifold of positive definite matrices, and explains recent work on the geometric mean of several matrices."

to:NB
books:noted
mathematics
algebra
linear_regression
re:g_paper
statistics

september 2015 by cshalizi

[1507.01173] Model Diagnostics Based on Cumulative Residuals: The R-package gof

august 2015 by cshalizi

"The generalized linear model is widely used in all areas of applied statistics and while correct asymptotic inference can be achieved under misspecification of the distributional assumptions, a correctly specified mean structure is crucial to obtain interpretable results. Usually the linearity and functional form of predictors are checked by inspecting various scatter plots of the residuals, however, the subjective task of judging these can be challenging. In this paper we present an implementation of model diagnostics for the generalized linear model as well as structural equation models, based on aggregates of the residuals where the asymptotic behavior under the null is imitated by simulations. A procedure for checking the proportional hazard assumption in the Cox regression is also implemented."

to:NB
model_checking
regression
linear_regression
statistics
to_teach:linear_models
august 2015 by cshalizi

[1503.06426] High-dimensional inference in misspecified linear models

may 2015 by cshalizi

"We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our results contribute to robustness considerations with respect to model misspecification."

to:NB
statistics
linear_regression
high-dimensional_statistics
lasso
buhlmann.peter
van_de_geer.sara
may 2015 by cshalizi

On the Interpretation of Instrumental Variables in the Presence of Specification Errors

february 2015 by cshalizi

"The method of instrumental variables (IV) and the generalized method of moments (GMM), and their applications to the estimation of errors-in-variables and simultaneous equations models in econometrics, require data on a sufficient number of instrumental variables that are both exogenous and relevant. We argue that, in general, such instruments (weak or strong) cannot exist."

--- I think they are too quick to dismiss non-parametric IV; if what one wants is consistent estimates of the partial derivatives at a given point, you _can_ get that by (e.g.) splines or locally linear regression. Need to think through this in terms of Pearl's graphical definition of IVs.

in_NB
instrumental_variables
misspecification
regression
linear_regression
causal_inference
statistics
econometrics
via:jbdelong
have_read
to_teach:undergrad-ADA
re:ADAfaEPoV

february 2015 by cshalizi

[1404.1578] Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression

february 2015 by cshalizi

"We review and interpret the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under "model misspecification," that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the "sandwich estimator" of standard error. Whereas linear models theory in statistics assumes models to be true and predictors to be fixed, White's theory permits models to be approximate and predictors to be random. Careful reading of his work shows that the deepest consequences for statistical inference arise from a synergy --- a "conspiracy" --- of nonlinearity and randomness of the predictors which invalidates the ancillarity argument that justifies conditioning on the predictors when they are random. Unlike the standard error of linear models theory, the sandwich estimator provides asymptotically correct inference in the presence of both nonlinearity and heteroskedasticity. An asymptotic comparison of the two types of standard error shows that discrepancies between them can be of arbitrary magnitude. If there exist discrepancies, standard errors from linear models theory are usually too liberal even though occasionally they can be too conservative as well. A valid alternative to the sandwich estimator is provided by the "pairs bootstrap"; in fact, the sandwich estimator can be shown to be a limiting case of the pairs bootstrap. We conclude by giving meaning to regression slopes when the linear model is an approximation rather than a truth. --- In this review we limit ourselves to linear least squares regression, but many qualitative insights hold for most forms of regression."

-- Very close to what I teach in my class, though I haven't really talked about sandwich variances.

in_NB
have_read
statistics
regression
linear_regression
bootstrap
misspecification
estimation
approximation

february 2015 by cshalizi
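
--- The sandwich-vs-classical contrast is easy to see in a small simulation (mine, not from the paper): under heteroskedasticity the classical linear-models standard error is too liberal, while the sandwich form stays honest.

```python
import math
import random

random.seed(3)
n = 5000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# Heteroskedastic noise: error variance grows with |x|, violating the
# classical constant-variance assumption but not the linear mean.
y = [2.0 * xi + random.gauss(0.0, 1.0 + abs(xi)) for xi in x]

# No-intercept least squares.
sxx = sum(xi * xi for xi in x)
beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
resid = [yi - beta * xi for xi, yi in zip(x, y)]

# Classical standard error assumes a common error variance...
s2 = sum(e * e for e in resid) / (n - 1)
se_classical = math.sqrt(s2 / sxx)
# ...while the sandwich form uses each point's own squared residual.
se_sandwich = math.sqrt(sum((xi * e) ** 2 for xi, e in zip(x, resid))) / sxx
```

Here the error variance is largest exactly where x is most informative, so `se_sandwich` comes out larger than `se_classical`; intervals built from the classical formula would be too narrow.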

[1406.5986] A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

january 2015 by cshalizi

"We consider statistical aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior work has typically adopted an \emph{algorithmic perspective}, in that it has made no statistical assumptions on the input X and Y, and instead it has assumed that the data (X,Y) are fixed and worst-case. In this paper, we adopt a \emph{statistical perspective}, and we consider the mean-squared error performance of randomized sketching algorithms, when data (X,Y) are generated according to a statistical linear model Y=Xβ+ϵ, where ϵ is a noise process. To do this, we first develop a framework for assessing, in a unified manner, algorithmic and statistical aspects of randomized sketching methods. We then consider the statistical prediction efficiency (SPE) and the statistical residual efficiency (SRE) of the sketched LS estimator; and we use our framework to provide results for several types of random projection and random sampling sketching algorithms. Among other results, we show that the SRE can be bounded when p≲r≪n but that the SPE typically requires the sample size r to be substantially larger. Our theoretical results reveal that, depending on the specifics of the situation, leverage-based sampling methods can perform as well as or better than projection methods. Our empirical results reveal that when r is only slightly greater than p and much less than n, projection-based methods out-perform sampling-based methods, but as r grows, sampling methods start to out-perform projection methods."

to:NB
computational_statistics
regression
linear_regression
random_projections
statistics
january 2015 by cshalizi
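
--- The algorithmic-vs-statistical distinction can be illustrated with the simplest possible sketch, uniform row sampling (a toy of mine; the paper also analyzes leverage-based and projection sketches): the sketched estimator pays a 1/sqrt(r) price rather than the full-sample 1/sqrt(n).

```python
import random

random.seed(4)
n, r = 50000, 1000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

def ols_slope(xs, ys):
    """No-intercept least-squares slope."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

beta_full = ols_slope(x, y)

# Sketch: keep r uniformly sampled rows and solve the small LS problem.
idx = random.sample(range(n), r)
beta_sketch = ols_slope([x[i] for i in idx], [y[i] for i in idx])
# beta_sketch is noisier than beta_full (order 1/sqrt(r) vs 1/sqrt(n)),
# but both are close to the true slope of 3.
```

Pooling estimates from several independent sketches averages away much of the extra 1/sqrt(r) noise, which is one way such estimators approach full-sample efficiency.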

[1412.3730] Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It

december 2014 by cshalizi

"We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic, and observe that the posterior puts its mass on ever more high-dimensional models as the sample size increases. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the Safe Bayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates as soon as the standard posterior is not `cumulatively concentrated', and its results on our data are quite encouraging."

in_NB
to_read
linear_regression
bayesianism
bayesian_consistency
misspecification
statistics
grunwald.peter
december 2014 by cshalizi

Political Language in Economics

december 2014 by cshalizi

"Does political ideology influence economic research? We rely upon purely inductive methods in natural language processing and machine learning to examine patterns of implicit political ideology in economic articles. Using observed political behavior of economists and the phrases from their academic articles, we construct a high-dimensional predictor of political ideology by article, economist, school, and journal. In addition to field, journal, and editor ideology, we look at the correlation of author ideology with magnitudes of reported policy relevant elasticities. Overall our results suggest that there is substantial sorting by ideology into fields, departments, and methodologies, and that political ideology influences the results of economic research."

in_NB
economics
ideology
political_economy
text_mining
naidu.suresh
jelveh.zubin
to:blog
topic_models
linear_regression
to_teach:data-mining
december 2014 by cshalizi

Nickl, van de Geer: Confidence sets in sparse regression

february 2014 by cshalizi

"The problem of constructing confidence sets in the high-dimensional linear model with n response variables and p parameters, possibly p≥n, is considered. Full honest adaptive inference is possible if the rate of sparse estimation does not exceed n^{−1/4}; otherwise sparse adaptive confidence sets exist only over strict subsets of the parameter spaces for which sparse estimators exist. Necessary and sufficient conditions for the existence of confidence sets that adapt to a fixed sparsity level of the parameter vector are given in terms of minimal ℓ_2-separation conditions on the parameter space. The design conditions cover common coherence assumptions used in models for sparsity, including (possibly correlated) sub-Gaussian designs."

--- Ungated version: http://arxiv.org/abs/1209.1508

in_NB
high-dimensional_statistics
sparsity
confidence_sets
regression
statistics
van_de_geer.sara
nickl.richard
lasso
model_selection
linear_regression

february 2014 by cshalizi

[1112.3450] The sparse Laplacian shrinkage estimator for high-dimensional regression

september 2013 by cshalizi

"We propose a new penalized method for variable selection and estimation that explicitly incorporates the correlation patterns among predictors. This method is based on a combination of the minimax concave penalty and Laplacian quadratic associated with a graph as the penalty function. We call it the sparse Laplacian shrinkage (SLS) method. The SLS uses the minimax concave penalty for encouraging sparsity and Laplacian quadratic penalty for promoting smoothness among coefficients associated with the correlated predictors. The SLS has a generalized grouping property with respect to the graph represented by the Laplacian quadratic. We show that the SLS possesses an oracle property in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability. This result holds in sparse, high-dimensional settings with p >> n under reasonable conditions. We derive a coordinate descent algorithm for computing the SLS estimates. Simulation studies are conducted to evaluate the performance of the SLS method and a real data example is used to illustrate its application."

to:NB
have_read
regression
linear_regression
high-dimensional_statistics
statistics
september 2013 by cshalizi

[1308.2408] Group Lasso for generalized linear models in high dimension

september 2013 by cshalizi

"Nowadays an increasing amount of data is available and we have to deal with models in high dimension (number of covariates much larger than the sample size). Under a sparsity assumption it is reasonable to hope that we can make a good estimation of the regression parameter. This sparsity assumption, as well as a block structure of the covariates into groups with similar modes of behavior, is for example quite natural in genomics. A huge amount of scientific literature exists for Gaussian linear models, including the Lasso estimator and also the Group Lasso estimator, which promotes group sparsity under a priori knowledge of the groups. We extend this Group Lasso procedure to generalized linear models and we study the properties of this estimator for sparse high-dimensional generalized linear models to find convergence rates. We provide oracle inequalities for the prediction and estimation error under assumptions on the joint distribution of the covariates and response and under a condition on the design matrix. We show the ability of this estimator to recover a good sparse approximation of the true model. Finally, we extend these results to the case of an Elastic Net penalty and apply them to Poisson regression, which, unlike logistic regression, has not been studied in this context."

--- Isn't this already done in Buhlmann and van de Geer's book?

to:NB
linear_regression
regression
sparsity
statistics
high-dimensional_statistics
--- Isn't this already done in Buhlmann and van de Geer's book?

september 2013 by cshalizi

Let's Put Garbage-Can Regressions and Garbage-Can Probits Where They Belong

september 2013 by cshalizi

"Many social scientists believe that dumping long lists of explanatory variables into linear regression, probit, logit, and other statistical equations will successfully “control” for the effects of auxiliary factors. Encouraged by convenient software and ever more powerful computing, researchers also believe that this conventional approach gives the true explanatory variables the best chance to emerge. The present paper argues that these beliefs are false, and that statistical models with more than a few independent variables are likely to be inaccurate. Instead, a quite different research methodology is needed, one that integrates contemporary powerful statistical methods with classic data-analytic techniques of creative engagement with the data."

to:NB
to_read
bad_data_analysis
social_science_methodology
statistics
linear_regression
regression
to_teach:undergrad-ADA
have_skimmed
september 2013 by cshalizi

[1307.7963] Efficient variational inference for generalized linear mixed models with large datasets

august 2013 by cshalizi

"The article develops a hybrid Variational Bayes algorithm that combines the mean-field and fixed-form Variational Bayes methods. The new estimation algorithm can be used to approximate any posterior without relying on conjugate priors. We propose a divide and recombine strategy for the analysis of large datasets, which partitions a large dataset into smaller pieces and then combines the variational distributions that have been learnt in parallel on each separate piece using the hybrid Variational Bayes algorithm. The proposed method is applied to fitting generalized linear mixed models. The computational efficiency of the parallel and hybrid Variational Bayes algorithm is demonstrated on several simulated and real datasets."

to:NB
computational_statistics
linear_regression
regression
statistics
estimation
variational_inference
august 2013 by cshalizi

Linear Models: A Useful “Microscope” for Causal Analysis : Journal of Causal Inference

june 2013 by cshalizi

"This note reviews basic techniques of linear path analysis and demonstrates, using simple examples, how causal phenomena of non-trivial character can be understood, exemplified and analyzed using diagrams and a few algebraic steps. The techniques allow for swift assessment of how various features of the model impact the phenomenon under investigation. This includes: Simpson’s paradox, case–control bias, selection bias, missing data, collider bias, reverse regression, bias amplification, near instruments, and measurement errors."

in_NB
pearl.judea
graphical_models
causal_inference
regression
linear_regression
to_teach:undergrad-ADA
have_read
june 2013 by cshalizi

Berk , Brown , Buja , Zhang , Zhao : Valid post-selection inference

may 2013 by cshalizi

"It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid “post-selection inference” by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing “simultaneity insurance” for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results."

--- I find this abstract very puzzling given some of the strong negative results on post-selection inference....

model_selection
to_read
linear_regression
regression
statistics
confidence_sets
in_NB
post-selection_inference
--- I find this abstract very puzzling given some of the strong negative results on post-selection inference....

may 2013 by cshalizi

Müller , Scealy , Welsh : Model Selection in Linear Mixed Models

may 2013 by cshalizi

"Linear mixed effects models are highly flexible in handling a broad range of data types and are therefore widely used in applications. A key part in the analysis of data is model selection, which often aims to choose a parsimonious model with other desirable properties from a possibly very large set of candidate statistical models. Over the last 5–10 years the literature on model selection in linear mixed models has grown extremely rapidly. The problem is much more complicated than in linear regression because selection on the covariance structure is not straightforward due to computational issues and boundary problems arising from positive semidefinite constraints on covariance matrices. To obtain a better understanding of the available methods, their properties and the relationships between them, we review a large body of literature on linear mixed model selection. We arrange, implement, discuss and compare model selection methods based on four major approaches: information criteria such as AIC or BIC, shrinkage methods based on penalized loss functions such as LASSO, the Fence procedure and Bayesian techniques."

to:NB
variance_estimation
hierarchical_statistical_models
model_selection
regression
linear_regression
may 2013 by cshalizi

Leeb : On the conditional distributions of low-dimensional projections from high-dimensional data

april 2013 by cshalizi

"We study the conditional distribution of low-dimensional projections from high-dimensional data, where the conditioning is on other low-dimensional projections. To fix ideas, consider a random d-vector Z that has a Lebesgue density and that is standardized so that 𝔼Z=0 and 𝔼ZZ′=Id. Moreover, consider two projections defined by unit-vectors α and β, namely a response y=α′Z and an explanatory variable x=β′Z. It has long been known that the conditional mean of y given x is approximately linear in x, under some regularity conditions; cf. Hall and Li [Ann. Statist. 21 (1993) 867–889]. However, a corresponding result for the conditional variance has not been available so far. We here show that the conditional variance of y given x is approximately constant in x (again, under some regularity conditions). These results hold uniformly in α and for most β’s, provided only that the dimension of Z is large. In that sense, we see that most linear submodels of a high-dimensional overall model are approximately correct. Our findings provide new insights in a variety of modeling scenarios. We discuss several examples, including sliced inverse regression, sliced average variance estimation, generalized linear models under potential link violation, and sparse linear modeling."

Free version: http://arxiv.org/abs/1304.5943

to:NB
regression
linear_regression
random_projections
high-dimensional_probability
re:what_is_the_right_null_model_for_linear_regression
leeb.hannes
Free version: http://arxiv.org/abs/1304.5943

april 2013 by cshalizi

[1303.7092] Pivotal uniform inference in high-dimensional regression with random design in wide classes of models via linear programming

march 2013 by cshalizi

"We propose a new method of estimation in high-dimensional linear regression model. It allows for very weak distributional assumptions including heteroscedasticity, and does not require the knowledge of the variance of random errors. The method is based on linear programming only, so that its numerical implementation is faster than for previously known techniques using conic programs, and it allows one to deal with higher dimensional models. We provide upper bounds for estimation and prediction errors of the proposed estimator showing that it achieves the same rate as in the more restrictive situation of fixed design and i.i.d. Gaussian errors with known variance. Following Gautier and Tsybakov (2011), we obtain the results under weaker sensitivity assumptions than the restricted eigenvalue or assimilated conditions."

to:NB
regression
linear_regression
optimization
high-dimensional_statistics
statistics
march 2013 by cshalizi

[1302.5831] On Testing Independence and Goodness-of-fit in Linear Models

march 2013 by cshalizi

"We consider a linear regression model and propose an omnibus test to simultaneously check the assumption of independence between the error and the predictor variables, and the goodness-of-fit of the parametric model. Our approach is based on testing for independence between the residual and the predictor using the recently developed Hilbert-Schmidt independence criterion, see Gretton et al. (2008). The proposed method requires no user-defined regularization, is simple to compute, based merely on pairwise distances between points in the sample, and is consistent against all alternatives. We develop the distribution theory of the proposed test-statistic, both under the null and the alternative hypotheses, and devise a bootstrap scheme to approximate its null distribution. We prove the consistency of the bootstrap procedure. The superior finite sample performance of our procedure is illustrated through a simulation study."

in_NB
statistics
regression
goodness-of-fit
linear_regression
to_teach:undergrad-ADA
march 2013 by cshalizi

Loh , Wainwright : High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity

september 2012 by cshalizi

"Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able to both analyze the statistical error associated with any global optimum, and more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings."

Ungated; http://arxiv.org/abs/1109.3714

in_NB
to_read
regression
linear_regression
sparsity
high-dimensional_statistics
optimization
wainwright.martin_j.
statistics
Ungated; http://arxiv.org/abs/1109.3714

september 2012 by cshalizi

The Duck of Minerva: Professionalization and the Poverty of IR Theory

september 2012 by cshalizi

"Having spent far too many years on my Department's admissions committee--which I currently chair--I have to agree with part of PM's response: it is simply now a fact of life that prior mathematical and statistical trainings improves one's chances of getting into most of the first- and second-tier IR programs in the United States. But that, as PM also notes, begs the "should it be this way?" question.

"My sense is that over-professionalization of graduate students is an enormous threat to the vibrancy and innovativeness of International Relations (IR). I am far from alone in this assessment. But I think the structural pressures for over-professionalization are awfully powerful; in conjunction with the triumph of behavioralism (or what PTJ reconstructs as neo-positivism), this means that "theory testing" via large-n regression analysis will only grow in dominance over time. ...

"I am not claiming that neopositivist work is "bad" or making substantive claims about the merits of statistical work. I do believe that general-linear-reality (GLR) approaches -- both qualitative and quantitative -- are overused at the expense of non-GLR frameworks--again, both qualitative and quantitative. I am also concerned with the general devaluation of singular-causal analysis...

"What I am claiming is this: that the conjunction of over-professionalization, GLR-style statistical work, and environmental factors is diminishing the overall quality of theorization, circumscribing the audience for good theoretical work, and otherwise working in the direction of impoverishing IR theory."

social_science_methodology
political_science
linear_regression
academia
"My sense is that over-professionalization of graduate students is an enormous threat to the vibrancy and innovativeness of International Relations (IR). I am far from alone in this assessment. But I think the structural pressures for over-professionalization are awfully powerful; in conjunction with the triumph of behavioralism (or what PTJ reconstructs as neo-positivism), this means that "theory testing" via large-n regression analysis will only grow in dominance over time. ...

"I am not claiming that neopositivist work is "bad" or making substantive claims about the merits of statistical work. I do believe that general-linear-reality (GLR) approaches -- both qualitative and quantitative -- are overused at the expense of non-GLR frameworks--again, both qualitative and quantitative. I am also concerned with the general devaluation of singular-causal analysis...

"What I am claiming is this: that the conjunction of over-professionalization, GLR-style statistical work, and environmental factors is diminishing the overall quality of theorization, circumscribing the audience for good theoretical work, and otherwise working in the direction of impoverishing IR theory."

september 2012 by cshalizi

Transcending General Linear Reality (Abbott, 1988)

august 2012 by cshalizi

"This paper argues that the dominance of linear models has led many sociologists to construe the social world in terms of a "general linear reality." This reality assumes (1) that the social world consists of fixed entities with variable attributes, (2) that cause cannot flow from "small" to "large" attributes/events, (3) that causal attributes have only one causal pattern at once, (4) that the sequence of events does not influence their outcome, (5) that the "careers" of entities are largely independent, and (6) that causal attributes are generally independent of each other. The paper discusses examples of these assumptions in empirical work, consider standard and new methods addressing them, and briefly explores alternative models for reality that employ demographic, sequential, and network perspectives."

in_NB
to_read
regression
social_science_methodology
linear_regression
tools_into_theories
to_teach:undergrad-ADA
abbott.andrew
causality
august 2012 by cshalizi

Mahoney: Randomized Algorithms for Matrices and Data

january 2012 by cshalizi

"Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, largely since matrices are popular structures with which to model data drawn from a wide range of application domains, and this work was performed by individuals from many different research communities. While the most obvious benefit of randomization is that it can lead to faster algorithms, either in worst-case asymptotic theory and/or numerical implementation, there are numerous other benefits that are at least as important. For example, the use of randomization can lead to simpler algorithms that are easier to analyze or reason about when applied in counterintuitive settings; it can lead to algorithms with more interpretable output, which is of interest in applications where analyst time rather than just computational time is of interest; it can lead implicitly to regularization and more robust output; and randomized algorithms can often be organized to exploit modern computational architectures better than classical numerical methods.

"This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis. Throughout this review, an emphasis will be placed on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these tools in large-scale data applications. Crucial in this context is the connection with the concept of statistical leverage. This concept has long been used in statistical regression diagnostics to identify outliers; and it has recently proved crucial in the development of improved worst-case matrix algorithms that are also amenable to high-quality numerical implementation and that are useful to domain scientists. This connection arises naturally when one explicitly decouples the effect of randomization in these matrix algorithms from the underlying linear algebraic structure. This decoupling also permits much finer control in the application of randomization, as well as the easier exploitation of domain knowledge.

"Most of the review will focus on random sampling algorithms and random projection algorithms for versions of the linear least-squares problem and the low-rank matrix approximation problem. These two problems are fundamental in theory and ubiquitous in practice. Randomized methods solve these problems by constructing and operating on a randomized sketch of the input matrix A — for random sampling methods, the sketch consists of a small number of carefully-sampled and rescaled columns/rows of A, while for random projection methods, the sketch consists of a small number of linear combinations of the columns/rows of A. Depending on the specifics of the situation, when compared with the best previously-existing deterministic algorithms, the resulting randomized algorithms have worst-case running time that is asymptotically faster; their numerical implementations are faster in terms of clock-time; or they can be implemented in parallel computing environments where existing numerical algorithms fail to run at all. Numerous examples illustrating these observations will be described in detail."

to:NB
data_analysis
linear_regression
computational_complexity
computational_statistics
random_projections
"This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis. Throughout this review, an emphasis will be placed on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these tools in large-scale data applications. Crucial in this context is the connection with the concept of statistical leverage. This concept has long been used in statistical regression diagnostics to identify outliers; and it has recently proved crucial in the development of improved worst-case matrix algorithms that are also amenable to high-quality numerical implementation and that are useful to domain scientists. This connection arises naturally when one explicitly decouples the effect of randomization in these matrix algorithms from the underlying linear algebraic structure. This decoupling also permits much finer control in the application of randomization, as well as the easier exploitation of domain knowledge.

"Most of the review will focus on random sampling algorithms and random projection algorithms for versions of the linear least-squares problem and the low-rank matrix approximation problem. These two problems are fundamental in theory and ubiquitous in practice. Randomized methods solve these problems by constructing and operating on a randomized sketch of the input matrix A — for random sampling methods, the sketch consists of a small number of carefully-sampled and rescaled columns/rows of A, while for random projection methods, the sketch consists of a small number of linear combinations of the columns/rows of A. Depending on the specifics of the situation, when compared with the best previously-existing deterministic algorithms, the resulting randomized algorithms have worst-case running time that is asymptotically faster; their numerical implementations are faster in terms of clock-time; or they can be implemented in parallel computing environments where existing numerical algorithms fail to run at all. Numerous examples illustrating these observations will be described in detail."

january 2012 by cshalizi

Audibert , Catoni : Robust linear least squares regression

december 2011 by cshalizi

"We consider the problem of robustly predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. For the ridge estimator and the ordinary least squares estimator, and their variants, we provide new risk bounds of order d/n without logarithmic factor unlike some standard results, where n is the size of the training data. We also provide a new estimator with better deviations in the presence of heavy-tailed noise. It is based on truncating differences of losses in a min–max framework and satisfies a d/n risk bound both in expectation and in deviations. The key common surprising factor of these results is the absence of exponential moment condition on the output distribution while achieving exponential deviations. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Experimental results strongly back up our truncated min–max estimator."

in_NB
regression
statistics
linear_regression
learning_theory
catoni.olivier
december 2011 by cshalizi

Lehmann: On the history and use of some standard statistical models

december 2009 by cshalizi

"his paper tries to tell the story of the general linear model, which saw the light of day 200 years ago, and the assumptions underlying it. We distinguish three principal stages (ignoring earlier more isolated instances). The model was first proposed in the context of astronomical and geodesic observations, where the main source of variation was observational error. This was the main use of the model during the 19th century.

In the 1920’s it was developed in a new direction by R.A. Fisher whose principal applications were in agriculture and biology. Finally, beginning in the 1930’s and 40’s it became an important tool for the social sciences. As new areas of applications were added, the assumptions underlying the model tended to become more questionable, and the resulting statistical techniques more prone to misuse."

regression
linear_regression
history_of_statistics
statistics
have_read
In the 1920’s it was developed in a new direction by R.A. Fisher whose principal applications were in agriculture and biology. Finally, beginning in the 1930’s and 40’s it became an important tool for the social sciences. As new areas of applications were added, the assumptions underlying the model tended to become more questionable, and the resulting statistical techniques more prone to misuse."

december 2009 by cshalizi

36-707: Regression Analysis, Fall 2007

february 2009 by cshalizi

Larry's class notes on regression. He finished with standard linear regression about 1/3 of the way through the semester, which seems to me to be about the amount of time the subject warrants, and then want to town...

kith_and_kin
regression
linear_regression
nonparametrics
causal_inference
graphical_models
support-vector_machines
statistics
wasserman.larry
february 2009 by cshalizi

[0901.3202] Model-Consistent Sparse Estimation through the Bootstrap

january 2009 by cshalizi

"if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection"

lasso
linear_regression
model_selection
variable_selection
bootstrap
january 2009 by cshalizi

"A Note on the Cobb-Douglas Function": The Review of Economic Studies, Vol. 30, No. 2, (1963 ), pp. 93-94

april 2008 by cshalizi

Shorter Simon & Levy (1963): I am sickened by the weakness of your model's goodness-of-fit test. (Does make me reconsider the many papers I still see using Cobb-Douglas...)

econometrics
simon.herbert
levy.ferdinand
cobb-douglas_production_functions
bad_data_analysis
linear_regression
to_teach
via:slaniel
to_teach:undergrad-ADA
have_read
economics
april 2008 by cshalizi

"Weak inference with linear models" - Psychological Bulletin - Vol 84 Iss 6 Page 1155

april 2008 by cshalizi

"Dude! R^2 sux!" (I paraphrase.)

methodological_advice
linear_regression
statistics
anderson.norm
shanteau.james
via:moritz-heene
experimental_psychology
decision-making
to_teach
to_teach:complexity-and-inference
to_teach:data-mining
to_teach:undergrad-ADA
april 2008 by cshalizi

EconPapers: Does Television Cause Autism?

march 2008 by cshalizi

This is a joke, right? Right? Somebody please tell me this is a joke...

ETA: It's not a joke. It's now a negative example in ADA.

Ungated version: http://forum.johnson.cornell.edu/faculty/waldman/autism-waldman-nicholson-adilov.pdf

please_give_me_strength
autism
econometrics
statistics
linear_regression
causal_inference
instrumental_variables
television
via:arthegall
to_teach:undergrad-ADA
ETA: It's not a joke. It's now a negative example in ADA.

Ungated version: http://forum.johnson.cornell.edu/faculty/waldman/autism-waldman-nicholson-adilov.pdf

march 2008 by cshalizi

[0802.3364] Evaluation and selection of models for out-of-sample prediction when the the sample size is small relative to the complexity of the data-generating process

february 2008 by cshalizi

"In regression with random design, we study the problem of selecting a model that performs well for out-of-sample prediction. We do not assume that any of the candidate models under consideration are correct. Our analysis is based on explicit finite-sample results. Our main findings differ from those of other analyses that are based on traditional large-sample limit approximations because we consider a situation where the sample size is small relative to the complexity of the data-generating process, in the sense that the number of parameters in a `good' model is of the same order as sample size. Also, we allow for the case where the number of candidate models is (much) larger than sample size."

linear_regression
learning_theory
model_selection
to_teach:undergrad-ADA
statistics
february 2008 by cshalizi

Fast and robust bootstrap

february 2008 by cshalizi

"recent developments on a bootstrap method for robust estimators which is computationally faster and more resistant to outliers than the classical bootstrap"

bootstrap
statistics
linear_regression
february 2008 by cshalizi

interaction models « orgtheory.net

october 2007 by cshalizi

Good - but why use a linear regression model in the first place? (Memo to self: write teaching note about testing for interactions with nonparametric regressions)

linear_regression
social_science_statistics
methodological_advice
to:blog
statistical_interaction
october 2007 by cshalizi

**related tags**

Copy this bookmark: