cshalizi + to_teach:statcomp + optimization   12

Optimization Models | Cambridge University Press
"Emphasizing practical understanding over the technicalities of specific algorithms, this elegant textbook is an accessible introduction to the field of optimization, focusing on powerful and reliable convex optimization techniques. Students and practitioners will learn how to recognize, simplify, model and solve optimization problems - and apply these principles to their own projects. A clear and self-contained introduction to linear algebra demonstrates core mathematical concepts in a way that is easy to follow, and helps students to understand their practical relevance. Requiring only a basic understanding of geometry, calculus, probability and statistics, and striking a careful balance between accessibility and rigor, it enables students to quickly understand the material, without being overwhelmed by complex mathematics. Accompanied by numerous end-of-chapter problems, an online solutions manual for instructors, and relevant examples from diverse fields including engineering, data science, economics, finance, and management, this is the perfect introduction to optimization for undergraduate and graduate students."
in_NB  optimization  convexity  books:noted  to_teach:statcomp  to_teach:freshman_seminar_on_optimization 
october 2014 by cshalizi
CRAN - Package alabama
"Augmented Lagrangian Adaptive Barrier Minimization Algorithm for optimizing smooth nonlinear objective functions with constraints. Linear or nonlinear equality and inequality constraints are allowed."

- Well, that solved my problem.
optimization  R  to_teach:statcomp 
december 2013 by cshalizi
[1310.2059] Distributed Coordinate Descent Method for Learning with Big Data
"In this paper we develop and analyze Hydra: HYbriD cooRdinAte descent method for solving loss minimization problems with big data. We initially partition the coordinates (features) and assign each partition to a different node of a cluster. At every iteration, each node picks a random subset of the coordinates from those it owns, independently from the other computers, and in parallel computes and applies updates to the selected coordinates based on a simple closed-form formula. We give bounds on the number of iterations sufficient to approximately solve the problem with high probability, and show how it depends on the data and on the partitioning. We perform numerical experiments with a LASSO instance described by a 3TB matrix."
to:NB  optimization  high-dimensional_statistics  computational_statistics  statistics  lasso  to_teach:statcomp 
october 2013 by cshalizi
Convex Optimization in R
"Convex optimization now plays an essential role in many facets of statistics. We briefly survey some recent developments and describe some implementations of some methods in R."

- Really, there's that little support for semi-definite programming?
to:NB  optimization  convexity  statistics  to_teach:statcomp  have_read  re:small-area_estimation_by_smoothing 
july 2013 by cshalizi
[1306.3574] Early stopping and non-parametric regression: An optimal data-dependent stopping rule
"The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of gradient-descent applied to the least-squares loss function. We propose a data-dependent stopping rule that does not involve hold-out or cross-validation data, and we prove upper bounds on the squared error of the resulting function estimate, measured in either the $L^2(P)$ and $L^2(P_n)$ norm. These upper bounds lead to minimax-optimal rates for various kernel classes, including Sobolev smoothness classes and other forms of reproducing kernel Hilbert spaces. We show through simulation that our stopping rule compares favorably to two other stopping rules, one based on hold-out data and the other based on Stein's unbiased risk estimate. We also establish a tight connection between our early stopping strategy and the solution path of a kernel ridge regression estimator."
in_NB  optimization  kernel_estimators  hilbert_space  nonparametrics  regression  minimax  yu.bin  wainwright.martin_j.  to_teach:statcomp  have_read 
june 2013 by cshalizi
[1306.2119] Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
"We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/n^{1/2}). We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments on standard machine learning benchmarks showing that they often outperform existing approaches."
in_NB  optimization  learning_theory  statistics  estimation  stochastic_approximation  to_teach:statcomp 
june 2013 by cshalizi
[1306.1840] Loss-Proportional Subsampling for Subsequent ERM
"We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees."

- To_teach is speculative. The trick is to pick some easy-to-compute hypothesis which can be applied to the whole data set, preferentially sample the points with high loss under this pilot model, and then do importance weighting. The excess risk they get for a sub-sample of size m from n observations is O(n^{-0.5}) + O(m^{-0.75}), as opposed to O(m^{-0.5}) for just naively drawing a sub-sample. I don't think they ever compare this to e.g. stochastic gradient descent.
in_NB  estimation  optimization  computational_statistics  statistics  learning_theory  to_teach:statcomp  have_read 
june 2013 by cshalizi
[1212.4174] Feature Clustering for Accelerating Parallel Coordinate Descent
"Large-scale L1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. High-performance algorithms and implementations are critical to efficiently solving these problems. Building upon previous work on coordinate descent algorithms for L1-regularized problems, we introduce a novel family of algorithms called block-greedy coordinate descent that includes, as special cases, several existing algorithms such as SCD, Greedy CD, Shotgun, and Thread-Greedy. We give a unified convergence analysis for the family of block-greedy algorithms. The analysis suggests that block-greedy coordinate descent can better exploit parallelism if features are clustered so that the maximum inner product between features in different blocks is small. Our theoretical convergence analysis is supported with experimental re- sults using data from diverse real-world applications. We hope that algorithmic approaches and convergence analysis we provide will not only advance the field, but will also encourage researchers to systematically explore the design space of algorithms for solving large-scale L1-regularization problems."
to:NB  optimization  lasso  regression  machine_learning  to_teach:statcomp 
december 2012 by cshalizi
On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization
"It is shown that the steepest descent and Newton’s method for unconstrained nonconvex optimization under standard assumptions may be both require a number of iterations and function evaluations arbitrarily close to O(epsilon^−2) to drive the norm of the gradient below epsilon. This shows that the upper bound of O(epsilon^−2) evaluations known for the steepest descent is tight, and that Newton’s method may be as slow as steepest descent in the worst case. The improved evaluation complexity bound of O(epsilon^−3/2) evaluations known for cubically-regularised Newton methods is also shown to be tight."

- Not really "to teach", but background...
in_NB  optimization  to_teach:statcomp 
october 2012 by cshalizi
[1206.4602] Quasi-Newton Methods: A New Direction
"Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors."

--- Final conference version: http://jmlr.csail.mit.edu/papers/v14/hennig13a.html
in_NB  optimization  machine_learning  learning_theory  to_teach:statcomp 
june 2012 by cshalizi
Introduction to Online Optimization (Bubeck)
"to_teach" tag a sudden brainstorm for how to make next year's statistical computing class either unbeatably awesome or an absolute disaster
in_NB  online_learning  regression  individual_sequence_prediction  optimization  machine_learning  learning_theory  via:mraginsky  to_read  to_teach:statcomp  re:freshman_seminar_on_optimization 
december 2011 by cshalizi

Copy this bookmark: