**approximation** (651)

[1612.07545] A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

"Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms are proposed to solve this problem. Every proposed algorithm claims outperform other state-of-the-art hashing methods. However, the evaluation of these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is returning the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all the hashing methods will have their performance improved as the code length increases. However, many existing hashing papers only report the performance with the code length shorter than 128. In this paper, we carefully revisit the problem of search with a hash index, and analyze the pros and cons of two popular hash index search procedures. Then we proposed a very simple but effective two level index structures and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, the random-projection-based Locality Sensitive Hashing (LSH) is the best performed algorithm, which is in contradiction to the claims in all the other ten hashing papers. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the codes used in the paper are released on GitHub, which can be used as a testing platform for a fair comparison between various hashing algorithms."

to:NB
data_mining
approximation
nearest_neighbors
locality-sensitive_hashing
hashing
have_read
via:vaguery
random_projections
k-means
databases
11 days ago by cshalizi

[1612.07545] A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem. Every proposed algorithm claims to outperform other state-of-the-art hashing methods. However, the evaluation of these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is to return the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all hashing methods will have their performance improve as the code length increases. However, many existing hashing papers only report performance with code lengths shorter than 128. In this paper, we carefully revisit the problem of search with a hash index and analyze the pros and cons of two popular hash-index search procedures. We then propose a very simple but effective two-level index structure and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, random-projection-based Locality Sensitive Hashing (LSH) is the best-performing algorithm, in contradiction to the claims in all of the other ten hashing papers. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the code used in the paper is released on GitHub, which can be used as a testing platform for a fair comparison between various hashing algorithms.

hashing
algorithms
approximation
dimension-reduction
representation
data-analysis
feature-extraction
nudge-targets
consider:looking-to-see
to-write-about
11 days ago by Vaguery

[1712.08911] Largest and Smallest Area Triangles on Imprecise Points

Assume we are given a set of parallel line segments in the plane, and we wish to place a point on each line segment such that the resulting point set maximizes or minimizes the area of the largest or smallest triangle in the set. We analyze the complexity of the four resulting computational problems, and we show that three of them admit polynomial-time algorithms, while the fourth is NP-hard. Specifically, we show that maximizing the largest triangle can be done in O(n^2) time (or in O(n log n) time for unit segments); minimizing the largest triangle can be done in O(n^2 log n) time; maximizing the smallest triangle is NP-hard; but minimizing the smallest triangle can be done in O(n^2) time. We also discuss to what extent our results can be generalized to polygons with k > 3 sides.
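
As a looking-to-see nudge: along any one segment (the others held fixed), the largest-triangle value is a maximum of convex functions, so for the max-largest variant one expects an optimum at segment endpoints. A hypothetical brute-force checker over endpoint choices (exponential, tiny inputs only; this is not the paper's O(n^2) algorithm):

```python
from itertools import combinations, product

def tri_area(p, q, r):
    """Unsigned triangle area via the cross product."""
    return abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2.0

def largest_triangle(points):
    return max(tri_area(*t) for t in combinations(points, 3))

def max_largest_by_endpoints(segments):
    """Exhaust the 2^n ways to pick one endpoint per segment."""
    return max(largest_triangle(pts) for pts in product(*segments))

# three vertical unit segments at x = 0, 1, 2 (made-up example)
segs = [((x, 0.0), (x, 1.0)) for x in (0.0, 1.0, 2.0)]
print(max_largest_by_endpoints(segs))  # -> 1.0
```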

computational-geometry
rather-interesting
approximation
plane-geometry
optimization
to-write-about
nudge-targets
consider:looking-to-see
27 days ago by Vaguery

[1807.09885] A Polynomial Time Constant Approximation For Minimizing Total Weighted Flow-time

We consider the classic scheduling problem of minimizing the total weighted flow-time on a single machine (min-WPFT) when preemption is allowed. In this problem, we are given a set of n jobs, each job having a release time r_j, a processing time p_j, and a weight w_j. The flow-time of a job is the amount of time the job spends in the system before it completes; that is, F_j = C_j − r_j, where C_j is the completion time of job j. The objective is to minimize the total weighted flow-time of the jobs.

This NP-hard problem has been studied quite extensively for decades. In a recent breakthrough, Batra, Garg, and Kumar presented a *pseudo-polynomial* time algorithm that has an O(1) approximation ratio. The design of a truly polynomial time algorithm, however, remained an open problem. In this paper, we show a transformation from pseudo-polynomial time algorithms to polynomial time algorithms in the context of min-WPFT. Our result, combined with the result of Batra, Garg, and Kumar, settles the long-standing conjecture that there is a polynomial time algorithm with O(1)-approximation for min-WPFT.
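
To make the objective concrete: a toy simulator that computes the total weighted flow-time of a simple preemptive highest-density-first rule. This baseline is my own illustration of the problem's definitions, not the paper's O(1)-approximation algorithm.

```python
def simulate_hdf(jobs):
    """At each unit time step, run the released, unfinished job with the
    largest w_j / p_j.  jobs: list of (release r_j, processing p_j, weight w_j).
    Returns sum_j w_j * F_j with F_j = C_j - r_j."""
    remaining = [p for _, p, _ in jobs]
    completion = [None] * len(jobs)
    t = 0
    while any(c is None for c in completion):
        ready = [j for j, (r, _, _) in enumerate(jobs)
                 if r <= t and completion[j] is None]
        if ready:
            j = max(ready, key=lambda j: jobs[j][2] / jobs[j][1])
            remaining[j] -= 1
            if remaining[j] == 0:
                completion[j] = t + 1
        t += 1
    return sum(w * (completion[j] - r) for j, (r, p, w) in enumerate(jobs))

print(simulate_hdf([(0, 3, 1), (1, 1, 5), (2, 2, 2)]))  # -> 15
```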

approximation
mathematical-programming
optimization
computational-complexity
NP-hard
heuristics
rather-interesting
to-write-about

4 weeks ago by Vaguery

[1806.04484] A Fourier-Analytic Approach for the Discrepancy of Random Set Systems

One of the prominent open problems in combinatorics is the discrepancy of set systems where each element lies in at most t sets. The Beck-Fiala conjecture suggests that the right bound is O(√t), but for three decades the only known bound not depending on the size of the set system has been O(t). Arguably we currently lack techniques for breaking that barrier.

In this paper we introduce discrepancy bounds based on Fourier analysis. We demonstrate our method on random set systems. Suppose one has n elements and m sets containing each element independently with probability p. We prove that in the regime of n ≥ Θ(m^2 log(m)), the discrepancy is at most 1 with high probability. Previously, a result of Ezra and Lovett gave a bound of O(1) under the stricter assumption that n ≫ m^t.
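
The quantity at stake, made concrete: the discrepancy of a set system is the best achievable worst-case imbalance over ±1 colorings of the elements. A brute-force sketch on a tiny random instance of the kind the abstract describes (my own illustration):

```python
import itertools, random

def discrepancy(sets, n):
    """min over x in {-1,+1}^n of max_j |sum_{i in S_j} x_i|
    (exhaustive over 2^n colorings, so tiny n only)."""
    best = float("inf")
    for x in itertools.product((-1, 1), repeat=n):
        worst = max((abs(sum(x[i] for i in S)) for S in sets), default=0)
        best = min(best, worst)
    return best

# m sets over n elements, each element included independently with prob. p
random.seed(0)
n, m, p = 12, 4, 0.5
sets = [{i for i in range(n) if random.random() < p} for _ in range(m)]
print(discrepancy(sets, n))
```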

combinatorics
open-questions
probability-theory
approximation
limits
hypergraphs
to-understand
consider:looking-to-see

4 weeks ago by Vaguery

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

"The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters n is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as O(n−1). In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale O(n−1). These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions."

!!!
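
A crude numerical check in the spirit of the abstract, under toy assumptions of my own: train width-n networks f(x) = (1/n) Σ_i c_i tanh(a_i x + b_i) by full-batch gradient descent on a 1-D target and watch the final mean-squared error shrink as n grows. Learning rate and step count are ad hoc.

```python
import numpy as np

def train(n, steps=5000, lr=0.5, seed=0):
    """Gradient descent on a mean-field-scaled one-hidden-layer net;
    returns the final training MSE."""
    rng = np.random.default_rng(seed)
    X = np.linspace(-2, 2, 200)
    y = np.sin(2 * X)                      # target function (arbitrary)
    a, b, c = (rng.standard_normal(n) for _ in range(3))
    m = len(X)
    for _ in range(steps):
        Z = np.outer(X, a) + b             # (m, n) pre-activations
        H = np.tanh(Z)
        resid = H @ c / n - y              # f(X) - y under 1/n scaling
        g = 2 * resid / m                  # dLoss/dprediction
        dZ = np.outer(g, c / n) * (1 - H**2)
        a -= lr * (X @ dZ)
        b -= lr * dZ.sum(axis=0)
        c -= lr * (H.T @ g / n)
    return float(np.mean(resid ** 2))

for n in (8, 32, 128, 512):
    print(n, train(n))   # loosely expect the error to decay roughly like 1/n
```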

to:NB
to_read
neural_networks
approximation
learning_theory
via:rvenkat

5 weeks ago by cshalizi

[1807.10489] Randomized residual-based error estimators for parametrized equations

We propose a randomized a posteriori error estimator for reduced order approximations of parametrized (partial) differential equations. The error estimator has several important properties: the effectivity is close to unity, with prescribed lower and upper bounds at specified high probability; the estimator does not require the calculation of stability (coercivity, or inf-sup) constants; the online cost to evaluate the a posteriori error estimator is commensurate with the cost to find the reduced order approximation; and the probabilistic bounds extend to many queries with only a modest increase in cost. To build this estimator, we first estimate the norm of the error with a Monte-Carlo estimator using Gaussian random vectors whose covariance is chosen according to the desired error measure, e.g. user-defined norms or a quantity of interest. Then, we introduce a dual problem with random right-hand side, the solution of which allows us to rewrite the error estimator in terms of the residual of the original equation. In order to have a fast-to-evaluate estimator, model order reduction methods can be used to approximate the random dual solutions. Here, we propose a greedy algorithm that is guided by a scalar quantity of interest depending on the error estimator. Numerical experiments on a multi-parametric Helmholtz problem demonstrate that this strategy yields rather low-dimensional reduced dual spaces.
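
The core trick, as I read the abstract, in miniature: for a Gaussian probe z ~ N(0, I), E[(zᵀe)²] = ‖e‖², and since the error satisfies e = A⁻¹r for the computable residual r, each sample zᵀe = (A⁻ᵀz)ᵀr costs one dual solve with a random right-hand side. In the paper those dual solves are themselves approximated by model order reduction; here they are exact, and the matrix is a made-up stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = np.diag(np.linspace(1.0, 10.0, n)) + 0.1 * rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true
x_red = x_true + 0.01 * rng.standard_normal(n)  # stand-in "reduced" solution
r = b - A @ x_red                               # computable residual
e = x_true - x_red                              # unknown error (kept for checking)

K = 100
samples = []
for _ in range(K):
    z = rng.standard_normal(n)
    y = np.linalg.solve(A.T, z)    # dual problem with random right-hand side
    samples.append((y @ r) ** 2)   # (z^T e)^2 without ever forming e

print("estimated ||e||:", np.sqrt(np.mean(samples)))
print("true      ||e||:", np.linalg.norm(e))
```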

numerical-methods
approximation
rather-interesting
to-understand
modeling
algorithms
to-write-about
10 weeks ago by Vaguery

Cell Biology by the Numbers

"Vignettes that reveal how numbers serve as a sixth sense to understanding our cells"

biology
numbers
scale
approximation
12 weeks ago by arsyed

Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning

"Randomized neural networks are immortalized in this AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. What are you doing?'' asked Minsky. I am training a randomly wired neural net to play tic-tac-toe,'' Sussman replied. Why is the net wired randomly?'' asked Minsky. Sussman replied, I do not want it to have any preconceptions of how to play.'' Minsky then shut his eyes. Why do you close your eyes?'' Sussman asked his teacher. So that the room will be empty,'' replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities."

--- Have I never bookmarked this before?
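
The construction itself is easy to sketch: draw the nonlinearities' parameters at random, never train them, and learn only the linear output weights, here by ridge regression. A minimal version with cosine features; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy binary classification data
n, d = 2000, 5
X = rng.standard_normal((n, d))
y = np.sign(np.sin(X[:, 0]) + X[:, 1] * X[:, 2])

# the "kitchen sinks": random weights and phases, cosine nonlinearity
D = 500
W = rng.standard_normal((d, D))
b = rng.uniform(0, 2 * np.pi, D)
Phi = np.cos(X @ W + b)

# minimization is replaced by randomization everywhere except this
# linear solve for the output weights (ridge regression)
lam = 1e-3
alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)
print("train accuracy:", np.mean(np.sign(Phi @ alpha) == y))
```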

in_NB
approximation
kernel_methods
random_projections
statistics
prediction
classifiers
rahimi.ali
recht.benjamin
machine_learning
have_read

september 2018 by cshalizi

[1808.05587] Deep Convolutional Networks as shallow Gaussian Processes

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.
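
For the dense-network case this paper extends, the equivalent kernel has a closed-form layer recursion. A sketch of that recursion (Cho and Saul's arc-cosine kernel) for an infinitely wide fully connected ReLU network; this is not the paper's convolutional kernel, and the prior variances are illustrative.

```python
import numpy as np

def nngp_relu_kernel(X, depth=3, sw2=2.0, sb2=0.0):
    """GP kernel of an infinitely wide fully connected ReLU network.
    sw2, sb2: prior variances of the weights and biases."""
    K = sw2 * (X @ X.T) / X.shape[1] + sb2          # input-layer kernel
    for _ in range(depth):
        diag = np.sqrt(np.clip(np.diag(K), 1e-12, None))
        cos = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
        theta = np.arccos(cos)
        J = (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
        K = sw2 * np.outer(diag, diag) * J + sb2    # next layer's kernel
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))
print(np.round(nngp_relu_kernel(X), 3))
```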

via:cshalizi
neural-networks
approximation
rather-interesting
representation
deep-learning
algorithms
to-write-about
equivalences-of-motion-over-different-paths
august 2018 by Vaguery

Lattice Boltzmann methods - Wikipedia

"Lattice Boltzmann methods (LBM) ... is a class of computational fluid dynamics (CFD) methods for fluid simulation. " Successor to lattice gas automaton methods.

article
algorithm
approximation
fluid
mechanics
august 2018 by schahn

[1806.06850] Polynomial Regression As an Alternative to Neural Nets

"Despite the success of neural networks (NNs), there is still a concern among many over their "black box" nature. Why do they work? Here we present a simple analytic argument that NNs are in fact essentially polynomial regression models. This view will have various implications for NNs, e.g. providing an explanation for why convergence problems arise in NNs, and it gives rough guidance on avoiding overfitting. In addition, we use this phenomenon to predict and confirm a multicollinearity property of NNs not previously reported in the literature. Most importantly, given this loose correspondence, one may choose to routinely use polynomial models instead of NNs, thus avoiding some major problems of the latter, such as having to set many tuning parameters and dealing with convergence issues. We present a number of empirical results; in each case, the accuracy of the polynomial approach matches or exceeds that of NN approaches. A many-featured, open-source software package, polyreg, is available."

--- Matloff is the author of my favorite "R programming for n00bs" textbook...

--- ETA after reading: the argument that multi-layer neural networks "are essentially" polynomial regression is a bit weak. It would be true, exactly, if activation functions were exactly polynomial, which however they rarely are in practice. If non-polynomial activations happen to be implemented in computational practice by polynomials (e.g., Taylor approximations), well, either we get different hardware or we crank up the degree of approximation as much as we like. (Said a little differently, if you buy this line of argument, you should buy that _every_ smooth statistical model "is essentially" polynomial regression, which seems a bit much.) It is, also, an argument about the function-approximation properties of the model classes, and not the fitting processes, despite the explicit disclaimers.
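
The alternative the paper advocates is easy to try for oneself: polynomial features plus a linear fit. A generic scikit-learn baseline on made-up data (not the authors' polyreg package):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(1000)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
for degree in (1, 2, 3, 4):
    # all interactions and powers up to `degree`, then a ridge fit
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))
    model.fit(Xtr, ytr)
    print(degree, round(model.score(Xte, yte), 3))   # held-out R^2
```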

to:NB
your_favorite_deep_neural_network_sucks
regression
neural_networks
statistics
matloff.norman
approximation
computational_statistics
have_read

july 2018 by cshalizi
