**Vaguery + approximation**
257

[1612.07545] A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

10 days ago by Vaguery

Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms are proposed to solve this problem. Every proposed algorithm claims outperform other state-of-the-art hashing methods. However, the evaluation of these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is returning the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all the hashing methods will have their performance improved as the code length increases. However, many existing hashing papers only report the performance with the code length shorter than 128. In this paper, we carefully revisit the problem of search with a hash index, and analyze the pros and cons of two popular hash index search procedures. Then we proposed a very simple but effective two level index structures and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, the random-projection-based Locality Sensitive Hashing (LSH) is the best performed algorithm, which is in contradiction to the claims in all the other ten hashing papers. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the codes used in the paper are released on GitHub, which can be used as a testing platform for a fair comparison between various hashing algorithms.

hashing
algorithms
approximation
dimension-reduction
representation
data-analysis
feature-extraction
nudge-targets
consider:looking-to-see
to-write-about
10 days ago by Vaguery

[1712.08911] Largest and Smallest Area Triangles on Imprecise Points

27 days ago by Vaguery

Assume we are given a set of parallel line segments in the plane, and we wish to place a point on each line segment such that the resulting point set maximizes or minimizes the area of the largest or smallest triangle in the set. We analyze the complexity of the four resulting computational problems, and we show that three of them admit polynomial-time algorithms, while the fourth is NP-hard. Specifically, we show that maximizing the largest triangle can be done in O(n2) time (or in O(nlogn) time for unit segments); minimizing the largest triangle can be done in O(n2logn) time; maximizing the smallest triangle is NP-hard; but minimizing the smallest triangle can be done in O(n2) time. We also discuss to what extent our results can be generalized to polygons with k>3 sides.

computational-geometry
rather-interesting
approximation
plane-geometry
optimization
to-write-about
nudge-targets
consider:looking-to-see
27 days ago by Vaguery

[1807.09885] A Polynomial Time Constant Approximation For Minimizing Total Weighted Flow-time

29 days ago by Vaguery

We consider the classic scheduling problem of minimizing the total weighted flow-time on a single machine (min-WPFT), when preemption is allowed. In this problem, we are given a set of n jobs, each job having a release time rj, a processing time pj, and a weight wj. The flow-time of a job is defined as the amount of time the job spends in the system before it completes; that is, Fj=Cj−rj, where Cj is the completion time of job. The objective is to minimize the total weighted flow-time of jobs.

This NP-hard problem has been studied quite extensively for decades. In a recent breakthrough, Batra, Garg, and Kumar presented a {\em pseudo-polynomial} time algorithm that has an O(1) approximation ratio. The design of a truly polynomial time algorithm, however, remained an open problem. In this paper, we show a transformation from pseudo-polynomial time algorithms to polynomial time algorithms in the context of min-WPFT. Our result combined with the result of Batra, Garg, and Kumar settles the long standing conjecture that there is a polynomial time algorithm with O(1)-approximation for min-WPFT.

approximation
mathematical-programming
optimization
computational-complexity
NP-hard
heuristics
rather-interesting
to-write-about
This NP-hard problem has been studied quite extensively for decades. In a recent breakthrough, Batra, Garg, and Kumar presented a {\em pseudo-polynomial} time algorithm that has an O(1) approximation ratio. The design of a truly polynomial time algorithm, however, remained an open problem. In this paper, we show a transformation from pseudo-polynomial time algorithms to polynomial time algorithms in the context of min-WPFT. Our result combined with the result of Batra, Garg, and Kumar settles the long standing conjecture that there is a polynomial time algorithm with O(1)-approximation for min-WPFT.

29 days ago by Vaguery

[1806.04484] A Fourier-Analytic Approach for the Discrepancy of Random Set Systems

4 weeks ago by Vaguery

One of the prominent open problems in combinatorics is the discrepancy of set systems where each element lies in at most t sets. The Beck-Fiala conjecture suggests that the right bound is O(t√), but for three decades the only known bound not depending on the size of the set system has been O(t). Arguably we currently lack techniques for breaking that barrier.

In this paper we introduce discrepancy bounds based on Fourier analysis. We demonstrate our method on random set systems. Suppose one has n elements and m sets containing each element independently with probability p. We prove that in the regime of n≥Θ(m2log(m)), the discrepancy is at most 1 with high probability. Previously, a result of Ezra and Lovett gave a bound of O(1) under the stricter assumption that n≫mt.

combinatorics
open-questions
probability-theory
approximation
limits
hypergraphs
to-understand
consider:looking-to-see
In this paper we introduce discrepancy bounds based on Fourier analysis. We demonstrate our method on random set systems. Suppose one has n elements and m sets containing each element independently with probability p. We prove that in the regime of n≥Θ(m2log(m)), the discrepancy is at most 1 with high probability. Previously, a result of Ezra and Lovett gave a bound of O(1) under the stricter assumption that n≫mt.

4 weeks ago by Vaguery

[1807.10489] Randomized residual-based error estimators for parametrized equations

10 weeks ago by Vaguery

We propose a randomized a posteriori error estimator for reduced order approximations of parametrized (partial) differential equations. The error estimator has several important properties: the effectivity is close to unity with prescribed lower and upper bounds at specified high probability; the estimator does not require the calculation of stability (coercivity, or inf-sup) constants; the online cost to evaluate the a posteriori error estimator is commensurate with the cost to find the reduced order approximation; the probabilistic bounds extend to many queries with only modest increase in cost. To build this estimator, we first estimate the norm of the error with a Monte-Carlo estimator using Gaussian random vectors whose covariance is chosen according to the desired error measure, e.g. user-defined norms or quantity of interest. Then, we introduce a dual problem with random right-hand side the solution of which allows us to rewrite the error estimator in terms of the residual of the original equation. In order to have a fast-to-evaluate estimator, model order reduction methods can be used to approximate the random dual solutions. Here, we propose a greedy algorithm that is guided by a scalar quantity of interest depending on the error estimator. Numerical experiments on a multi-parametric Helmholtz problem demonstrate that this strategy yields rather low-dimensional reduced dual spaces.

numerical-methods
approximation
rather-interesting
to-understand
modeling
algorithms
to-write-about
10 weeks ago by Vaguery

[1808.05587] Deep Convolutional Networks as shallow Gaussian Processes

august 2018 by Vaguery

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.

via:cshalizi
neural-networks
approximation
rather-interesting
representation
deep-learning
algorithms
to-write-about
equivalences-of-motion-over-different-paths
august 2018 by Vaguery

[1806.10909] ResNet with one-neuron hidden layers is a Universal Approximator

june 2018 by Vaguery

We demonstrate that a very deep ResNet with stacked modules with one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in d dimensions, i.e. ℓ1(ℝd). Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and d. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension d [Lu et al, 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture.

representation
neural-networks
approximation
rather-interesting
to-write-about
to-do
june 2018 by Vaguery

[1712.08373] Notes on complexity of packing coloring

march 2018 by Vaguery

A packing k-coloring for some integer k of a graph G=(V,E) is a mapping

φ:V→{1,…,k} such that any two vertices u,v of color φ(u)=φ(v) are in distance at least φ(u)+1. This concept is motivated by frequency assignment problems. The \emph{packing chromatic number} of G is the smallest k such that there exists a packing k-coloring of G.

Fiala and Golovach showed that determining the packing chromatic number for chordal graphs is \NP-complete for diameter exactly 5. While the problem is easy to solve for diameter 2, we show \NP-completeness for any diameter at least 3. Our reduction also shows that the packing chromatic number is hard to approximate within n1/2−ε for any ε>0.

In addition, we design an \FPT algorithm for interval graphs of bounded diameter. This leads us to exploring the problem of finding a partial coloring that maximizes the number of colored vertices.

graph-theory
algorithms
combinatorics
proof
approximation
nudge-targets
consider:looking-to-see
consider:feature-discovery
φ:V→{1,…,k} such that any two vertices u,v of color φ(u)=φ(v) are in distance at least φ(u)+1. This concept is motivated by frequency assignment problems. The \emph{packing chromatic number} of G is the smallest k such that there exists a packing k-coloring of G.

Fiala and Golovach showed that determining the packing chromatic number for chordal graphs is \NP-complete for diameter exactly 5. While the problem is easy to solve for diameter 2, we show \NP-completeness for any diameter at least 3. Our reduction also shows that the packing chromatic number is hard to approximate within n1/2−ε for any ε>0.

In addition, we design an \FPT algorithm for interval graphs of bounded diameter. This leads us to exploring the problem of finding a partial coloring that maximizes the number of colored vertices.

march 2018 by Vaguery

[1801.00548] A Machine Learning Approach to Adaptive Covariance Localization

march 2018 by Vaguery

Data assimilation plays a key role in large-scale atmospheric weather forecasting, where the state of the physical system is estimated from model outputs and observations, and is then used as initial condition to produce accurate future forecasts. The Ensemble Kalman Filter (EnKF) provides a practical implementation of the statistical solution of the data assimilation problem and has gained wide popularity as. This success can be attributed to its simple formulation and ease of implementation. EnKF is a Monte-Carlo algorithm that solves the data assimilation problem by sampling the probability distributions involved in Bayes theorem. Because of this, all flavors of EnKF are fundamentally prone to sampling errors when the ensemble size is small. In typical weather forecasting applications, the model state space has dimension 109−1012, while the ensemble size typically ranges between 30−100 members. Sampling errors manifest themselves as long-range spurious correlations and have been shown to cause filter divergence. To alleviate this effect covariance localization dampens spurious correlations between state variables located at a large distance in the physical space, via an empirical distance-dependent function. The quality of the resulting analysis and forecast is greatly influenced by the choice of the localization function parameters, e.g., the radius of influence. The localization radius is generally tuned empirically to yield desirable results.This work, proposes two adaptive algorithms for covariance localization in the EnKF framework, both based on a machine learning approach. The first algorithm adapts the localization radius in time, while the second algorithm tunes the localization radius in both time and space. Numerical experiments carried out with the Lorenz-96 model, and a quasi-geostrophic model, reveal the potential of the proposed machine learning approaches.

modeling
machine-learning
prediction
rather-interesting
looking-to-see
approximation
algorithms
to-write-about
march 2018 by Vaguery

[1709.08004] Slow-scale split-step tau-leap method for stiff stochastic chemical systems

february 2018 by Vaguery

Tau-leaping is a family of algorithms for the approximate simulation of discrete state continuous time Markov chains. The motivation for the development of such methods can be found, for instance, in the fields of chemical kinetics and systems biology. It is well known that the dynamical behavior of biochemical systems is often intrinsically stiff representing a serious challenge for their numerical approximation. The naive extension of stiff deterministic solvers to stochastic integration usually yields numerical solutions with either impractically large relaxation times or incorrectly resolved covariance. In this paper, we propose a novel splitting heuristic which allows to resolve these issues. The proposed numerical integrator takes advantage of the special structure of the linear systems with explicitly available equations for the mean and the covariance which we use to calibrate the parameters of the scheme. It is shown that the method is able to reproduce the exact mean and variance of the linear scalar test equation and has very good accuracy for the arbitrarily stiff systems at least in linear case. The numerical examples for both linear and nonlinear systems are also provided and the obtained results confirm the efficiency of the considered splitting approach.

numerical-methods
diffy-Qs
approximation
systems-biology
algorithms
to-write-about
consider:representation
rather-interesting
nudge
february 2018 by Vaguery

A few shades of interpolation

february 2018 by Vaguery

The topic of this snapshot is interpolation. In the

ordinary sense, interpolation means to insert something

of a different nature into something else. In

mathematics, interpolation means constructing new

data points from given data points. The new points

usually lie in between the already-known points. The

purpose of this snapshot is to introduce a particular

type of interpolation, namely, polynomial interpolation.

This will be explained starting from basic ideas

that go back to the ancient Babylonians and Greeks,

and will arrive at subjects of current research activity.

interpolation
rather-interesting
history-of-science
algorithms
approximation
to-write-about
ordinary sense, interpolation means to insert something

of a different nature into something else. In

mathematics, interpolation means constructing new

data points from given data points. The new points

usually lie in between the already-known points. The

purpose of this snapshot is to introduce a particular

type of interpolation, namely, polynomial interpolation.

This will be explained starting from basic ideas

that go back to the ancient Babylonians and Greeks,

and will arrive at subjects of current research activity.

february 2018 by Vaguery

[1709.10030] Sparse Hierarchical Regression with Polynomials

february 2018 by Vaguery

We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is that degree r polynomial which depends on at most k inputs, counting at most ℓ monomial terms, which minimizes the sum of the squares of its prediction errors. The previous hierarchical sparse specification aligns well with modern big data settings where many inputs are not relevant for prediction purposes and the functional complexity of the regressor needs to be controlled as to avoid overfitting. We present a two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Secondly, we take advantage of modern cutting plane methods for integer optimization to solve our resulting reduced hierarchical (k,ℓ)-sparse problem exactly. The ability of our method to identify all k relevant inputs and all ℓ monomial terms is shown empirically to experience a phase transition. Crucially, the same transition also presents itself in our ability to reject all irrelevant features and monomials as well. In the regime where our method is statistically powerful, its computational complexity is interestingly on par with Lasso based heuristics. The presented work fills a void in terms of a lack of powerful disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with n≈10,000 observations for input dimension p≈1,000.

statistics
regression
algorithms
approximation
performance-measure
to-understand
nudge-targets
consider:looking-to-see
february 2018 by Vaguery

[1712.08558] Lattice-based Locality Sensitive Hashing is Optimal

january 2018 by Vaguery

Locality sensitive hashing (LSH) was introduced by Indyk and Motwani (STOC `98) to give the first sublinear time algorithm for the c-approximate nearest neighbor (ANN) problem using only polynomial space. At a high level, an LSH family hashes "nearby" points to the same bucket and "far away" points to different buckets. The quality of measure of an LSH family is its LSH exponent, which helps determine both query time and space usage.

In a seminal work, Andoni and Indyk (FOCS `06) constructed an LSH family based on random ball partitioning of space that achieves an LSH exponent of 1/c^2 for the l_2 norm, which was later shown to be optimal by Motwani, Naor and Panigrahy (SIDMA `07) and O'Donnell, Wu and Zhou (TOCT `14). Although optimal in the LSH exponent, the ball partitioning approach is computationally expensive. So, in the same work, Andoni and Indyk proposed a simpler and more practical hashing scheme based on Euclidean lattices and provided computational results using the 24-dimensional Leech lattice. However, no theoretical analysis of the scheme was given, thus leaving open the question of finding the exponent of lattice based LSH.

In this work, we resolve this question by showing the existence of lattices achieving the optimal LSH exponent of 1/c^2 using techniques from the geometry of numbers. At a more conceptual level, our results show that optimal LSH space partitions can have periodic structure. Understanding the extent to which additional structure can be imposed on these partitions, e.g. to yield low space and query complexity, remains an important open problem.

algorithms
numerical-methods
rather-interesting
approximation
computational-complexity
nudge-targets
consider:looking-to-see
consider:representation
In a seminal work, Andoni and Indyk (FOCS `06) constructed an LSH family based on random ball partitioning of space that achieves an LSH exponent of 1/c^2 for the l_2 norm, which was later shown to be optimal by Motwani, Naor and Panigrahy (SIDMA `07) and O'Donnell, Wu and Zhou (TOCT `14). Although optimal in the LSH exponent, the ball partitioning approach is computationally expensive. So, in the same work, Andoni and Indyk proposed a simpler and more practical hashing scheme based on Euclidean lattices and provided computational results using the 24-dimensional Leech lattice. However, no theoretical analysis of the scheme was given, thus leaving open the question of finding the exponent of lattice based LSH.

In this work, we resolve this question by showing the existence of lattices achieving the optimal LSH exponent of 1/c^2 using techniques from the geometry of numbers. At a more conceptual level, our results show that optimal LSH space partitions can have periodic structure. Understanding the extent to which additional structure can be imposed on these partitions, e.g. to yield low space and query complexity, remains an important open problem.

january 2018 by Vaguery

[1605.04679] Typical Performance of Approximation Algorithms for NP-hard Problems

december 2017 by Vaguery

Typical performance of approximation algorithms is studied for randomized minimum vertex cover problems. A wide class of random graph ensembles characterized by an arbitrary degree distribution is discussed with some theoretical frameworks. Here three approximation algorithms are examined; the linear-programming relaxation, the loopy-belief propagation, and the leaf-removal algorithm. The former two algorithms are analyzed using the statistical-mechanical technique while the average-case analysis of the last one is studied by the generating function method. These algorithms have a threshold in the typical performance with increasing the average degree of the random graph, below which they find true optimal solutions with high probability. Our study reveals that there exist only three cases determined by the order of the typical-performance thresholds. We provide some conditions for classifying the graph ensembles and demonstrate explicitly examples for the difference in the threshold.

approximation
computational-complexity
rather-interesting
to-write-about
to-do
nudge-targets
consider:approximation
december 2017 by Vaguery

[1708.04215] A Constant-Factor Approximation Algorithm for the Asymmetric Traveling Salesman Problem

december 2017 by Vaguery

We give a constant-factor approximation algorithm for the asymmetric traveling salesman problem. Our approximation guarantee is analyzed with respect to the standard LP relaxation, and thus our result confirms the conjectured constant integrality gap of that relaxation.

Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics. Specifically, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. For those instances, we then solve Local-Connectivity ATSP, a problem known to be equivalent (in terms of constant-factor approximation) to the asymmetric traveling salesman problem.

algorithms
via:vielmetti
operations-research
approximation
optimization
computational-complexity
to-write-about
nudge-targets
consider:looking-to-see
Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics. Specifically, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. For those instances, we then solve Local-Connectivity ATSP, a problem known to be equivalent (in terms of constant-factor approximation) to the asymmetric traveling salesman problem.

december 2017 by Vaguery

[1711.02724] Algorithms to Approximate Column-Sparse Packing Problems

november 2017 by Vaguery

Column-sparse packing problems arise in several contexts in both deterministic and stochastic discrete optimization. We present two unifying ideas, (non-uniform) attenuation and multiple-chance algorithms, to obtain improved approximation algorithms for some well-known families of such problems. As three main examples, we attain the integrality gap, up to lower-order terms, for known LP relaxations for k-column sparse packing integer programs (Bansal et al., Theory of Computing, 2012) and stochastic k-set packing (Bansal et al., Algorithmica, 2012), and go "half the remaining distance" to optimal for a major integrality-gap conjecture of Furedi, Kahn and Seymour on hypergraph matching (Combinatorica, 1993).

integer-programming
matrices
optimization
operations-research
set-theory
hypergraphs
to-write-about
approximation
algorithms
november 2017 by Vaguery

[1710.00228] Physics-based Motion Planning: Evaluation Criteria and Benchmarking

october 2017 by Vaguery

Motion planning has evolved from coping with simply geometric problems to physics-based ones that incorporate the kinodynamic and the physical constraints imposed by the robot and the physical world. Therefore, the criteria for evaluating physics-based motion planners goes beyond the computational complexity (e.g. in terms of planning time) usually used as a measure for evaluating geometrical planners, in order to consider also the quality of the solution in terms of dynamical parameters. This study proposes an evaluation criteria and analyzes the performance of several kinodynamic planners, which are at the core of physics-based motion planning, using different scenarios with fixed and manipulatable objects. RRT, EST, KPIECE and SyCLoP are used for the benchmarking. The results show that KPIECE computes the time-optimal solution with heighest success rate, whereas, SyCLoP compute the most power-optimal solution among the planners used.

simulation
planning
algorithms
rather-interesting
feature-construction
approximation
nudge-targets
consider:representation
october 2017 by Vaguery

[1705.04587] Complexity and Inapproximability Results for Parallel Task Scheduling and Strip Packing

october 2017 by Vaguery

We study the Parallel Task Scheduling problem Pm|sizej|Cmax with a constant number of machines. This problem is known to be strongly NP-complete for each m≥5, while it is solvable in pseudo-polynomial time for each m≤3. We give a positive answer to the long-standing open question whether this problem is strongly NP-complete for m=4. As a second result, we improve the lower bound of 1211 for approximating pseudo-polynomial Strip Packing to 54. Since the best known approximation algorithm for this problem has a ratio of 43+ε, this result narrows the gap between approximation ratio and inapproximability result by a significant step. Both results are proven by a reduction from the strongly NP-complete problem 3-Partition.

operations-research
optimization
computational-complexity
benchmarking
approximation
rather-interesting
nudge-targets
consider:approximation
october 2017 by Vaguery

[1401.6200] Three Series for the Generalized Golden Mean

october 2017 by Vaguery

As is well-known, the ratio of adjacent Fibonacci numbers tends to phi = (1 + sqrt(5))/2, and the ratio of adjacent Tribonacci numbers (where each term is the sum of the three preceding numbers) tends to the real root eta of X^3 - X^2 - X - 1 = 0. Letting alpha(n) denote the corresponding ratio for the generalized Fibonacci numbers, where each term is the sum of the n preceding, we obtain rapidly converging series for alpha(n), 1/alpha(n), and 1/(2-alpha(n)).

number-theory
approximation
rather-interesting
series
convergence
nudge-targets
consider:evolving-some
consider:performance-measures
consider:convergence
consider:complexity
october 2017 by Vaguery

[1608.08225] Why does deep and cheap learning work so well?

october 2017 by Vaguery

We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through "cheap learning" with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various "no-flattening theorems" showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss, for example, we show that n variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.

deep-learning
machine-learning
approximation
looking-to-see
algorithms
representation
rather-interesting
learning-algorithms
information-theory
to-write-about
october 2017 by Vaguery

[1610.07277] Rapid calculation of side chain packing and free energy with applications to protein molecular dynamics

october 2017 by Vaguery

To address the large gap between time scales that can be easily reached by molecular simulations and those required to understand protein dynamics, we propose a rapid self-consistent approximation of the side chain free energy at every integration step. In analogy with the adiabatic Born-Oppenheimer approximation for electronic structure, the protein backbone dynamics are simulated as preceding according to the dictates of the free energy of an instantaneously-equilibrated side chain potential. The side chain free energy is computed on the fly, allowing the protein backbone dynamics to traverse a greatly smoothed energetic landscape. This results in extremely rapid equilibration and sampling of the Boltzmann distribution. Because our method employs a reduced model involving single-bead side chains, we also provide a novel, maximum-likelihood method to parameterize the side chain model using input data from high resolution protein crystal structures. We demonstrate state-of-the-art accuracy for predicting χ1 rotamer states while consuming only milliseconds of CPU time. We also show that the resulting free energies of side chains is sufficiently accurate for de novo folding of some proteins.

structural-biology
approximation
heuristics
algorithms
protein-folding
to-write-about
nudge-targets
october 2017 by Vaguery

[1701.05290] Range-efficient consistent sampling and locality-sensitive hashing for polygons

september 2017 by Vaguery

Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with probability significantly larger than objects with low similarity. We consider LSH for objects that can be represented as point sets in either one or two dimensions. To make the point sets finite size we consider the subset of points on a grid. Directly applying LSH (e.g. min-wise hashing) to these point sets would require time proportional to the number of points. We seek to achieve time that is much lower than direct approaches.

Technically, we introduce new primitives for range-efficient consistent sampling (of independent interest), and show how to turn such samples into LSH values. Another application of our technique is a data structure for quickly estimating the size of the intersection or union of a set of preprocessed polygons. Curiously, our consistent sampling method uses transformation to a geometric problem.

rather-interesting
approximation
representation
to-understand
nudge
consider:for-behavior-classification
to-write-about
computational-geometry
Technically, we introduce new primitives for range-efficient consistent sampling (of independent interest), and show how to turn such samples into LSH values. Another application of our technique is a data structure for quickly estimating the size of the intersection or union of a set of preprocessed polygons. Curiously, our consistent sampling method uses transformation to a geometric problem.

september 2017 by Vaguery

[1705.01991] Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

september 2017 by Vaguery

Attentional sequence-to-sequence models have become the new standard for machine translation, but one challenge of such models is a significant increase in training and decoding cost compared to phrase-based systems. Here, we focus on efficient decoding, with a goal of achieving accuracy close the state-of-the-art in neural machine translation (NMT), while achieving CPU decoding speed/throughput close to that of a phrasal decoder.

We approach this problem from two angles: First, we describe several techniques for speeding up an NMT beam search decoder, which obtain a 4.4x speedup over a very efficient baseline decoder without changing the decoder output. Second, we propose a simple but powerful network architecture which uses an RNN (GRU/LSTM) layer at bottom, followed by a series of stacked fully-connected layers applied at every timestep. This architecture achieves similar accuracy to a deep recurrent model, at a small fraction of the training and decoding cost. By combining these techniques, our best system achieves a very competitive accuracy of 38.3 BLEU on WMT English-French NewsTest2014, while decoding at 100 words/sec on single-threaded CPU. We believe this is the best published accuracy/speed trade-off of an NMT system.

machine-learning
representation
algorithms
deep-learning
neural-networks
approximation
nudge-targets
consider:performance-measures
consider:representation
natural-language-processing
We approach this problem from two angles: First, we describe several techniques for speeding up an NMT beam search decoder, which obtain a 4.4x speedup over a very efficient baseline decoder without changing the decoder output. Second, we propose a simple but powerful network architecture which uses an RNN (GRU/LSTM) layer at bottom, followed by a series of stacked fully-connected layers applied at every timestep. This architecture achieves similar accuracy to a deep recurrent model, at a small fraction of the training and decoding cost. By combining these techniques, our best system achieves a very competitive accuracy of 38.3 BLEU on WMT English-French NewsTest2014, while decoding at 100 words/sec on single-threaded CPU. We believe this is the best published accuracy/speed trade-off of an NMT system.

september 2017 by Vaguery

Approximately Yours | bit-player

september 2017 by Vaguery

Today, I’m told, is Rational Approximation Day. It’s 22/7 (for those who write dates in little-endian format), which differs from π by about 0.04 percent. (The big-endians among us are welcome to approximate 1/π.)

Given the present state of life in America, what we really need is an Approximation to Rationality Day, but that may have to wait for 20/1/21. In the meantime, let us merrily fiddle with numbers, searching for ratios of integers that brazenly invade the personal space of famous irrationals.

mathematical-recreations
approximation
continued-fractions
to-write-about
algorithms
nudge-targets
Given the present state of life in America, what we really need is an Approximation to Rationality Day, but that may have to wait for 20/1/21. In the meantime, let us merrily fiddle with numbers, searching for ratios of integers that brazenly invade the personal space of famous irrationals.

september 2017 by Vaguery

[1108.2714] Approximate common divisors via lattices

september 2017 by Vaguery

We analyze the multivariate generalization of Howgrave-Graham's algorithm for the approximate common divisor problem. In the m-variable case with modulus N and approximate common divisor of size N^beta, this improves the size of the error tolerated from N^(beta^2) to N^(beta^((m+1)/m)), under a commonly used heuristic assumption. This gives a more detailed analysis of the hardness assumption underlying the recent fully homomorphic cryptosystem of van Dijk, Gentry, Halevi, and Vaikuntanathan. While these results do not challenge the suggested parameters, a 2^(n^epsilon) approximation algorithm with epsilon<2/3 for lattice basis reduction in n dimensions could be used to break these parameters. We have implemented our algorithm, and it performs better in practice than the theoretical analysis suggests.

Our results fit into a broader context of analogies between cryptanalysis and coding theory. The multivariate approximate common divisor problem is the number-theoretic analogue of multivariate polynomial reconstruction, and we develop a corresponding lattice-based algorithm for the latter problem. In particular, it specializes to a lattice-based list decoding algorithm for Parvaresh-Vardy and Guruswami-Rudra codes, which are multivariate extensions of Reed-Solomon codes. This yields a new proof of the list decoding radii for these codes.

algorithms
number-theory
rather-interesting
to-write-about
to-understand
nudge-targets
consider:rediscovery
approximation
matrices
Our results fit into a broader context of analogies between cryptanalysis and coding theory. The multivariate approximate common divisor problem is the number-theoretic analogue of multivariate polynomial reconstruction, and we develop a corresponding lattice-based algorithm for the latter problem. In particular, it specializes to a lattice-based list decoding algorithm for Parvaresh-Vardy and Guruswami-Rudra codes, which are multivariate extensions of Reed-Solomon codes. This yields a new proof of the list decoding radii for these codes.

september 2017 by Vaguery

[1512.00906] Butcher series: A story of rooted trees and numerical methods for evolution equations

september 2017 by Vaguery

Butcher series appear when Runge-Kutta methods for ordinary differential equations are expanded in power series of the step size parameter. Each term in a Butcher series consists of a weighted elementary differential, and the set of all such differentials is isomorphic to the set of rooted trees, as noted by Cayley in the mid 19th century. A century later Butcher discovered that rooted trees can also be used to obtain the order conditions of Runge-Kutta methods, and he found a natural group structure, today known as the Butcher group. It is now known that many numerical methods also can be expanded in Butcher series; these are called B-series methods. A long-standing problem has been to characterize, in terms of qualitative features, all B-series methods. Here we tell the story of Butcher series, stretching from the early work of Cayley, to modern developments and connections to abstract algebra, and finally to the resolution of the characterization problem. This resolution introduces geometric tools and perspectives to an area traditionally explored using analysis and combinatorics.

numerical-methods
differential-equations
symbolic-methods
algebra
mathematics
approximation
rather-interesting
to-write-about
consider:rediscovery
september 2017 by Vaguery

[1707.06300] Untangling the hairball: fitness based asymptotic reduction of biological networks

august 2017 by Vaguery

Complex mathematical models of interaction networks are routinely used for prediction in systems biology. However, it is difficult to reconcile network complexities with a formal understanding of their behavior. Here, we propose a simple procedure (called φ¯) to reduce biological models to functional submodules, using statistical mechanics of complex systems combined with a fitness-based approach inspired by in silico evolution. φ¯ works by putting parameters or combination of parameters to some asymptotic limit, while keeping (or slightly improving) the model performance, and requires parameter symmetry breaking for more complex models. We illustrate φ¯ on biochemical adaptation and on different models of immune recognition by T cells. An intractable model of immune recognition with close to a hundred individual transition rates is reduced to a simple two-parameter model. φ¯ extracts three different mechanisms for early immune recognition, and automatically discovers similar functional modules in different models of the same process, allowing for model classification and comparison. Our procedure can be applied to biological networks based on rate equations using a fitness function that quantifies phenotypic performance.

systems-biology
approximation
simplification
network-theory
representation
rather-interesting
algorithms
theoretical-biology
philosophy-of-science
to-write-about
august 2017 by Vaguery

How to Build an Impossible Polyhedron

june 2017 by Vaguery

Using stiff paper and transparent tape, Craig Kaplan assembles a beautiful roundish shape that looks like a Buckminster Fuller creation or a fancy new kind of soccer ball. It consists of four regular dodecagons (12-sided polygons with all angles and sides the same) and 12 decagons (10-sided), with 28 little gaps in the shape of equilateral triangles. There’s just one problem. This figure should be impossible. That set of polygons won’t meet at the vertices. The shape can’t close up.

via:my
markup
mathematics
approximation
rather-interesting
number-theory
open-questions
to-write-about
theory-and-practice-sitting-in-a-tree
philosophy-of-science
june 2017 by Vaguery

[1703.01350] Approximate Convex Hulls

may 2017 by Vaguery

We investigate the PPI algorithm as a means for computing ap- proximate convex hull. We explain how the algorithm computes the curvature of points and prove consistency and convergence. We also extend the algorithm to compute approximate convex hulls described in terms of hyperplanes.

computational-geometry
approximation
algorithms
rather-interesting
to-write-about
may 2017 by Vaguery

[1610.01983] Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?

may 2017 by Vaguery

Deep learning has rapidly transformed the state of the art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have relied upon massive amounts of human annotated training data. This time consuming process has begun impeding the progress of these deep learning efforts. This paper describes a method to incorporate photo-realistic computer images from a simulation engine to rapidly generate annotated data that can be used for the training of machine learning algorithms. We demonstrate that a state of the art architecture, which is trained only using these synthetic annotations, performs better than the identical architecture trained on human annotated real-world data, when tested on the KITTI data set for vehicle detection. By training machine learning algorithms on a rich virtual world, real objects in real scenes can be learned and classified using synthetic data. This approach offers the possibility of accelerating deep learning's application to sensor-based classification problems like those that appear in self-driving cars. The source code and data to train and validate the networks described in this paper are made available for researchers.

simulation
engineering-design
ontology
approximation
training
to-write-about
consider:stress-testing
may 2017 by Vaguery

[1608.08844] Snapping Graph Drawings to the Grid Optimally

may 2017 by Vaguery

In geographic information systems and in the production of digital maps for small devices with restricted computational resources one often wants to round coordinates to a rougher grid. This removes unnecessary detail and reduces space consumption as well as computation time. This process is called snapping to the grid and has been investigated thoroughly from a computational-geometry perspective. In this paper we investigate the same problem for given drawings of planar graphs under the restriction that their combinatorial embedding must be kept and edges are drawn straight-line. We show that the problem is NP-hard for several objectives and provide an integer linear programming formulation. Given a plane graph G and a positive integer w, our ILP can also be used to draw G straight-line on a grid of width w and minimum height (if possible).

graph-layout
computational-geometry
algorithms
rather-interesting
constraint-satisfaction
nudge-targets
consider:looking-to-see
approximation
performance-measure
may 2017 by Vaguery

[1603.05691] Do Deep Convolutional Nets Really Need to be Deep and Convolutional?

may 2017 by Vaguery

Yes, they do. This paper provides the first empirical demonstration that deep convolutional models really need to be both deep and convolutional, even when trained with methods such as distillation that allow small or shallow models of high accuracy to be trained. Although previous research showed that shallow feed-forward nets sometimes can learn the complex functions previously learned by deep nets while using the same number of parameters as the deep models they mimic, in this paper we demonstrate that the same methods cannot be used to train accurate models on CIFAR-10 unless the student models contain multiple layers of convolution. Although the student models do not have to be as deep as the teacher model they mimic, the students need multiple convolutional layers to learn functions of comparable accuracy as the deep convolutional teacher.

good-question
neural-networks
deep-learning
rather-interesting
approximation
heuristics
nudge-targets
consider:re-approximation
may 2017 by Vaguery

[1011.4245] When the optimal is not the best: parameter estimation in complex biological models

may 2017 by Vaguery

Background: The vast computational resources that became available during the past decade enabled the development and simulation of increasingly complex mathematical models of cancer growth. These models typically involve many free parameters whose determination is a substantial obstacle to model development. Direct measurement of biochemical parameters in vivo is often difficult and sometimes impracticable, while fitting them under data-poor conditions may result in biologically implausible values.

Results: We discuss different methodological approaches to estimate parameters in complex biological models. We make use of the high computational power of the Blue Gene technology to perform an extensive study of the parameter space in a model of avascular tumor growth. We explicitly show that the landscape of the cost function used to optimize the model to the data has a very rugged surface in parameter space. This cost function has many local minima with unrealistic solutions, including the global minimum corresponding to the best fit.

Conclusions: The case studied in this paper shows one example in which model parameters that optimally fit the data are not necessarily the best ones from a biological point of view. To avoid force-fitting a model to a dataset, we propose that the best model parameters should be found by choosing, among suboptimal parameters, those that match criteria other than the ones used to fit the model. We also conclude that the model, data and optimization approach form a new complex system, and point to the need of a theory that addresses this problem more generally.

optimization
biophysics
noise
stochastic-resonance
modeling
rather-interesting
to-write-about
approximation
performance-measure
Results: We discuss different methodological approaches to estimate parameters in complex biological models. We make use of the high computational power of the Blue Gene technology to perform an extensive study of the parameter space in a model of avascular tumor growth. We explicitly show that the landscape of the cost function used to optimize the model to the data has a very rugged surface in parameter space. This cost function has many local minima with unrealistic solutions, including the global minimum corresponding to the best fit.

Conclusions: The case studied in this paper shows one example in which model parameters that optimally fit the data are not necessarily the best ones from a biological point of view. To avoid force-fitting a model to a dataset, we propose that the best model parameters should be found by choosing, among suboptimal parameters, those that match criteria other than the ones used to fit the model. We also conclude that the model, data and optimization approach form a new complex system, and point to the need of a theory that addresses this problem more generally.

may 2017 by Vaguery

[1507.07970v2] Dividing the circle

may 2017 by Vaguery

There are known constructions for some regular polygons, usually inscribed in a circle, but not for all polygons - the Gauss-Wantzel Theorem states precisely which ones can be constructed.

The constructions differ greatly from one polygon to the other. There are, however, general processes for determining the side of the n-gon (approximately, but sometimes with great precision), which we describe in this paper. We present a joint mathematical analysis of the so-called Bion and Tempier approximation methods, comparing the errors and trying to explain why these constructions would work at all.

compass-and-straightedge
plane-geometry
construction
approximation
nudge-targets
consider:rediscovery
The constructions differ greatly from one polygon to the other. There are, however, general processes for determining the side of the n-gon (approximately, but sometimes with great precision), which we describe in this paper. We present a joint mathematical analysis of the so-called Bion and Tempier approximation methods, comparing the errors and trying to explain why these constructions would work at all.

may 2017 by Vaguery

[1704.00708] No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

may 2017 by Vaguery

In this paper we develop a new framework that captures the common landscape underlying the common non-convex low-rank matrix problems including matrix sensing, matrix completion and robust PCA. In particular, we show for all above problems (including asymmetric cases): 1) all local minima are also globally optimal; 2) no high-order saddle points exists. These results explain why simple algorithms such as stochastic gradient descent have global converge, and efficiently optimize these non-convex objective functions in practice. Our framework connects and simplifies the existing analyses on optimization landscapes for matrix sensing and symmetric matrix completion. The framework naturally leads to new results for asymmetric matrix completion and robust PCA.

compressed-sensing
matrices
optimization
approximation
rather-interesting
machine-learning
nudge-targets
consider:representation
may 2017 by Vaguery

[1607.05527] An Approximation Algorithm for the Art Gallery Problem

april 2017 by Vaguery

Given a simple polygon on n vertices, two points x,y in are said to be visible to each other if the line segment between x and y is contained in . The Point Guard Art Gallery problem asks for a minimum set S such that every point in is visible from a point in S. The set S is referred to as guards. Assuming integer coordinates and a specific general position assumption, we present the first O(logOPT)-approximation algorithm for the point guard problem for simple polygons. This algorithm combines ideas of a paper of Efrat and Har-Peled [Inf. Process. Lett. 2006] and Deshpande et. al. [WADS 2007]. We also point out a mistake in the latter.

computational-geometry
algorithms
approximation
nudge-targets
april 2017 by Vaguery

[1307.0587] Self-Assembly on a Cylinder: A Model System for Understanding the Constraint of Commensurability

april 2017 by Vaguery

A crystal lattice, when confined to the surface of a cylinder, must have a periodic structure that is commensurate with the cylinder circumference. This constraint can frustrate the system, leading to oblique crystal lattices or to structures with a chiral seam known as a "line slip" phase, neither of which are stable for isotropic particles in equilibrium on flat surfaces. In this study, we use molecular dynamics simulations to find the steady-state structure of spherical particles with short-range repulsion and long-range attraction far below the melting temperature. We vary the range of attraction using the Lennard-Jones and Morse potentials and find that a shorter-range attraction favors the line-slip. We develop a simple model based only on geometry and bond energy to predict when the crystal or line-slip phases should appear, and find reasonable agreement with the simulations. The simplicity of this model allows us to understand the influence of the commensurability constraint, an understanding that might be extended into the more general problem of self-assembling particles in strongly confined spaces.

packing
tiling
approximation
looking-to-see
out-of-the-box
physics!
simulation
frustrated-systems
self-organization
april 2017 by Vaguery

[1702.08328] A Universal Ordinary Differential Equation

april 2017 by Vaguery

An astonishing fact was established by Lee A. Rubel in 81: there exists a fixed non-trivial fourth-order polynomial differential algebraic equation (DAE) such that for any continuous function ϕ on the reals, and for any positive continuous function ϵ(t), it has a ∞ solution with |y(t)−ϕ(t)|<ϵ(t) for all t. Rubel provided an explicit example of such a polynomial DAE. More examples have later been proposed by other authors.

However, while these results may seem very surprising, their proofs are quite frustrating for a computability theorist. First, the constructed DAE have no unique solutions for a given initial data. This is very different from usual notions of universality since there is no unambiguous notion of evolution for a given initial data. Second, the proofs usually rely on solutions that are piecewise defined and sometimes non-constructive. Third, the proofs of these results can be interpreted more as the fact that polynomial algebraic differential equations is a too loose a model compared to classical ordinary differential equations. In particular, one may challenge whether the result is really a universality result.

The question whether one can require the solution that approximates ϕ to be the unique solution for a given initial data is a well known open problem [Rub81] (page 2), [boshernitzan1986universal] (Conjecture 6.2). In this article, we solve it and show that Rubel's statement holds for polynomial ordinary differential equations (ODEs), and since polynomial ODEs have a unique solution given an initial data, this positively answers Rubel's open problem. More precisely, we show that there exists a \textbf{fixed} polynomial ODE such that for any ϕ and ϵ there exists some initial condition that yields a solution that is ϵ-close to ϕ at all times.

diffyQ
nonlinear-dynamics
approximation
representation
rather-interesting
analytical-expressions
proof
However, while these results may seem very surprising, their proofs are quite frustrating for a computability theorist. First, the constructed DAE have no unique solutions for a given initial data. This is very different from usual notions of universality since there is no unambiguous notion of evolution for a given initial data. Second, the proofs usually rely on solutions that are piecewise defined and sometimes non-constructive. Third, the proofs of these results can be interpreted more as the fact that polynomial algebraic differential equations is a too loose a model compared to classical ordinary differential equations. In particular, one may challenge whether the result is really a universality result.

The question whether one can require the solution that approximates ϕ to be the unique solution for a given initial data is a well known open problem [Rub81] (page 2), [boshernitzan1986universal] (Conjecture 6.2). In this article, we solve it and show that Rubel's statement holds for polynomial ordinary differential equations (ODEs), and since polynomial ODEs have a unique solution given an initial data, this positively answers Rubel's open problem. More precisely, we show that there exists a \textbf{fixed} polynomial ODE such that for any ϕ and ϵ there exists some initial condition that yields a solution that is ϵ-close to ϕ at all times.

april 2017 by Vaguery

[1605.06857] Finding the Year's Share in Day-of-Week Calculations

march 2017 by Vaguery

The dominant part in the mental calculation of the day of the week for any given date is to determine the year share, that is, the contribution of the two-digit year part of the date. This paper describes a number of year share computation methods, some well-known and some new. The "Parity Minus 3" method, in particular, is a new alternative to the popular "Odd+11" method. The paper categorizes the methods of year share computation, and presents simpler proofs of their correctness than usually provided.

Keywords and phrases: day of the week, calendar algorithms, doomsday method, first Sunday algorithm, mental arithmetic, year share

mathematics
mnemonics
approximation
rather-interesting
mathematical-recreations
nudge-targets
consider:looking-to-see
Keywords and phrases: day of the week, calendar algorithms, doomsday method, first Sunday algorithm, mental arithmetic, year share

march 2017 by Vaguery

[1306.5527] Computing the Fr'echet Distance with a Retractable Leash

march 2017 by Vaguery

All known algorithms for the Fr\'echet distance between curves proceed in two steps: first, they construct an efficient oracle for the decision version; second, they use this oracle to find the optimum from a finite set of critical values. We present a novel approach that avoids the detour through the decision version. This gives the first quadratic time algorithm for the Fr\'echet distance between polygonal curves in Rd under polyhedral distance functions (e.g., L1 and L∞). We also get a (1+ε)-approximation of the Fr\'echet distance under the Euclidean metric, in quadratic time for any fixed ε>0. For the exact Euclidean case, our framework currently yields an algorithm with running time O(n2log2n). However, we conjecture that it may eventually lead to a faster exact algorithm.

metrics
algorithms
numerical-methods
rather-interesting
nudge-targets
approximation
consider:looking-to-see
consider:representation
to-write-about
march 2017 by Vaguery

[1606.06660] Mapping polygons to the grid with small Hausdorff and Fr'echet distance

march 2017 by Vaguery

We show how to represent a simple polygon P by a grid (pixel-based) polygon Q that is simple and whose Hausdorff or Fr\'echet distance to P is small. For any simple polygon P, a grid polygon exists with constant Hausdorff distance between their boundaries and their interiors. Moreover, we show that with a realistic input assumption we can also realize constant Fr\'echet distance between the boundaries. We present algorithms accompanying these constructions, heuristics to improve their output while keeping the distance bounds, and experiments to assess the output.

rather-interesting
computational-geometry
approximation
metrics
performance-measure
nudge-targets
consider:looking-to-see
march 2017 by Vaguery

[1603.06252] Grouping Time-varying Data for Interactive Exploration

march 2017 by Vaguery

We present algorithms and data structures that support the interactive analysis of the grouping structure of one-, two-, or higher-dimensional time-varying data while varying all defining parameters. Grouping structures characterise important patterns in the temporal evaluation of sets of time-varying data. We follow Buchin et al. [JoCG 2015] who define groups using three parameters: group-size, group-duration, and inter-entity distance. We give upper and lower bounds on the number of maximal groups over all parameter values, and show how to compute them efficiently. Furthermore, we describe data structures that can report changes in the set of maximal groups in an output-sensitive manner. Our results hold in ℝd for fixed d.

clustering
feature-construction
rather-interesting
to-understand
consider:performance-space-analysis
compare-to-Pareto-GP-features
visualization
approximation
to-write-about
algorithms
computational-geometry
march 2017 by Vaguery

[1506.00990] Unsupervised Learning on Neural Network Outputs: with Application in Zero-shot Learning

march 2017 by Vaguery

The outputs of a trained neural network contain much richer information than just an one-hot classifier. For example, a neural network might give an image of a dog the probability of one in a million of being a cat but it is still much larger than the probability of being a car. To reveal the hidden structure in them, we apply two unsupervised learning algorithms, PCA and ICA, to the outputs of a deep Convolutional Neural Network trained on the ImageNet of 1000 classes. The PCA/ICA embedding of the object classes reveals their visual similarity and the PCA/ICA components can be interpreted as common visual features shared by similar object classes. For an application, we proposed a new zero-shot learning method, in which the visual features learned by PCA/ICA are employed. Our zero-shot learning method achieves the state-of-the-art results on the ImageNet of over 20000 classes.

machine-learning
approximation
neural-networks
classification
rather-interesting
to-write-about
like-the-Soft-Push-scheme
consider:looking-to-see
consider:feature-discovery
march 2017 by Vaguery

[1703.01054] When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

march 2017 by Vaguery

Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold τ. In contrast to previous work where τ is assumed to be quite close to 1, we focus on recommendation applications where τ is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small τ. To the best of our knowledge, there is no practical solution for computing all user pairs with, say τ=0.2 on large social networks, even using the power of distributed algorithms.

Our work directly addresses this challenge by introducing a new algorithm --- WHIMP --- that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.

algorithms
numerical-methods
linear-algebra
metrics
approximation
nudge-targets
consider:representation
Our work directly addresses this challenge by introducing a new algorithm --- WHIMP --- that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.

march 2017 by Vaguery

[1701.07492] Path abstraction

march 2017 by Vaguery

Given the set of paths through a digraph, the result of uniformly deleting some vertices and identifying others along each path is coherent in such a way as to yield the set of paths through another digraph, called a \emph{path abstraction} of the original digraph. The construction of path abstractions is detailed and relevant basic results are established; generalizations are also discussed. Connections with random digraphs are also illustrated.

graph-theory
approximation
rather-odd
can't-wait-to-understand-this
wtf?
march 2017 by Vaguery

[1410.0642] A Note on Archetypal Analysis and the Approximation of Convex Hulls

february 2017 by Vaguery

We briefly review the basic ideas behind archetypal analysis for matrix factorization and discuss its behavior in approximating the convex hull of a data sample. We then ask how good such approximations can be and consider different cases. Understanding archetypal analysis as the problem of computing a convexity constrained low-rank approximation of the identity matrix provides estimates for archetypal analysis and the SiVM heuristic.

archetypal-analysis
approximation
computational-geometry
rather-interesting
algorithms
to-write-about
february 2017 by Vaguery

[1612.08515] Compositional Synthesis of Finite State Abstractions

february 2017 by Vaguery

Controller synthesis techniques for continuous systems with respect to temporal logic specifications typically use a finite-state symbolic abstraction of the system model. Constructing this abstraction for the entire system is computationally expensive, and does not exploit natural decompositions of many systems into interacting components. We describe a methodology for compositional symbolic abstraction to help scale controller synthesis for temporal logic to larger systems.

We introduce a new relation, called (approximate) disturbance bisimulation, as the basis for compositional symbolic abstractions. Disturbance bisimulation strengthens the standard approximate alternating bisimulation relation used in control, and extends naturally to systems which are composed of sub-components possibly connected in feedback; disturbance bisimulation handles the feedback signals as disturbances. After proving this composability of disturbance bisimulation for metric systems, we show how one can construct finite-state abstractions compositionally for each component, so that the abstractions are simultaneously disturbance bisimilar to their continuous counterparts. Combining these two results, we can compositionally abstract a network system in a modular way while ensuring that the final composed abstraction is distrubance bisimilar to the original system.

We discuss how we get a compositional controller synthesis methodology for networks of such systems against local temporal specifications as a by-product of our construction.

control-theory
automata
approximation
machine-learning
rather-interesting
to-understand
nudge-targets
consider:representation
We introduce a new relation, called (approximate) disturbance bisimulation, as the basis for compositional symbolic abstractions. Disturbance bisimulation strengthens the standard approximate alternating bisimulation relation used in control, and extends naturally to systems which are composed of sub-components possibly connected in feedback; disturbance bisimulation handles the feedback signals as disturbances. After proving this composability of disturbance bisimulation for metric systems, we show how one can construct finite-state abstractions compositionally for each component, so that the abstractions are simultaneously disturbance bisimilar to their continuous counterparts. Combining these two results, we can compositionally abstract a network system in a modular way while ensuring that the final composed abstraction is distrubance bisimilar to the original system.

We discuss how we get a compositional controller synthesis methodology for networks of such systems against local temporal specifications as a by-product of our construction.

february 2017 by Vaguery

[1702.04917] Compressed sensing in Hilbert spaces

february 2017 by Vaguery

In many linear inverse problems, we want to estimate an unknown vector belonging to a high-dimensional (or infinite-dimensional) space from few linear measurements. To overcome the ill-posed nature of such problems, we use a low-dimension assumption on the unknown vector: it belongs to a low-dimensional model set. The question of whether it is possible to recover such an unknown vector from few measurements then arises. If the answer is yes, it is also important to be able to describe a way to perform such a recovery. We describe a general framework where appropriately chosen random measurements guarantee that recovery is possible. We further describe a way to study the performance of recovery methods that consist in the minimization of a regularization function under a data-fit constraint.

approximation
compressed-sensing
inference
modeling
algorithms
february 2017 by Vaguery

[1701.03230] Surface Reconstruction with Data-driven Exemplar Priors

february 2017 by Vaguery

In this paper, we propose a framework to reconstruct 3D models from raw scanned points by learning the prior knowledge of a specific class of objects. Unlike previous work that heuristically specifies particular regularities and defines parametric models, our shape priors are learned directly from existing 3D models under a framework based on affinity propagation. Given a database of 3D models within the same class of objects, we build a comprehensive library of 3D local shape priors. We then formulate the problem to select as-few-as-possible priors from the library, referred to as exemplar priors. These priors are sufficient to represent the 3D shapes of the whole class of objects from where they are generated. By manipulating these priors, we are able to reconstruct geometrically faithful models with the same class of objects from raw point clouds. Our framework can be easily generalized to reconstruct various categories of 3D objects that have more geometrically or topologically complex structures. Comprehensive experiments exhibit the power of our exemplar priors for gracefully solving several problems in 3D shape reconstruction such as preserving sharp features, recovering fine details and so on.

computational-geometry
inference
statistics
algorithms
rather-interesting
approximation
modeling
nudge-targets
consider:performance-measures
consider:parsimony
february 2017 by Vaguery

[1702.04748] An Improved Dictatorship Test with Perfect Completeness

february 2017 by Vaguery

A Boolean function f:{0,1}n→{0,1} is called a dictator if it depends on exactly one variable i.e f(x1,x2,…,xn)=xi for some i∈[n]. In this work, we study a k-query dictatorship test. Dictatorship tests are central in proving many hardness results for constraint satisfaction problems.

The dictatorship test is said to have {\em perfect completeness} if it accepts any dictator function. The {\em soundness} of a test is the maximum probability with which it accepts any function far from a dictator. Our main result is a k-query dictatorship test with perfect completeness and soundness 2k+12k, where k is of the form 2t−1 for any integer t>2. This improves upon the result of \cite{TY15} which gave a dictatorship test with soundness 2k+32k.

constraint-satisfaction
computational-complexity
rather-interesting
approximation
heuristics
nudge-targets
consider:classification
The dictatorship test is said to have {\em perfect completeness} if it accepts any dictator function. The {\em soundness} of a test is the maximum probability with which it accepts any function far from a dictator. Our main result is a k-query dictatorship test with perfect completeness and soundness 2k+12k, where k is of the form 2t−1 for any integer t>2. This improves upon the result of \cite{TY15} which gave a dictatorship test with soundness 2k+32k.

february 2017 by Vaguery

[1701.01337] Abilities and Limitations of Spectral Graph Bisection

february 2017 by Vaguery

Spectral based heuristics belong to well-known commonly used methods for finding a minimum-size bisection in a graph. The heuristics are usually easy to implement and they work well for several practice-relevant classes of graphs. However, only a few research efforts are focused on providing rigorous analysis of such heuristics and often they lack of proven optimality or approximation quality. This paper focuses on the spectral heuristic proposed by Boppana almost three decades ago, which still belongs to one of the most important bisection methods.

It is well known that Boppana's algorithm finds and certifies an optimal bisection with high probability in the random planted bisection model -- the standard model which captures many real-world instances. In this model the vertex set is partitioned randomly into two equal sized sets, and then each edge inside the same part of the partition is chosen with probability p and each edge crossing the partition is chosen with probability q, with p≥q. In our paper we investigate the problem if Boppana's algorithm works well in the semirandom model introduced by Feige and Kilian. The model generates initially an instance by random selection within the planted bisection model, followed by adversarial decisions. Feige and Kilian posed the question if Boppana's algorithm works well in the semirandom model and it has remained open so far. In our paper we answer the question affirmatively. We show also that the algorithm achieves similar performance on graph models which generalize the semirandom model. On the other hand we prove some limitations: we show that if the density difference p−q≤o(p⋅lnn‾‾‾‾‾‾√/n‾√) then the algorithm fails with high probability in the planted bisection model. This bound is sharp, since under assumption p−q≥Ω(p⋅lnn‾‾‾‾‾‾√/n‾√) Boppana's algorithm works well in the model.

graph-theory
algorithms
bisection
horse-races
approximation
nudge-targets
consider:representation
It is well known that Boppana's algorithm finds and certifies an optimal bisection with high probability in the random planted bisection model -- the standard model which captures many real-world instances. In this model the vertex set is partitioned randomly into two equal sized sets, and then each edge inside the same part of the partition is chosen with probability p and each edge crossing the partition is chosen with probability q, with p≥q. In our paper we investigate the problem if Boppana's algorithm works well in the semirandom model introduced by Feige and Kilian. The model generates initially an instance by random selection within the planted bisection model, followed by adversarial decisions. Feige and Kilian posed the question if Boppana's algorithm works well in the semirandom model and it has remained open so far. In our paper we answer the question affirmatively. We show also that the algorithm achieves similar performance on graph models which generalize the semirandom model. On the other hand we prove some limitations: we show that if the density difference p−q≤o(p⋅lnn‾‾‾‾‾‾√/n‾√) then the algorithm fails with high probability in the planted bisection model. This bound is sharp, since under assumption p−q≥Ω(p⋅lnn‾‾‾‾‾‾√/n‾√) Boppana's algorithm works well in the model.

february 2017 by Vaguery

[1702.02891] Sparse Approximation by Semidefinite Programming

february 2017 by Vaguery

The problem of sparse approximation and the closely related compressed sensing have received tremendous attention in the past decade. Primarily studied from the viewpoint of applied harmonic analysis and signal processing, there have been two dominant algorithmic approaches to this problem: Greedy methods called the matching pursuit (MP) and the linear programming based approaches called the basis pursuit (BP). The aim of the current paper is to bring a fresh perspective to sparse approximation by treating it as a combinatorial optimization problem and providing an algorithm based on the powerful optimization technique semidefinite programming (SDP). In particular, we show that there is a randomized algorithm based on a semidefinite relaxation of the problem with performance guarantees depending on the coherence and the restricted isometry constant of the dictionary used. We then show a derandomization of the algorithm based on the method of conditional probabilities.

approximation
compressed-sensing
representation
mathematical-programming
numerical-methods
performance-measure
nudge-targets
consider:looking-to-see
consider:feature-discovery
february 2017 by Vaguery

[1604.04660] Why Artificial Intelligence Needs a Task Theory --- And What It Might Look Like

february 2017 by Vaguery

The concept of "task" is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering theoretical foundations allow thorough evaluation of designs by methodical manipulation of well understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane's performance and stability. No framework exists in AI that allows this kind of methodical manipulation: Performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different. The issue is even more acute with respect to artificial *general* intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A *task theory* would enable addressing tasks at the *class* level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. Even modest improvements in this direction would surpass the current ad-hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks.

artificial-intelligence
philosophy-of-engineering
rather-interesting
representation
approximation
aggregation
nudge-targets
consider:looking-to-see
consider:performance-space-analysis
february 2017 by Vaguery

Your favorite color makes learning more adaptable and precise | bioRxiv

february 2017 by Vaguery

Learning from reward feedback is essential for survival but can become extremely challenging when choice options have multiple features and feature values (curse of dimensionality). Here, we propose a general framework for learning reward values in dynamic multi-dimensional environments via encoding and updating the average value of individual features. We predicted that this feature-based learning occurs not just because it can reduce dimensionality, but more importantly because it can increase adaptability without compromising precision. We experimentally tested this novel prediction and found that in dynamic environments, human subjects adopted feature-based learning even when this approach does not reduce dimensionality. Even in static low-dimensional environment, subjects initially tended to adopt feature-based learning and switched to learning individual option values only when feature values could not accurately predict all objects values. Moreover, behaviors of two alternative network models demonstrated that hierarchical decision-making and learning could account for our experimental results and thus provides a plausible mechanism for model adoption during learning in dynamic environments. Our results constrain neural mechanisms underlying learning in dynamic multi-dimensional environments, and highlight the importance of neurons encoding the value of individual features in this learning.

machine-learning
curse-of-dimensionality
feature-construction
rather-interesting
approximation
nudge-targets
consider:feature-discovery
february 2017 by Vaguery

[1701.01207] A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers

january 2017 by Vaguery

Regularization techniques are widely employed in optimization-based approaches for solving ill-posed inverse problems in data analysis and scientific computing. These methods are based on augmenting the objective with a penalty function, which is specified based on prior domain-specific expertise to induce a desired structure in the solution. We consider the problem of learning suitable regularization functions from data in settings in which precise domain knowledge is not directly available. Previous work under the title of `dictionary learning' or `sparse coding' may be viewed as learning a regularization function that can be computed via linear programming. We describe generalizations of these methods to learn regularizers that can be computed and optimized via semidefinite programming. Our framework for learning such semidefinite regularizers is based on obtaining structured factorizations of data matrices, and our algorithmic approach for computing these factorizations combines recent techniques for rank minimization problems along with an operator analog of Sinkhorn scaling. Under suitable conditions on the input data, our algorithm provides a locally linearly convergent method for identifying the correct regularizer that promotes the type of structure contained in the data. Our analysis is based on the stability properties of Operator Sinkhorn scaling and their relation to geometric aspects of determinantal varieties (in particular tangent spaces with respect to these varieties). The regularizers obtained using our framework can be employed effectively in semidefinite programming relaxations for solving inverse problems.

matrices
inverse-problems
approximation
algorithms
nudge-targets
consider:looking-to-see
to-write-about
consider:representation
january 2017 by Vaguery

[1612.02483] High Dimensional Consistent Digital Segments

january 2017 by Vaguery

We consider the problem of digitalizing Euclidean line segments from ℝd to ℤd. Christ {\em et al.} (DCG, 2012) showed how to construct a set of {\em consistent digital segment} (CDS) for d=2: a collection of segments connecting any two points in ℤ2 that satisfies the natural extension of the Euclidean axioms to ℤd. In this paper we study the construction of CDSs in higher dimensions.

We show that any total order can be used to create a set of {\em consistent digital rays} CDR in ℤd (a set of rays emanating from a fixed point p that satisfies the extension of the Euclidean axioms). We fully characterize for which total orders the construction holds and study their Hausdorff distance, which in particular positively answers the question posed by Christ {\em et al.}.

approximation
computational-geometry
performance-measure
rather-interesting
mathematics
consistency
models-and-modes
constructive-geometry
nudge-targets
consider:representation
consider:looking-to-see
We show that any total order can be used to create a set of {\em consistent digital rays} CDR in ℤd (a set of rays emanating from a fixed point p that satisfies the extension of the Euclidean axioms). We fully characterize for which total orders the construction holds and study their Hausdorff distance, which in particular positively answers the question posed by Christ {\em et al.}.

january 2017 by Vaguery

[1603.08812] Reconstruction of evolved dynamic networks from degree correlations

january 2017 by Vaguery

We study the importance of local structural properties in networks which have been evolved for a power-law scaling in their Laplacian spectrum. To this end, the degree distribution, two-point degree correlations, and degree-dependent clustering are extracted from the evolved networks and used to construct random networks with the prescribed distributions. In the analysis of these reconstructed networks it turns out that the degree distribution alone is not sufficient to generate the spectral scaling and the degree-dependent clustering has only an indirect influence. The two-point correlations are found to be the dominant characteristic for the power-law scaling over a broader eigenvalue range.

approximation
graph-theory
algorithms
statistics
nudge
consider:looking-to-see
feature-extraction
january 2017 by Vaguery

[1511.05118] Random sampling of bandlimited signals on graphs

january 2017 by Vaguery

We study the problem of sampling k-bandlimited signals on graphs. We propose two sampling strategies that consist in selecting a small subset of nodes at random. The first strategy is non-adaptive, i.e., independent of the graph structure, and its performance depends on a parameter called the graph coherence. On the contrary, the second strategy is adaptive but yields optimal results. Indeed, no more than O(k log(k)) measurements are sufficient to ensure an accurate and stable recovery of all k-bandlimited signals. This second strategy is based on a careful choice of the sampling distribution, which can be estimated quickly. Then, we propose a computationally efficient decoder to reconstruct k-bandlimited signals from their samples. We prove that it yields accurate reconstructions and that it is also stable to noise. Finally, we conduct several experiments to test these techniques.

sampling
graph-theory
network-theory
rather-interesting
approximation
compressed-sensing
nudge-targets
performance-measure
consider:looking-to-see
to-write-about
january 2017 by Vaguery

[1509.05009] On the Expressive Power of Deep Learning: A Tensor Analysis

january 2017 by Vaguery

It has long been conjectured that hypotheses spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical networks than with shallow ones. Despite the vast empirical evidence supporting this belief, theoretical justifications to date are limited. In particular, they do not account for the locality, sharing and pooling constructs of convolutional networks, the most successful deep learning architecture to date. In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling. An equivalence between the networks and hierarchical tensor factorizations is established. We show that a shallow network corresponds to CP (rank-1) decomposition, whereas a deep network corresponds to Hierarchical Tucker decomposition. Using tools from measure theory and matrix algebra, we prove that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be realized (or even approximated) by a shallow network. Since log-space computation transforms our networks into SimNets, the result applies directly to a deep learning architecture demonstrating promising empirical performance. The construction and theory developed in this paper shed new light on various practices and ideas employed by the deep learning community.

neural-networks
deep-learning
approximation
rather-interesting
representation
feature-construction
january 2017 by Vaguery

[1604.01175] On the Combinatorial Complexity of Approximating Polytopes

january 2017 by Vaguery

Approximating convex bodies succinctly by convex polytopes is a fundamental problem in discrete geometry. A convex body K of diameter diam(K) is given in Euclidean d-dimensional space, where d is a constant. Given an error parameter ε>0, the objective is to determine a polytope of minimum combinatorial complexity whose Hausdorff distance from K is at most ε⋅diam(K). By combinatorial complexity we mean the total number of faces of all dimensions of the polytope. A well-known result by Dudley implies that O(1/ε(d−1)/2) facets suffice, and a dual result by Bronshteyn and Ivanov similarly bounds the number of vertices, but neither result bounds the total combinatorial complexity. We show that there exists an approximating polytope whose total combinatorial complexity is Õ (1/ε(d−1)/2), where Õ conceals a polylogarithmic factor in 1/ε. This is a significant improvement upon the best known bound, which is roughly O(1/εd−2).

Our result is based on a novel combination of both old and new ideas. First, we employ Macbeath regions, a classical structure from the theory of convexity. The construction of our approximating polytope employs a new stratified placement of these regions. Second, in order to analyze the combinatorial complexity of the approximating polytope, we present a tight analysis of a width-based variant of B\'{a}r\'{a}ny and Larman's economical cap covering. Finally, we use a deterministic adaptation of the witness-collector technique (developed recently by Devillers et al.) in the context of our stratified construction.

approximation
computational-geometry
computational-complexity
algorithms
rather-interesting
to-understand
representation
nudge-targets
consider:look
Our result is based on a novel combination of both old and new ideas. First, we employ Macbeath regions, a classical structure from the theory of convexity. The construction of our approximating polytope employs a new stratified placement of these regions. Second, in order to analyze the combinatorial complexity of the approximating polytope, we present a tight analysis of a width-based variant of B\'{a}r\'{a}ny and Larman's economical cap covering. Finally, we use a deterministic adaptation of the witness-collector technique (developed recently by Devillers et al.) in the context of our stratified construction.

january 2017 by Vaguery

[1610.08820] Bin Packing Problem: A Linear Constant-Space 3/2-Approximation Algorithm

december 2016 by Vaguery

Since the Bin Packing Problem (BPP) is one of the main NP-hard problems, a lot of approximation algorithms have been suggested for it. It has been proven that the best algorithm for BPP has the approximation ratio of 3/2 and the time order of O(n), unless P=NP. In the current paper, a linear 3/2-approximation algorithm is presented. The suggested algorithm not only has the best possible theoretical factors, approximation ratio, space order, and time order, but also outperforms the other approximation algorithms according to the experimental results, therefore, we are able to draw the conclusion that the algorithms is the best approximation algorithm which has been presented for the problem until now. Key words: Approximation Algorithm, Bin Packing Problem, Approximation Ratio, NP-hard.

bin-packing
operations-research
optimization
approximation
horse-races
rather-interesting
nudge-targets
consider:looking-to-see
consider:representation
consider:robustness
december 2016 by Vaguery

[1612.00193] Learning Potential Energy Landscapes using Graph Kernels

december 2016 by Vaguery

Recent machine learning methods make it possible to model potential energy of atomic configurations with chemical-level accuracy (as calculated from ab-initio calculations) and at speeds suitable for molecular dynamics simulation. Best performance is achieved when the known physical constraints are encoded in the machine learning models. For example, the atomic energy is invariant under global translations and rotations; it is also invariant to permutations of same-species atoms. Although simple to state, these symmetries are complicated to encode into machine learning algorithms. In this paper, we present a machine learning approach based on graph theory that naturally incorporates translation, rotation, and permutation symmetries. Specifically, we use a random walk graph kernel to measure the similarity of two adjacency matrices, each of which represents a local atomic environment. We show on a standard benchmark that our Graph Approximated Energy (GRAPE) method is competitive with state of the art kernel methods. Furthermore, the GRAPE framework is flexible and admits many possible extensions.

energy-landscapes
fitness-landscapes
machine-learning
approximation
kernel-methods
graph-kernels
to-understand
representation
nudge-targets
consider:representation
december 2016 by Vaguery

[1602.02262] Recovery guarantee of weighted low-rank approximation via alternating minimization

december 2016 by Vaguery

Many applications require recovering a ground truth low-rank matrix from noisy observations of the entries, which in practice is typically formulated as a weighted low-rank approximation problem and solved by non-convex optimization heuristics such as alternating minimization. In this paper, we provide provable recovery guarantee of weighted low-rank via a simple alternating minimization algorithm. In particular, for a natural class of matrices and weights and without any assumption on the noise, we bound the spectral norm of the difference between the recovered matrix and the ground truth, by the spectral norm of the weighted noise plus an additive error that decreases exponentially with the number of rounds of alternating minimization, from either initialization by SVD or, more importantly, random initialization. These provide the first theoretical results for weighted low-rank via alternating minimization with non-binary deterministic weights, significantly generalizing those for matrix completion, the special case with binary weights, since our assumptions are similar or weaker than those made in existing works. Furthermore, this is achieved by a very simple algorithm that improves the vanilla alternating minimization with a simple clipping step.

The key technical challenge is that under non-binary deterministic weights, na\"ive alternating steps will destroy the incoherence and spectral properties of the intermediate solutions, which are needed for making progress towards the ground truth. We show that the properties only need to hold in an average sense and can be achieved by the clipping step.

We further provide an alternating algorithm that uses a whitening step that keeps the properties via SDP and Rademacher rounding and thus requires weaker assumptions. This technique can potentially be applied in some other applications and is of independent interest.

matrices
approximation
dimension-reduction
compressed-sensing
algorithms
nudge-targets
consider:looking-to-see
consider:representation
The key technical challenge is that under non-binary deterministic weights, na\"ive alternating steps will destroy the incoherence and spectral properties of the intermediate solutions, which are needed for making progress towards the ground truth. We show that the properties only need to hold in an average sense and can be achieved by the clipping step.

We further provide an alternating algorithm that uses a whitening step that keeps the properties via SDP and Rademacher rounding and thus requires weaker assumptions. This technique can potentially be applied in some other applications and is of independent interest.

december 2016 by Vaguery

[1612.01605] Zipf's law, unbounded complexity and open-ended evolution

december 2016 by Vaguery

A major problem for evolutionary theory is understanding the so called {\em open-ended} nature of evolutionary change. Open-ended evolution (OEE) refers to the unbounded increase in complexity that seems to characterise evolution on multiple scales. This property seems to be a characteristic feature of biological and technological evolution and is strongly tied to the generative potential associated with combinatorics, which allows the system to grow and expand their available state spaces. Several theoretical and computational approaches have been developed to properly characterise OEE. Interestingly, many complex systems displaying OEE, from language to proteins, share a common statistical property: the presence of Zipf's law. Given and inventory of basic items required to build more complex structures Zipf's law tells us that most of these elements are rare whereas a few of them are extremely common. Using Algorithmic Information Theory, in this paper we provide a fundamental definition for open-endedness, which can be understood as {\em postulates}. Its statistical counterpart, based on standard Shannon Information theory, has the structure of a variational problem which is shown to lead to Zipf's law as the expected consequence of an evolutionary processes displaying OEE. We further explore the problem of information conservation through an OEE process and we conclude that statistical information (standard Shannon information) is not conserved, resulting into the paradoxical situation in which the increase of information content has the effect of erasing itself. We prove that this paradox is solved if we consider non-statistical forms of information. This last result implies that standard information theory may not be a suitable theoretical framework to explore the persistence and increase of the information content in OEE systems.

information-theory
open-ended-evolution
artificial-life
emergent-design
emergence
approximation
modeling
to-write-about
december 2016 by Vaguery

[1606.09630] Universality of Resilience Patterns in Generalized Lotka Volterra Dynamics and Beyond

december 2016 by Vaguery

Recently, a theoretical framework aimed at separating the roles of dynamics and topology in multi-dimensional systems has been developed (Gao et al, \textit{Nature}, Vol 530:307 (2016)). It proposes an effective description of the emergent universal resilience patterns in complex networks for a very general class of pair-wise interaction dynamics. The procedure consists in collapsing the multi-dimensional system state variables (e.g. stationary populations) into a 1-dimensional effective equation that is analytically tractable. The validity of the collapse is assumed to hold depending on two main hypothesis: (i) The network determined by the the interaction between pairs of nodes has negligible degree correlations; (ii) The node activities are uniform across nodes on both the drift and pair-wise interaction functions. Moreover, the authors consider only positive (mutualistic) interactions. Here we show the conditions proposed by Gao and collaborators are neither sufficient nor necessary to guarantee that their method works in general, and validity of their results are not independent of the model chosen within the class of dynamics they considered. Indeed we find that a new condition poses effective limitations to their framework and we provide quantitative predictions of the quality of the one dimensional collapse as a function of the properties of the interaction networks and dynamics. At the same time, we show that multi-dimensional reduction may work also beyond the assumption of strictly mutualistic interactions, thus extending the validity of their framework. We prove our results for generalized Lotka-Volterra dynamics through both analytical and numerical evidences.

nonlinear-dynamics
approximation
rather-interesting
dimension-reduction
algorithms
horse-races
performance-measure
to-write-about
nudge-targets
consider:looking-to-see
december 2016 by Vaguery

A Novel Algorithm for Coarse-Graining of Cellular Automata - Springer

december 2016 by Vaguery

The coarse-graining is an approximation procedure widely used for simplification of mathematical and numerical models of multiscale systems. It reduces superfluous – microscopic – degrees of freedom. Israeli and Goldenfeld demonstrated in [1,2] that the coarse-graining can be employed for elementary cellular automata (CA), producing interesting interdependences between them. However, extending their investigation on more complex CA rules appeared to be impossible due to the high computational complexity of the coarse-graining algorithm. We demonstrate here that this complexity can be substantially decreased. It allows for scrutinizing much broader class of cellular automata in terms of their coarse graining. By using our algorithm we found out that the ratio of the numbers of elementary CAs having coarse grained representation to “degenerate” – irreducible – cellular automata, strongly increases with increasing the “grain” size of the approximation procedure. This rises principal questions about the formal limits in modeling of realistic multiscale systems.

cellular-automata
aggregation
approximation
to-do
to-write-about
via:dan-little
december 2016 by Vaguery

[1106.3779] Subsum Sets: Intervals, Cantor Sets, and Cantorvals

december 2016 by Vaguery

Given a sequence converging to zero, we consider the set of numbers which are sums of (infinite, finite, or empty) subsequences. When the original sequence is not absolutely summable, the subsum set is an unbounded closed interval which includes zero. When it is absolutely summable the subsum set is one of the following: a finite union of (nontrivial) compact intervals, a Cantor set, or a "symmetric Cantorval".

mathematics
rather-interesting
approximation
representation
to-write-about
december 2016 by Vaguery

[1605.03135] Adjoint-based Gradient Estimation Using the Space-time Solutions of Unknown Conservation Law Simulations

october 2016 by Vaguery

Many control applications can be formulated as optimization constrained by conservation laws. Such optimization can be efficiently solved by gradient-based methods, where the gradient is obtained through the adjoint method. Traditionally, the adjoint method has not been able to be implemented in "gray-box" conservation law simulations. In gray-box simulations, the analytical and numerical form of the conservation law is unknown, but the space-time solution of relevant flow quantities is available. Without the adjoint gradient, optimization can be challenging for problems with many control variables. However, much information about the gray-box simulation is contained in its space-time solution, which motivates us to estimate the adjoint gradient by leveraging the space-time solution. This article considers a type of gray-box simulations where the flux function is partially unknown. A method is introduced to estimate the adjoint gradient at a cost independent of the number of control variables. The method firstly infers a conservation law, named the twin model, from the space-time solution, and then applies the adjoint method to the inferred twin model to estimate the gradient. The method is demonstrated to achieve good gradient estimation accuracies in several numerical examples. The main contributions of this paper are: a twin model method that enables the adjoint gradient computation for gray-box conservation law simulations; and an adaptive basis construction scheme that fully exploits the information of gray-box solutions.

optimization
algorithms
approximation
constraint-satisfaction
rather-interesting
to-understand
representation
october 2016 by Vaguery

[1605.06444] Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

october 2016 by Vaguery

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme which is straightforward to implement: define a cost-function given by a sum of a finite number of replicas of the original cost-function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.

neural-networks
update
machine-learning
approximation
rather-interesting
philosophy-of-engineering
to-write-about
october 2016 by Vaguery

[1609.05034] What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank

october 2016 by Vaguery

When analyzing discrete data such as binary matrices using matrix factorizations, we often have to make a choice between using expensive combinatorial methods that retain the discrete nature of the data and using continuous methods that can be more efficient but destroy the discrete structure. Alternatively, we can first compute a continuous factorization and subsequently apply a rounding procedure to obtain a discrete representation. But what will we gain by rounding? Will this yield lower reconstruction errors? Is it easy to find a low-rank matrix that rounds to a given binary matrix? Does it matter which threshold we use for rounding? Does it matter if we allow for only non-negative factorizations? In this paper, we approach these and further questions by presenting and studying the concept of rounding rank. We show that rounding rank is related to linear classification, dimensionality reduction, and nested matrices. We also report on an extensive experimental study that compares different algorithms for finding good factorizations under the rounding rank model.

approximation
matrices
numerical-methods
algorithms
rather-interesting
nudge-targets
consider:looking-to-see
october 2016 by Vaguery

[1605.00562] Persistent homology of time-dependent functional networks constructed from coupled time series

september 2016 by Vaguery

We use topological data analysis to study "functional networks" that we construct from time-series data from both experimental and synthetic sources. We use persistent homology in combination with a weight rank clique filtration to gain insights into these functional networks, and we use persistence landscapes to interpret our results. Our first example consists of biological data in the form of functional magnetic resonance imaging (fMRI) data that was acquired from human subjects during a simple motor-learning task. Our second example uses time-series output from networks of coupled Kuramoto oscillators. With these examples, we demonstrate that (1) using persistent homology to study functional networks provides fascinating insights into their properties and (2) the position of the features in a filtration can play a more vital role than persistence in the interpretation of topological features, even though conventionally the latter is used to distinguish between signal and noise. We find that persistent homology can detect differences in synchronization patterns in our data sets over time, giving insight both on changes in community structure in the networks and on increased synchronization between brain regions that form loops in a functional network during motor learning. For the motor-learning data, persistence landscapes also reveal that on average the majority of changes in the network loops takes place on the second of three days of the learning process.

nonlinear-dynamics
time-series
approximation
clustering
rather-interesting
feature-extraction
nudge-targets
consider:performance-measures
consider:representation
reminds-me-of-Wim's-paper
september 2016 by Vaguery

[1608.08225] Why does deep and cheap learning work so well?

august 2016 by Vaguery

We show how the success of deep learning depends not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can be approximated through "cheap learning" with exponentially fewer parameters than generic ones, because they have simplifying properties tracing back to the laws of physics. The exceptional simplicity of physics-based functions hinges on properties such as symmetry, locality, compositionality and polynomial log-probability, and we explore how these properties translate into exceptionally simple neural networks approximating both natural phenomena such as images and abstract representations thereof such as drawings. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to renormalization group procedures. Various "no-flattening theorems" show when these efficient deep networks cannot be accurately approximated by shallow ones without efficiency loss - even for linear networks.

machine-learning
approximation
information-theory
rather-interesting
to-write-about
via:many
august 2016 by Vaguery

[1606.01241] A Better Method for Volume Determination of Regularly and Irregularly Shaped Nanoparticles with Enhanced Accuracy

july 2016 by Vaguery

Nanoparticles (NPs) are widely used in diverse application areas, such as medicine, engineering, and cosmetics. The size (or volume) of NPs is one of the most important parameters for their successful application. It is relatively straightforward to determine the volume of regular NPs such as spheres and cubes from a one-dimensional or two-dimensional measurement. However, due to the three-dimensional nature of NPs, it is challenging to determine the proper physical size of many types of regularly and irregularly-shaped NPs (IS-NPs) at high-throughput using a single tool. Here, we present a relatively simple method that statistically determines a better volume estimate of many types of NPs by combining measurements from their top-down projection areas and peak-heights using two tools. The proposed method is significantly faster and more economical than the electron tomography method. We demonstrate the improved accuracy of the combined method over scanning electron microscopy (SEM) and atomic force microscopy (AFM) by using both modeling and measurements. This study also shows that SEM provides a more accurate estimate of size than AFM for most IS-NP size measurements. The method provides a much needed, proper high-throughput volumetric measurement method useful for many applications.

measurement
approximation
rather-interesting
metrology
experiment
nanotechnology
engineering
nudge-targets
consider:looking-to-see
algorithms
july 2016 by Vaguery

[1605.06396] Soft Covering with High Probability

july 2016 by Vaguery

Wyner's soft-covering lemma is the central analysis step for achievability proofs of information theoretic security, resolvability, and channel synthesis. It can also be used for simple achievability proofs in lossy source coding. This work sharpens the claim of soft-covering by moving away from an expected value analysis. Instead, a random codebook is shown to achieve the soft-covering phenomenon with high probability. The probability of failure is super-exponentially small in the block-length, enabling many applications through the union bound. This work gives bounds for both the exponential decay rate of total variation and the second-order codebook rate for soft covering.

information-theory
looking-to-see
rather-interesting
heuristics
approximation
nudge-targets
consider:looking-to-see
consider:feature-discovery
july 2016 by Vaguery

[1607.05712] Structure-Blind Signal Recovery

july 2016 by Vaguery

We consider the problem of recovering a signal observed in Gaussian noise. If the set of signals is convex and compact, and can be specified beforehand, one can use classical linear estimators that achieve a risk within a constant factor of the minimax risk. However, when the set is unspecified, designing an estimator that is blind to the hidden structure of the signal remains a challenging problem. We propose a new family of estimators to recover signals observed in Gaussian noise. Instead of specifying the set where the signal lives, we assume the existence of a well-performing linear estimator. Proposed estimators enjoy exact oracle inequalities and can be efficiently computed through convex optimization. We present several numerical illustrations that show the potential of the approach.

signal-processing
image-processing
algorithms
inference
rather-interesting
approximation
nudge-targets
consider:looking-to-see
july 2016 by Vaguery

[1411.1420] Basis Learning as an Algorithmic Primitive

july 2016 by Vaguery

A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a "Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first order iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms---e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA---all within a broader function-based dynamical systems framework. Our framework also unifies the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain "hidden convexity" property of these functions.

We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.

machine-learning
approximation
gradient-descent
representation
algorithms
nudge-targets
consider:representation
reasonable-approach
We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.

july 2016 by Vaguery

[1606.09333] Dimension-Free Iteration Complexity of Finite Sum Optimization Problems

july 2016 by Vaguery

Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure. However, whereas much progress has been made in developing faster algorithms for this setting, the inherent limitations of these problems are not satisfactorily addressed by existing lower bounds. Indeed, current bounds focus on first-order optimization algorithms, and only apply in the often unrealistic regime where the number of iterations is less than (d/n) (where d is the dimension and n is the number of samples). In this work, we extend the framework of (Arjevani et al., 2015) to provide new lower bounds, which are dimension-free, and go beyond the assumptions of current bounds, thereby covering standard finite sum optimization methods, e.g., SAG, SAGA, SVRG, SDCA without duality, as well as stochastic coordinate-descent methods, such as SDCA and accelerated proximal SDCA.

machine-learning
approximation
rather-interesting
optimization
gradient-descent
algorithms
computational-complexity
nudge-targets
consider:looking-to-see
july 2016 by Vaguery

[1606.00901] Approximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery

july 2016 by Vaguery

The approximate message passing (AMP) algorithm shows advantage over conventional convex optimization methods in recovering under-sampled sparse signals. AMP is analytically tractable and has a much lower complexity. However, it requires that the true parameters of the input and output channels are known. In this paper, we propose an AMP algorithm with built-in parameter estimation that jointly estimates the sparse signals along with the parameters by treating them as unknown random variables with simple priors. Specifically, the maximum a posterior (MAP) parameter estimation is presented and shown to produce estimations that converge to the true parameter values. Experiments on sparse signal recovery show that the performance of the proposed approach matches that of the oracle AMP algorithm where the true parameter values are known.

compressed-sensing
information-theory
signal-processing
algorithms
inference
nudge-targets
approximation
rather-interesting
consider:feature-discovery
consider:robustness
july 2016 by Vaguery

Tradict enables high fidelity reconstruction of the eukaryotic transcriptome from 100 marker genes | bioRxiv

july 2016 by Vaguery

Transcript levels are a critical determinant of the proteome and hence cellular function. Because the transcriptome is an outcome of the interactions between genes and their products, we reasoned it might be accurately represented by a subset of transcript abundances. We develop a method, Tradict (transcriptome predict), capable of learning and using the expression measurements of a small subset of 100 marker genes to reconstruct entire transcriptomes. By analyzing over 23,000 publicly available RNA-Seq datasets, we show that Tradict is robust to noise and accurate, especially for predicting the expression of a comprehensive, but interpretable list of transcriptional programs that represent the major biological processes and cellular pathways. Coupled with targeted RNA sequencing, Tradict may therefore enable simultaneous transcriptome-wide screening and mechanistic investigation at large scales. Thus, whether for performing forward genetic, chemogenomic, or agricultural screens or for profiling single-cells, Tradict promises to help accelerate genetic dissection and drug discovery.

bioinformatics
algorithms
statistics
models
rather-interesting
systems-biology
nudge-targets
approximation
consider:feature-discovery
july 2016 by Vaguery

[1509.07892] Evasion and Hardening of Tree Ensemble Classifiers

july 2016 by Vaguery

Classifier evasion consists in finding for a given instance x the nearest instance x′ such that the classifier predictions of x and x′ are different. We present two novel algorithms for systematically computing evasions for tree ensembles such as boosted trees and random forests. Our first algorithm uses a Mixed Integer Linear Program solver and finds the optimal evading instance under an expressive set of constraints. Our second algorithm trades off optimality for speed by using symbolic prediction, a novel algorithm for fast finite differences on tree ensembles. On a digit recognition task, we demonstrate that both gradient boosted trees and random forests are extremely susceptible to evasions. Finally, we harden a boosted tree model without loss of predictive accuracy by augmenting the training set of each boosting round with evading instances, a technique we call adversarial boosting.

classification
machine-learning
approximation
robustness
rather-interesting
nudge-targets
feature-construction
july 2016 by Vaguery

**related tags**

Copy this bookmark: