Vaguery + approximation   251

[1806.10909] ResNet with one-neuron hidden layers is a Universal Approximator
We demonstrate that a very deep ResNet with stacked modules that have one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in d dimensions, i.e. ℓ1(ℝ^d). Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and d. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension d [Lu et al., 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture.
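A quick sketch to make the building block concrete: each residual module maps the d-dimensional state through a single ReLU unit and back, then adds the result to the identity path. This is a minimal numpy illustration with random placeholder weights, not the explicit construction from the paper.

    import numpy as np

    def one_neuron_resnet_block(x, v, b, u):
        # one residual module: d-dim input -> single ReLU unit -> d-dim update
        h = max(0.0, float(v @ x + b))   # the single hidden neuron
        return x + u * h                 # identity mapping plus rank-one update

    # toy stack of blocks acting on a 3-dimensional input
    rng = np.random.default_rng(0)
    d, depth = 3, 5
    x = rng.normal(size=d)
    for _ in range(depth):
        v, b, u = rng.normal(size=d), rng.normal(), rng.normal(size=d)
        x = one_neuron_resnet_block(x, v, b, u)
    print(x)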
representation  neural-networks  approximation  rather-interesting  to-write-about  to-do 
7 weeks ago by Vaguery
[1712.08373] Notes on complexity of packing coloring
A packing k-coloring, for some integer k, of a graph G=(V,E) is a mapping φ:V→{1,…,k} such that any two vertices u,v of the same color φ(u)=φ(v) are at distance at least φ(u)+1. This concept is motivated by frequency assignment problems. The packing chromatic number of G is the smallest k such that there exists a packing k-coloring of G.
Fiala and Golovach showed that determining the packing chromatic number for chordal graphs is NP-complete for diameter exactly 5. While the problem is easy to solve for diameter 2, we show NP-completeness for any diameter at least 3. Our reduction also shows that the packing chromatic number is hard to approximate within n^{1/2−ε} for any ε>0.
In addition, we design an FPT algorithm for interval graphs of bounded diameter. This leads us to exploring the problem of finding a partial coloring that maximizes the number of colored vertices.
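The definition is easy to check directly: vertices sharing color c must be more than c apart. A small verifier (networkx assumed), checked on the path P4, whose packing chromatic number is 3:

    import networkx as nx

    def is_packing_coloring(G, phi):
        # phi maps each vertex to a color in {1..k}; vertices sharing color c
        # must be at graph distance at least c + 1
        dist = dict(nx.all_pairs_shortest_path_length(G))
        nodes = list(G.nodes)
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if phi[u] == phi[v] and dist[u].get(v, float("inf")) <= phi[u]:
                    return False
        return True

    P4 = nx.path_graph(4)
    print(is_packing_coloring(P4, {0: 1, 1: 2, 2: 1, 3: 3}))  # True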
graph-theory  algorithms  combinatorics  proof  approximation  nudge-targets  consider:looking-to-see  consider:feature-discovery 
march 2018 by Vaguery
[1801.00548] A Machine Learning Approach to Adaptive Covariance Localization
Data assimilation plays a key role in large-scale atmospheric weather forecasting, where the state of the physical system is estimated from model outputs and observations, and is then used as initial condition to produce accurate future forecasts. The Ensemble Kalman Filter (EnKF) provides a practical implementation of the statistical solution of the data assimilation problem and has gained wide popularity. This success can be attributed to its simple formulation and ease of implementation. EnKF is a Monte-Carlo algorithm that solves the data assimilation problem by sampling the probability distributions involved in Bayes' theorem. Because of this, all flavors of EnKF are fundamentally prone to sampling errors when the ensemble size is small. In typical weather forecasting applications, the model state space has dimension 10^9–10^12, while the ensemble size typically ranges between 30 and 100 members. Sampling errors manifest themselves as long-range spurious correlations and have been shown to cause filter divergence. To alleviate this effect, covariance localization dampens spurious correlations between state variables located at a large distance in the physical space, via an empirical distance-dependent function. The quality of the resulting analysis and forecast is greatly influenced by the choice of the localization function parameters, e.g., the radius of influence. The localization radius is generally tuned empirically to yield desirable results. This work proposes two adaptive algorithms for covariance localization in the EnKF framework, both based on a machine learning approach. The first algorithm adapts the localization radius in time, while the second algorithm tunes the localization radius in both time and space. Numerical experiments carried out with the Lorenz-96 model, and a quasi-geostrophic model, reveal the potential of the proposed machine learning approaches.
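Covariance localization itself is a simple operation: taper the sample covariance elementwise with a distance-dependent function. A sketch using the commonly used Gaspari-Cohn taper (the paper tunes the radius; the specific taper here is an assumption on my part), numpy assumed:

    import numpy as np

    def gaspari_cohn(r):
        # Gaspari-Cohn 5th-order compactly supported taper, r = distance / radius
        r = abs(r)
        if r <= 1.0:
            return 1 - (5/3)*r**2 + (5/8)*r**3 + (1/2)*r**4 - (1/4)*r**5
        if r < 2.0:
            return 4 - 5*r + (5/3)*r**2 + (5/8)*r**3 - (1/2)*r**4 + (1/12)*r**5 - (2/3)/r
        return 0.0

    def localize_covariance(P, pairwise_dist, radius):
        # dampen long-range sample correlations via a Schur (elementwise) product
        taper = np.vectorize(gaspari_cohn)(pairwise_dist / radius)
        return P * taper

    # toy example: 1-D periodic state of 8 grid points, 10-member ensemble
    rng = np.random.default_rng(1)
    n, m = 8, 10
    ensemble = rng.normal(size=(n, m))
    P = np.cov(ensemble)                  # noisy sample covariance
    idx = np.arange(n)
    d = abs(idx[:, None] - idx[None, :])
    dist = np.minimum(d, n - d)           # periodic distances
    P_loc = localize_covariance(P, dist, radius=2.0)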
modeling  machine-learning  prediction  rather-interesting  looking-to-see  approximation  algorithms  to-write-about 
march 2018 by Vaguery
[1709.08004] Slow-scale split-step tau-leap method for stiff stochastic chemical systems
Tau-leaping is a family of algorithms for the approximate simulation of discrete state continuous time Markov chains. The motivation for the development of such methods can be found, for instance, in the fields of chemical kinetics and systems biology. It is well known that the dynamical behavior of biochemical systems is often intrinsically stiff, which presents a serious challenge for their numerical approximation. The naive extension of stiff deterministic solvers to stochastic integration usually yields numerical solutions with either impractically large relaxation times or incorrectly resolved covariance. In this paper, we propose a novel splitting heuristic which allows these issues to be resolved. The proposed numerical integrator takes advantage of the special structure of linear systems, with explicitly available equations for the mean and the covariance, which we use to calibrate the parameters of the scheme. It is shown that the method reproduces the exact mean and variance of the linear scalar test equation and has very good accuracy for arbitrarily stiff systems, at least in the linear case. Numerical examples for both linear and nonlinear systems are also provided, and the obtained results confirm the efficiency of the considered splitting approach.
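For reference, the plain explicit tau-leap step (not the split-step scheme proposed in the paper) fires each reaction channel a Poisson-distributed number of times per leap. A minimal sketch for a reversible isomerization A <-> B, numpy assumed:

    import numpy as np

    def tau_leap_step(x, tau, rates, stoich, rng):
        # one explicit tau-leap: fire each reaction a Poisson number of times
        props = rates(x)                              # propensities a_j(x)
        k = rng.poisson(props * tau)                  # firings during the leap
        return x + stoich.T @ k

    # reversible isomerization A <-> B with rate constants c1, c2
    c1, c2 = 1.0, 0.5
    rates = lambda x: np.array([c1 * x[0], c2 * x[1]])
    stoich = np.array([[-1, 1],     # A -> B
                       [ 1, -1]])   # B -> A
    rng = np.random.default_rng(0)
    x = np.array([100, 0])
    for _ in range(50):
        x = np.maximum(tau_leap_step(x, 0.05, rates, stoich, rng), 0)  # guard against negative counts
    print(x)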
numerical-methods  diffy-Qs  approximation  systems-biology  algorithms  to-write-about  consider:representation  rather-interesting  nudge 
february 2018 by Vaguery
A few shades of interpolation
The topic of this snapshot is interpolation. In the ordinary sense, interpolation means to insert something of a different nature into something else. In mathematics, interpolation means constructing new data points from given data points. The new points usually lie in between the already-known points. The purpose of this snapshot is to introduce a particular type of interpolation, namely, polynomial interpolation. This will be explained starting from basic ideas that go back to the ancient Babylonians and Greeks, and will arrive at subjects of current research activity.
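The most basic instance of polynomial interpolation is the Lagrange form: the unique polynomial of degree below n passing through n given points. A small sketch, numpy assumed:

    import numpy as np

    def lagrange_interpolate(xs, ys, x):
        # evaluate the unique degree < n polynomial through (xs[i], ys[i]) at x
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    # interpolate sin through 5 nodes and check a point in between
    xs = np.linspace(0, np.pi, 5)
    ys = np.sin(xs)
    print(lagrange_interpolate(xs, ys, 1.0), np.sin(1.0))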
interpolation  rather-interesting  history-of-science  algorithms  approximation  to-write-about 
february 2018 by Vaguery
[1709.10030] Sparse Hierarchical Regression with Polynomials
We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is that degree r polynomial which depends on at most k inputs, counting at most ℓ monomial terms, which minimizes the sum of the squares of its prediction errors. This hierarchical sparse specification aligns well with modern big data settings where many inputs are not relevant for prediction purposes and the functional complexity of the regressor needs to be controlled so as to avoid overfitting. We present a two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Secondly, we take advantage of modern cutting plane methods for integer optimization to solve our resulting reduced hierarchical (k,ℓ)-sparse problem exactly. The ability of our method to identify all k relevant inputs and all ℓ monomial terms is shown empirically to experience a phase transition. Crucially, the same transition also presents itself in our ability to reject all irrelevant features and monomials as well. In the regime where our method is statistically powerful, its computational complexity is interestingly on par with Lasso based heuristics. The presented work fills a void in terms of a lack of powerful disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with n≈10,000 observations for input dimension p≈1,000.
statistics  regression  algorithms  approximation  performance-measure  to-understand  nudge-targets  consider:looking-to-see 
february 2018 by Vaguery
[1712.08558] Lattice-based Locality Sensitive Hashing is Optimal
Locality sensitive hashing (LSH) was introduced by Indyk and Motwani (STOC '98) to give the first sublinear time algorithm for the c-approximate nearest neighbor (ANN) problem using only polynomial space. At a high level, an LSH family hashes "nearby" points to the same bucket and "far away" points to different buckets. The quality of an LSH family is measured by its LSH exponent, which helps determine both query time and space usage.
In a seminal work, Andoni and Indyk (FOCS '06) constructed an LSH family based on random ball partitioning of space that achieves an LSH exponent of 1/c^2 for the l_2 norm, which was later shown to be optimal by Motwani, Naor and Panigrahy (SIDMA '07) and O'Donnell, Wu and Zhou (TOCT '14). Although optimal in the LSH exponent, the ball partitioning approach is computationally expensive. So, in the same work, Andoni and Indyk proposed a simpler and more practical hashing scheme based on Euclidean lattices and provided computational results using the 24-dimensional Leech lattice. However, no theoretical analysis of the scheme was given, thus leaving open the question of finding the exponent of lattice based LSH.
In this work, we resolve this question by showing the existence of lattices achieving the optimal LSH exponent of 1/c^2 using techniques from the geometry of numbers. At a more conceptual level, our results show that optimal LSH space partitions can have periodic structure. Understanding the extent to which additional structure can be imposed on these partitions, e.g. to yield low space and query complexity, remains an important open problem.
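For a sense of what an LSH family looks like in practice, here is the classic p-stable random-projection scheme for the l_2 norm (Datar et al.), not the lattice construction analyzed in the paper. Nearby points fall into the same bucket more often than distant ones; numpy assumed:

    import numpy as np

    class L2LSH:
        # p-stable LSH for the Euclidean norm: h(x) = floor((a.x + b) / w)
        def __init__(self, dim, w=4.0, seed=0):
            rng = np.random.default_rng(seed)
            self.a = rng.normal(size=dim)        # Gaussian projection direction
            self.b = rng.uniform(0, w)           # random offset
            self.w = w

        def hash(self, x):
            return int(np.floor((self.a @ x + self.b) / self.w))

    rng = np.random.default_rng(1)
    x = rng.normal(size=32)
    near, far = x + 0.05 * rng.normal(size=32), rng.normal(size=32)
    tables = [L2LSH(dim=32, seed=s) for s in range(200)]
    same = lambda a, b: np.mean([t.hash(a) == t.hash(b) for t in tables])
    print(same(x, near), same(x, far))   # collision rate is higher for the nearby point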
algorithms  numerical-methods  rather-interesting  approximation  computational-complexity  nudge-targets  consider:looking-to-see  consider:representation 
january 2018 by Vaguery
[1605.04679] Typical Performance of Approximation Algorithms for NP-hard Problems
Typical performance of approximation algorithms is studied for randomized minimum vertex cover problems. A wide class of random graph ensembles characterized by an arbitrary degree distribution is discussed with some theoretical frameworks. Here three approximation algorithms are examined: the linear-programming relaxation, the loopy-belief propagation, and the leaf-removal algorithm. The former two algorithms are analyzed using the statistical-mechanical technique, while the average-case analysis of the last one is studied by the generating function method. These algorithms exhibit a threshold in typical performance as the average degree of the random graph increases, below which they find true optimal solutions with high probability. Our study reveals that there exist only three cases, determined by the order of the typical-performance thresholds. We provide conditions for classifying the graph ensembles and demonstrate explicit examples of the differences in thresholds.
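The leaf-removal algorithm mentioned here is simple to state: repeatedly take a degree-1 vertex, put its unique neighbor into the cover, and delete both; whatever survives is the "core". A sketch (networkx assumed); on trees, as in the example, it returns an optimal cover:

    import networkx as nx

    def leaf_removal_cover(G):
        # leaf-removal heuristic for minimum vertex cover
        G = G.copy()
        cover = set()
        leaves = [v for v in G if G.degree(v) == 1]
        while leaves:
            v = leaves.pop()
            if v not in G or G.degree(v) != 1:
                continue
            u = next(iter(G[v]))            # the unique neighbor covers the edge
            cover.add(u)
            G.remove_nodes_from([u, v])
            leaves.extend(w for w in G if G.degree(w) == 1)
        G.remove_nodes_from(list(nx.isolates(G)))
        return cover, G                     # leftover graph is the "core"

    tree = nx.balanced_tree(2, 3)           # complete binary tree, 15 vertices
    cover, core = leaf_removal_cover(tree)
    print(len(cover), core.number_of_nodes())   # 5, 0 (empty core: optimal cover found)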
approximation  computational-complexity  rather-interesting  to-write-about  to-do  nudge-targets  consider:approximation 
december 2017 by Vaguery
[1708.04215] A Constant-Factor Approximation Algorithm for the Asymmetric Traveling Salesman Problem
We give a constant-factor approximation algorithm for the asymmetric traveling salesman problem. Our approximation guarantee is analyzed with respect to the standard LP relaxation, and thus our result confirms the conjectured constant integrality gap of that relaxation.
Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics. Specifically, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. For those instances, we then solve Local-Connectivity ATSP, a problem known to be equivalent (in terms of constant-factor approximation) to the asymmetric traveling salesman problem.
algorithms  via:vielmetti  operations-research  approximation  optimization  computational-complexity  to-write-about  nudge-targets  consider:looking-to-see 
december 2017 by Vaguery
[1711.02724] Algorithms to Approximate Column-Sparse Packing Problems
Column-sparse packing problems arise in several contexts in both deterministic and stochastic discrete optimization. We present two unifying ideas, (non-uniform) attenuation and multiple-chance algorithms, to obtain improved approximation algorithms for some well-known families of such problems. As three main examples, we attain the integrality gap, up to lower-order terms, for known LP relaxations for k-column sparse packing integer programs (Bansal et al., Theory of Computing, 2012) and stochastic k-set packing (Bansal et al., Algorithmica, 2012), and go "half the remaining distance" to optimal for a major integrality-gap conjecture of Furedi, Kahn and Seymour on hypergraph matching (Combinatorica, 1993).
integer-programming  matrices  optimization  operations-research  set-theory  hypergraphs  to-write-about  approximation  algorithms 
november 2017 by Vaguery
[1710.00228] Physics-based Motion Planning: Evaluation Criteria and Benchmarking
Motion planning has evolved from coping with simply geometric problems to physics-based ones that incorporate the kinodynamic and the physical constraints imposed by the robot and the physical world. Therefore, the criteria for evaluating physics-based motion planners go beyond the computational complexity (e.g. in terms of planning time) usually used as a measure for evaluating geometrical planners, in order to also consider the quality of the solution in terms of dynamical parameters. This study proposes evaluation criteria and analyzes the performance of several kinodynamic planners, which are at the core of physics-based motion planning, using different scenarios with fixed and manipulatable objects. RRT, EST, KPIECE and SyCLoP are used for the benchmarking. The results show that KPIECE computes the time-optimal solution with the highest success rate, whereas SyCLoP computes the most power-optimal solution among the planners used.
simulation  planning  algorithms  rather-interesting  feature-construction  approximation  nudge-targets  consider:representation 
october 2017 by Vaguery
[1705.04587] Complexity and Inapproximability Results for Parallel Task Scheduling and Strip Packing
We study the Parallel Task Scheduling problem Pm|size_j|Cmax with a constant number of machines. This problem is known to be strongly NP-complete for each m≥5, while it is solvable in pseudo-polynomial time for each m≤3. We give a positive answer to the long-standing open question whether this problem is strongly NP-complete for m=4. As a second result, we improve the lower bound of 12/11 for approximating pseudo-polynomial Strip Packing to 5/4. Since the best known approximation algorithm for this problem has a ratio of 4/3+ε, this result narrows the gap between approximation ratio and inapproximability result by a significant step. Both results are proven by a reduction from the strongly NP-complete problem 3-Partition.
operations-research  optimization  computational-complexity  benchmarking  approximation  rather-interesting  nudge-targets  consider:approximation 
october 2017 by Vaguery
[1401.6200] Three Series for the Generalized Golden Mean
As is well-known, the ratio of adjacent Fibonacci numbers tends to phi = (1 + sqrt(5))/2, and the ratio of adjacent Tribonacci numbers (where each term is the sum of the three preceding numbers) tends to the real root eta of X^3 - X^2 - X - 1 = 0. Letting alpha(n) denote the corresponding ratio for the generalized Fibonacci numbers, where each term is the sum of the n preceding, we obtain rapidly converging series for alpha(n), 1/alpha(n), and 1/(2-alpha(n)).
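The limiting ratio α(n) is easy to look at numerically: iterate the n-step Fibonacci recurrence and take the ratio of the last two terms. α(2) is the golden mean φ ≈ 1.618, α(3) the Tribonacci constant ≈ 1.839, and α(n) → 2 as n grows. A quick sketch:

    def generalized_golden_mean(n, terms=60):
        # limit of the ratio of adjacent n-step Fibonacci numbers
        seq = [1] * n
        for _ in range(terms):
            seq.append(sum(seq[-n:]))
        return seq[-1] / seq[-2]

    for n in (2, 3, 4):
        print(n, generalized_golden_mean(n))
    # 2 -> 1.618..., 3 -> 1.839..., 4 -> 1.927...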
number-theory  approximation  rather-interesting  series  convergence  nudge-targets  consider:evolving-some  consider:performance-measures  consider:convergence  consider:complexity 
october 2017 by Vaguery
[1608.08225] Why does deep and cheap learning work so well?
We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through "cheap learning" with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various "no-flattening theorems" showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss, for example, we show that n variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.
deep-learning  machine-learning  approximation  looking-to-see  algorithms  representation  rather-interesting  learning-algorithms  information-theory  to-write-about 
october 2017 by Vaguery
[1610.07277] Rapid calculation of side chain packing and free energy with applications to protein molecular dynamics
To address the large gap between time scales that can be easily reached by molecular simulations and those required to understand protein dynamics, we propose a rapid self-consistent approximation of the side chain free energy at every integration step. In analogy with the adiabatic Born-Oppenheimer approximation for electronic structure, the protein backbone dynamics are simulated as proceeding according to the dictates of the free energy of an instantaneously-equilibrated side chain potential. The side chain free energy is computed on the fly, allowing the protein backbone dynamics to traverse a greatly smoothed energetic landscape. This results in extremely rapid equilibration and sampling of the Boltzmann distribution. Because our method employs a reduced model involving single-bead side chains, we also provide a novel, maximum-likelihood method to parameterize the side chain model using input data from high resolution protein crystal structures. We demonstrate state-of-the-art accuracy for predicting χ1 rotamer states while consuming only milliseconds of CPU time. We also show that the resulting free energies of side chains are sufficiently accurate for de novo folding of some proteins.
structural-biology  approximation  heuristics  algorithms  protein-folding  to-write-about  nudge-targets 
october 2017 by Vaguery
[1701.05290] Range-efficient consistent sampling and locality-sensitive hashing for polygons
Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with probability significantly larger than objects with low similarity. We consider LSH for objects that can be represented as point sets in either one or two dimensions. To make the point sets finite size we consider the subset of points on a grid. Directly applying LSH (e.g. min-wise hashing) to these point sets would require time proportional to the number of points. We seek to achieve time that is much lower than direct approaches.
Technically, we introduce new primitives for range-efficient consistent sampling (of independent interest), and show how to turn such samples into LSH values. Another application of our technique is a data structure for quickly estimating the size of the intersection or union of a set of preprocessed polygons. Curiously, our consistent sampling method uses transformation to a geometric problem.
rather-interesting  approximation  representation  to-understand  nudge  consider:for-behavior-classification  to-write-about  computational-geometry 
september 2017 by Vaguery
[1705.01991] Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU
Attentional sequence-to-sequence models have become the new standard for machine translation, but one challenge of such models is a significant increase in training and decoding cost compared to phrase-based systems. Here, we focus on efficient decoding, with a goal of achieving accuracy close to the state of the art in neural machine translation (NMT), while achieving CPU decoding speed/throughput close to that of a phrasal decoder.
We approach this problem from two angles: First, we describe several techniques for speeding up an NMT beam search decoder, which obtain a 4.4x speedup over a very efficient baseline decoder without changing the decoder output. Second, we propose a simple but powerful network architecture which uses an RNN (GRU/LSTM) layer at bottom, followed by a series of stacked fully-connected layers applied at every timestep. This architecture achieves similar accuracy to a deep recurrent model, at a small fraction of the training and decoding cost. By combining these techniques, our best system achieves a very competitive accuracy of 38.3 BLEU on WMT English-French NewsTest2014, while decoding at 100 words/sec on single-threaded CPU. We believe this is the best published accuracy/speed trade-off of an NMT system.
machine-learning  representation  algorithms  deep-learning  neural-networks  approximation  nudge-targets  consider:performance-measures  consider:representation  natural-language-processing 
september 2017 by Vaguery
Approximately Yours | bit-player
Today, I’m told, is Rational Approximation Day. It’s 22/7 (for those who write dates in little-endian format), which differs from π by about 0.04 percent. (The big-endians among us are welcome to approximate 1/π.)
Given the present state of life in America, what we really need is an Approximation to Rationality Day, but that may have to wait for 20/1/21. In the meantime, let us merrily fiddle with numbers, searching for ratios of integers that brazenly invade the personal space of famous irrationals.
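The "ratios of integers that brazenly invade the personal space of famous irrationals" are exactly the continued-fraction convergents; 22/7 and 355/113 both show up among the first few for π. A quick sketch:

    from math import pi
    from fractions import Fraction

    def convergents(x, count=6):
        # first few continued-fraction convergents of x (best rational approximations)
        h0, h1, k0, k1 = 0, 1, 1, 0
        out = []
        for _ in range(count):
            a = int(x)
            h0, h1 = h1, a * h1 + h0
            k0, k1 = k1, a * k1 + k0
            out.append(Fraction(h1, k1))
            x = 1 / (x - a)
        return out

    print(convergents(pi))   # 3, 22/7, 333/106, 355/113, ...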
mathematical-recreations  approximation  continued-fractions  to-write-about  algorithms  nudge-targets 
september 2017 by Vaguery
[1108.2714] Approximate common divisors via lattices
We analyze the multivariate generalization of Howgrave-Graham's algorithm for the approximate common divisor problem. In the m-variable case with modulus N and approximate common divisor of size N^beta, this improves the size of the error tolerated from N^(beta^2) to N^(beta^((m+1)/m)), under a commonly used heuristic assumption. This gives a more detailed analysis of the hardness assumption underlying the recent fully homomorphic cryptosystem of van Dijk, Gentry, Halevi, and Vaikuntanathan. While these results do not challenge the suggested parameters, a 2^(n^epsilon) approximation algorithm with epsilon<2/3 for lattice basis reduction in n dimensions could be used to break these parameters. We have implemented our algorithm, and it performs better in practice than the theoretical analysis suggests.
Our results fit into a broader context of analogies between cryptanalysis and coding theory. The multivariate approximate common divisor problem is the number-theoretic analogue of multivariate polynomial reconstruction, and we develop a corresponding lattice-based algorithm for the latter problem. In particular, it specializes to a lattice-based list decoding algorithm for Parvaresh-Vardy and Guruswami-Rudra codes, which are multivariate extensions of Reed-Solomon codes. This yields a new proof of the list decoding radii for these codes.
algorithms  number-theory  rather-interesting  to-write-about  to-understand  nudge-targets  consider:rediscovery  approximation  matrices 
september 2017 by Vaguery
[1512.00906] Butcher series: A story of rooted trees and numerical methods for evolution equations
Butcher series appear when Runge-Kutta methods for ordinary differential equations are expanded in power series of the step size parameter. Each term in a Butcher series consists of a weighted elementary differential, and the set of all such differentials is isomorphic to the set of rooted trees, as noted by Cayley in the mid 19th century. A century later Butcher discovered that rooted trees can also be used to obtain the order conditions of Runge-Kutta methods, and he found a natural group structure, today known as the Butcher group. It is now known that many numerical methods also can be expanded in Butcher series; these are called B-series methods. A long-standing problem has been to characterize, in terms of qualitative features, all B-series methods. Here we tell the story of Butcher series, stretching from the early work of Cayley, to modern developments and connections to abstract algebra, and finally to the resolution of the characterization problem. This resolution introduces geometric tools and perspectives to an area traditionally explored using analysis and combinatorics.
numerical-methods  differential-equations  symbolic-methods  algebra  mathematics  approximation  rather-interesting  to-write-about  consider:rediscovery 
september 2017 by Vaguery
[1707.06300] Untangling the hairball: fitness based asymptotic reduction of biological networks
Complex mathematical models of interaction networks are routinely used for prediction in systems biology. However, it is difficult to reconcile network complexities with a formal understanding of their behavior. Here, we propose a simple procedure (called φ¯) to reduce biological models to functional submodules, using statistical mechanics of complex systems combined with a fitness-based approach inspired by in silico evolution. φ¯ works by putting parameters or combination of parameters to some asymptotic limit, while keeping (or slightly improving) the model performance, and requires parameter symmetry breaking for more complex models. We illustrate φ¯ on biochemical adaptation and on different models of immune recognition by T cells. An intractable model of immune recognition with close to a hundred individual transition rates is reduced to a simple two-parameter model. φ¯ extracts three different mechanisms for early immune recognition, and automatically discovers similar functional modules in different models of the same process, allowing for model classification and comparison. Our procedure can be applied to biological networks based on rate equations using a fitness function that quantifies phenotypic performance.
systems-biology  approximation  simplification  network-theory  representation  rather-interesting  algorithms  theoretical-biology  philosophy-of-science  to-write-about 
august 2017 by Vaguery
How to Build an Impossible Polyhedron
Using stiff paper and transparent tape, Craig Kaplan assembles a beautiful roundish shape that looks like a Buckminster Fuller creation or a fancy new kind of soccer ball. It consists of four regular dodecagons (12-sided polygons with all angles and sides the same) and 12 decagons (10-sided), with 28 little gaps in the shape of equilateral triangles. There’s just one problem. This figure should be impossible. That set of polygons won’t meet at the vertices. The shape can’t close up.
via:my  markup  mathematics  approximation  rather-interesting  number-theory  open-questions  to-write-about  theory-and-practice-sitting-in-a-tree  philosophy-of-science 
june 2017 by Vaguery
[1703.01350] Approximate Convex Hulls
We investigate the PPI algorithm as a means for computing approximate convex hulls. We explain how the algorithm computes the curvature of points and prove consistency and convergence. We also extend the algorithm to compute approximate convex hulls described in terms of hyperplanes.
computational-geometry  approximation  algorithms  rather-interesting  to-write-about 
may 2017 by Vaguery
[1610.01983] Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?
Deep learning has rapidly transformed the state of the art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have relied upon massive amounts of human annotated training data. This time consuming process has begun impeding the progress of these deep learning efforts. This paper describes a method to incorporate photo-realistic computer images from a simulation engine to rapidly generate annotated data that can be used for the training of machine learning algorithms. We demonstrate that a state of the art architecture, which is trained only using these synthetic annotations, performs better than the identical architecture trained on human annotated real-world data, when tested on the KITTI data set for vehicle detection. By training machine learning algorithms on a rich virtual world, real objects in real scenes can be learned and classified using synthetic data. This approach offers the possibility of accelerating deep learning's application to sensor-based classification problems like those that appear in self-driving cars. The source code and data to train and validate the networks described in this paper are made available for researchers.
simulation  engineering-design  ontology  approximation  training  to-write-about  consider:stress-testing 
may 2017 by Vaguery
[1608.08844] Snapping Graph Drawings to the Grid Optimally
In geographic information systems and in the production of digital maps for small devices with restricted computational resources one often wants to round coordinates to a rougher grid. This removes unnecessary detail and reduces space consumption as well as computation time. This process is called snapping to the grid and has been investigated thoroughly from a computational-geometry perspective. In this paper we investigate the same problem for given drawings of planar graphs under the restriction that their combinatorial embedding must be kept and edges are drawn straight-line. We show that the problem is NP-hard for several objectives and provide an integer linear programming formulation. Given a plane graph G and a positive integer w, our ILP can also be used to draw G straight-line on a grid of width w and minimum height (if possible).
graph-layout  computational-geometry  algorithms  rather-interesting  constraint-satisfaction  nudge-targets  consider:looking-to-see  approximation  performance-measure 
may 2017 by Vaguery
[1603.05691] Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
Yes, they do. This paper provides the first empirical demonstration that deep convolutional models really need to be both deep and convolutional, even when trained with methods such as distillation that allow small or shallow models of high accuracy to be trained. Although previous research showed that shallow feed-forward nets sometimes can learn the complex functions previously learned by deep nets while using the same number of parameters as the deep models they mimic, in this paper we demonstrate that the same methods cannot be used to train accurate models on CIFAR-10 unless the student models contain multiple layers of convolution. Although the student models do not have to be as deep as the teacher model they mimic, the students need multiple convolutional layers to learn functions of comparable accuracy as the deep convolutional teacher.
good-question  neural-networks  deep-learning  rather-interesting  approximation  heuristics  nudge-targets  consider:re-approximation 
may 2017 by Vaguery
[1011.4245] When the optimal is not the best: parameter estimation in complex biological models
Background: The vast computational resources that became available during the past decade enabled the development and simulation of increasingly complex mathematical models of cancer growth. These models typically involve many free parameters whose determination is a substantial obstacle to model development. Direct measurement of biochemical parameters in vivo is often difficult and sometimes impracticable, while fitting them under data-poor conditions may result in biologically implausible values.
Results: We discuss different methodological approaches to estimate parameters in complex biological models. We make use of the high computational power of the Blue Gene technology to perform an extensive study of the parameter space in a model of avascular tumor growth. We explicitly show that the landscape of the cost function used to optimize the model to the data has a very rugged surface in parameter space. This cost function has many local minima with unrealistic solutions, including the global minimum corresponding to the best fit.
Conclusions: The case studied in this paper shows one example in which model parameters that optimally fit the data are not necessarily the best ones from a biological point of view. To avoid force-fitting a model to a dataset, we propose that the best model parameters should be found by choosing, among suboptimal parameters, those that match criteria other than the ones used to fit the model. We also conclude that the model, data and optimization approach form a new complex system, and point to the need of a theory that addresses this problem more generally.
optimization  biophysics  noise  stochastic-resonance  modeling  rather-interesting  to-write-about  approximation  performance-measure 
may 2017 by Vaguery
[1507.07970v2] Dividing the circle
There are known constructions for some regular polygons, usually inscribed in a circle, but not for all polygons - the Gauss-Wantzel Theorem states precisely which ones can be constructed.
The constructions differ greatly from one polygon to the other. There are, however, general processes for determining the side of the n-gon (approximately, but sometimes with great precision), which we describe in this paper. We present a joint mathematical analysis of the so-called Bion and Tempier approximation methods, comparing the errors and trying to explain why these constructions would work at all.
compass-and-straightedge  plane-geometry  construction  approximation  nudge-targets  consider:rediscovery 
may 2017 by Vaguery
[1704.00708] No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis
In this paper we develop a new framework that captures the common landscape underlying the common non-convex low-rank matrix problems including matrix sensing, matrix completion and robust PCA. In particular, we show for all of the above problems (including asymmetric cases): 1) all local minima are also globally optimal; 2) no high-order saddle points exist. These results explain why simple algorithms such as stochastic gradient descent converge globally, and efficiently optimize these non-convex objective functions in practice. Our framework connects and simplifies the existing analyses on optimization landscapes for matrix sensing and symmetric matrix completion. The framework naturally leads to new results for asymmetric matrix completion and robust PCA.
compressed-sensing  matrices  optimization  approximation  rather-interesting  machine-learning  nudge-targets  consider:representation 
may 2017 by Vaguery
[1607.05527] An Approximation Algorithm for the Art Gallery Problem
Given a simple polygon P on n vertices, two points x, y in P are said to be visible to each other if the line segment between x and y is contained in P. The Point Guard Art Gallery problem asks for a minimum set S such that every point in P is visible from a point in S. The set S is referred to as guards. Assuming integer coordinates and a specific general position assumption, we present the first O(log OPT)-approximation algorithm for the point guard problem for simple polygons. This algorithm combines ideas of a paper of Efrat and Har-Peled [Inf. Process. Lett. 2006] and Deshpande et al. [WADS 2007]. We also point out a mistake in the latter.
computational-geometry  algorithms  approximation  nudge-targets 
april 2017 by Vaguery
[1307.0587] Self-Assembly on a Cylinder: A Model System for Understanding the Constraint of Commensurability
A crystal lattice, when confined to the surface of a cylinder, must have a periodic structure that is commensurate with the cylinder circumference. This constraint can frustrate the system, leading to oblique crystal lattices or to structures with a chiral seam known as a "line slip" phase, neither of which are stable for isotropic particles in equilibrium on flat surfaces. In this study, we use molecular dynamics simulations to find the steady-state structure of spherical particles with short-range repulsion and long-range attraction far below the melting temperature. We vary the range of attraction using the Lennard-Jones and Morse potentials and find that a shorter-range attraction favors the line-slip. We develop a simple model based only on geometry and bond energy to predict when the crystal or line-slip phases should appear, and find reasonable agreement with the simulations. The simplicity of this model allows us to understand the influence of the commensurability constraint, an understanding that might be extended into the more general problem of self-assembling particles in strongly confined spaces.
packing  tiling  approximation  looking-to-see  out-of-the-box  physics!  simulation  frustrated-systems  self-organization 
april 2017 by Vaguery
[1702.08328] A Universal Ordinary Differential Equation
An astonishing fact was established by Lee A. Rubel in 1981: there exists a fixed non-trivial fourth-order polynomial differential algebraic equation (DAE) such that for any continuous function ϕ on the reals, and for any positive continuous function ϵ(t), it has a C^∞ solution with |y(t)−ϕ(t)|<ϵ(t) for all t. Rubel provided an explicit example of such a polynomial DAE. More examples have later been proposed by other authors.
However, while these results may seem very surprising, their proofs are quite frustrating for a computability theorist. First, the constructed DAEs have no unique solution for a given initial condition. This is very different from usual notions of universality, since there is no unambiguous notion of evolution for a given initial condition. Second, the proofs usually rely on solutions that are piecewise defined and sometimes non-constructive. Third, the proofs of these results can be interpreted more as the fact that polynomial differential algebraic equations are too loose a model compared to classical ordinary differential equations. In particular, one may challenge whether the result is really a universality result.
The question whether one can require the solution that approximates ϕ to be the unique solution for a given initial condition is a well-known open problem [Rub81] (page 2), [boshernitzan1986universal] (Conjecture 6.2). In this article, we solve it and show that Rubel's statement holds for polynomial ordinary differential equations (ODEs), and since polynomial ODEs have a unique solution for a given initial condition, this positively answers Rubel's open problem. More precisely, we show that there exists a fixed polynomial ODE such that for any ϕ and ϵ there exists some initial condition that yields a solution that is ϵ-close to ϕ at all times.
diffyQ  nonlinear-dynamics  approximation  representation  rather-interesting  analytical-expressions  proof 
april 2017 by Vaguery
[1605.06857] Finding the Year's Share in Day-of-Week Calculations
The dominant part in the mental calculation of the day of the week for any given date is to determine the year share, that is, the contribution of the two-digit year part of the date. This paper describes a number of year share computation methods, some well-known and some new. The "Parity Minus 3" method, in particular, is a new alternative to the popular "Odd+11" method. The paper categorizes the methods of year share computation, and presents simpler proofs of their correctness than usually provided.
Keywords and phrases: day of the week, calendar algorithms, doomsday method, first Sunday algorithm, mental arithmetic, year share
mathematics  mnemonics  approximation  rather-interesting  mathematical-recreations  nudge-targets  consider:looking-to-see 
march 2017 by Vaguery
[1306.5527] Computing the Fréchet Distance with a Retractable Leash
All known algorithms for the Fréchet distance between curves proceed in two steps: first, they construct an efficient oracle for the decision version; second, they use this oracle to find the optimum from a finite set of critical values. We present a novel approach that avoids the detour through the decision version. This gives the first quadratic time algorithm for the Fréchet distance between polygonal curves in R^d under polyhedral distance functions (e.g., L1 and L∞). We also get a (1+ε)-approximation of the Fréchet distance under the Euclidean metric, in quadratic time for any fixed ε>0. For the exact Euclidean case, our framework currently yields an algorithm with running time O(n^2 log^2 n). However, we conjecture that it may eventually lead to a faster exact algorithm.
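The continuous Fréchet distance handled in the paper has a simpler discrete cousin with a classic dynamic program (Eiter-Mannila); it is a handy reference point when experimenting. A sketch, numpy assumed:

    import numpy as np
    from functools import lru_cache

    def discrete_frechet(P, Q):
        # Eiter-Mannila dynamic program for the discrete Frechet distance
        P, Q = np.asarray(P, float), np.asarray(Q, float)

        @lru_cache(maxsize=None)
        def c(i, j):
            d = np.linalg.norm(P[i] - Q[j])
            if i == 0 and j == 0:
                return d
            if i == 0:
                return max(c(0, j - 1), d)
            if j == 0:
                return max(c(i - 1, 0), d)
            return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)

        return c(len(P) - 1, len(Q) - 1)

    print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]))  # 1.0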
metrics  algorithms  numerical-methods  rather-interesting  nudge-targets  approximation  consider:looking-to-see  consider:representation  to-write-about 
march 2017 by Vaguery
[1606.06660] Mapping polygons to the grid with small Hausdorff and Fréchet distance
We show how to represent a simple polygon P by a grid (pixel-based) polygon Q that is simple and whose Hausdorff or Fréchet distance to P is small. For any simple polygon P, a grid polygon exists with constant Hausdorff distance between their boundaries and their interiors. Moreover, we show that with a realistic input assumption we can also realize constant Fréchet distance between the boundaries. We present algorithms accompanying these constructions, heuristics to improve their output while keeping the distance bounds, and experiments to assess the output.
rather-interesting  computational-geometry  approximation  metrics  performance-measure  nudge-targets  consider:looking-to-see 
march 2017 by Vaguery
[1603.06252] Grouping Time-varying Data for Interactive Exploration
We present algorithms and data structures that support the interactive analysis of the grouping structure of one-, two-, or higher-dimensional time-varying data while varying all defining parameters. Grouping structures characterise important patterns in the temporal evaluation of sets of time-varying data. We follow Buchin et al. [JoCG 2015] who define groups using three parameters: group-size, group-duration, and inter-entity distance. We give upper and lower bounds on the number of maximal groups over all parameter values, and show how to compute them efficiently. Furthermore, we describe data structures that can report changes in the set of maximal groups in an output-sensitive manner. Our results hold in ℝd for fixed d.
clustering  feature-construction  rather-interesting  to-understand  consider:performance-space-analysis  compare-to-Pareto-GP-features  visualization  approximation  to-write-about  algorithms  computational-geometry 
march 2017 by Vaguery
[1506.00990] Unsupervised Learning on Neural Network Outputs: with Application in Zero-shot Learning
The outputs of a trained neural network contain much richer information than just a one-hot classifier. For example, a neural network might give an image of a dog a probability of one in a million of being a cat, but that is still much larger than the probability of being a car. To reveal the hidden structure in them, we apply two unsupervised learning algorithms, PCA and ICA, to the outputs of a deep Convolutional Neural Network trained on the ImageNet of 1000 classes. The PCA/ICA embedding of the object classes reveals their visual similarity and the PCA/ICA components can be interpreted as common visual features shared by similar object classes. For an application, we propose a new zero-shot learning method in which the visual features learned by PCA/ICA are employed. Our zero-shot learning method achieves the state-of-the-art results on the ImageNet of over 20000 classes.
machine-learning  approximation  neural-networks  classification  rather-interesting  to-write-about  like-the-Soft-Push-scheme  consider:looking-to-see  consider:feature-discovery 
march 2017 by Vaguery
[1703.01054] When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors
Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold τ. In contrast to previous work where τ is assumed to be quite close to 1, we focus on recommendation applications where τ is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small τ. To the best of our knowledge, there is no practical solution for computing all user pairs with, say τ=0.2 on large social networks, even using the power of distributed algorithms.
Our work directly addresses this challenge by introducing a new algorithm --- WHIMP --- that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.
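WHIMP combines wedge sampling with Charikar's SimHash signatures; the SimHash half is easy to sketch on its own: each random hyperplane contributes one bit, and the fraction of agreeing bits estimates the angle (hence the cosine similarity). Not the full WHIMP pipeline; numpy assumed:

    import numpy as np

    def simhash_signatures(X, bits=256, seed=0):
        # random-hyperplane (SimHash) signatures: one bit per hyperplane
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(X.shape[1], bits))
        return (X @ planes) >= 0                       # boolean matrix: n x bits

    def cosine_from_signatures(s1, s2):
        # P(bits agree) = 1 - angle/pi, so invert to estimate cosine similarity
        agree = np.mean(s1 == s2)
        return np.cos(np.pi * (1 - agree))

    rng = np.random.default_rng(1)
    u = rng.normal(size=64)
    v = u + 0.3 * rng.normal(size=64)
    S = simhash_signatures(np.vstack([u, v]))
    exact = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    print(exact, cosine_from_signatures(S[0], S[1]))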
algorithms  numerical-methods  linear-algebra  metrics  approximation  nudge-targets  consider:representation 
march 2017 by Vaguery
[1701.07492] Path abstraction
Given the set of paths through a digraph, the result of uniformly deleting some vertices and identifying others along each path is coherent in such a way as to yield the set of paths through another digraph, called a \emph{path abstraction} of the original digraph. The construction of path abstractions is detailed and relevant basic results are established; generalizations are also discussed. Connections with random digraphs are also illustrated.
graph-theory  approximation  rather-odd  can't-wait-to-understand-this  wtf? 
march 2017 by Vaguery
[1410.0642] A Note on Archetypal Analysis and the Approximation of Convex Hulls
We briefly review the basic ideas behind archetypal analysis for matrix factorization and discuss its behavior in approximating the convex hull of a data sample. We then ask how good such approximations can be and consider different cases. Understanding archetypal analysis as the problem of computing a convexity constrained low-rank approximation of the identity matrix provides estimates for archetypal analysis and the SiVM heuristic.
archetypal-analysis  approximation  computational-geometry  rather-interesting  algorithms  to-write-about 
february 2017 by Vaguery
[1612.08515] Compositional Synthesis of Finite State Abstractions
Controller synthesis techniques for continuous systems with respect to temporal logic specifications typically use a finite-state symbolic abstraction of the system model. Constructing this abstraction for the entire system is computationally expensive, and does not exploit natural decompositions of many systems into interacting components. We describe a methodology for compositional symbolic abstraction to help scale controller synthesis for temporal logic to larger systems.
We introduce a new relation, called (approximate) disturbance bisimulation, as the basis for compositional symbolic abstractions. Disturbance bisimulation strengthens the standard approximate alternating bisimulation relation used in control, and extends naturally to systems which are composed of sub-components possibly connected in feedback; disturbance bisimulation handles the feedback signals as disturbances. After proving this composability of disturbance bisimulation for metric systems, we show how one can construct finite-state abstractions compositionally for each component, so that the abstractions are simultaneously disturbance bisimilar to their continuous counterparts. Combining these two results, we can compositionally abstract a network system in a modular way while ensuring that the final composed abstraction is disturbance bisimilar to the original system.
We discuss how we get a compositional controller synthesis methodology for networks of such systems against local temporal specifications as a by-product of our construction.
control-theory  automata  approximation  machine-learning  rather-interesting  to-understand  nudge-targets  consider:representation 
february 2017 by Vaguery
[1702.04917] Compressed sensing in Hilbert spaces
In many linear inverse problems, we want to estimate an unknown vector belonging to a high-dimensional (or infinite-dimensional) space from few linear measurements. To overcome the ill-posed nature of such problems, we use a low-dimension assumption on the unknown vector: it belongs to a low-dimensional model set. The question of whether it is possible to recover such an unknown vector from few measurements then arises. If the answer is yes, it is also important to be able to describe a way to perform such a recovery. We describe a general framework where appropriately chosen random measurements guarantee that recovery is possible. We further describe a way to study the performance of recovery methods that consist in the minimization of a regularization function under a data-fit constraint.
approximation  compressed-sensing  inference  modeling  algorithms 
february 2017 by Vaguery
[1701.03230] Surface Reconstruction with Data-driven Exemplar Priors
In this paper, we propose a framework to reconstruct 3D models from raw scanned points by learning the prior knowledge of a specific class of objects. Unlike previous work that heuristically specifies particular regularities and defines parametric models, our shape priors are learned directly from existing 3D models under a framework based on affinity propagation. Given a database of 3D models within the same class of objects, we build a comprehensive library of 3D local shape priors. We then formulate the problem to select as-few-as-possible priors from the library, referred to as exemplar priors. These priors are sufficient to represent the 3D shapes of the whole class of objects from where they are generated. By manipulating these priors, we are able to reconstruct geometrically faithful models with the same class of objects from raw point clouds. Our framework can be easily generalized to reconstruct various categories of 3D objects that have more geometrically or topologically complex structures. Comprehensive experiments exhibit the power of our exemplar priors for gracefully solving several problems in 3D shape reconstruction such as preserving sharp features, recovering fine details and so on.
computational-geometry  inference  statistics  algorithms  rather-interesting  approximation  modeling  nudge-targets  consider:performance-measures  consider:parsimony 
february 2017 by Vaguery
[1702.04748] An Improved Dictatorship Test with Perfect Completeness
A Boolean function f:{0,1}^n→{0,1} is called a dictator if it depends on exactly one variable, i.e., f(x_1,x_2,…,x_n)=x_i for some i∈[n]. In this work, we study a k-query dictatorship test. Dictatorship tests are central in proving many hardness results for constraint satisfaction problems.
The dictatorship test is said to have perfect completeness if it accepts any dictator function. The soundness of a test is the maximum probability with which it accepts any function far from a dictator. Our main result is a k-query dictatorship test with perfect completeness and soundness (2k+1)/2^k, where k is of the form 2^t−1 for any integer t>2. This improves upon the result of [TY15], which gave a dictatorship test with soundness (2k+3)/2^k.
constraint-satisfaction  computational-complexity  rather-interesting  approximation  heuristics  nudge-targets  consider:classification 
february 2017 by Vaguery
[1701.01337] Abilities and Limitations of Spectral Graph Bisection
Spectral heuristics are among the well-known, commonly used methods for finding a minimum-size bisection in a graph. The heuristics are usually easy to implement and they work well for several practice-relevant classes of graphs. However, only a few research efforts are focused on providing rigorous analysis of such heuristics, and they often lack proven optimality or approximation quality. This paper focuses on the spectral heuristic proposed by Boppana almost three decades ago, which remains one of the most important bisection methods.
It is well known that Boppana's algorithm finds and certifies an optimal bisection with high probability in the random planted bisection model -- the standard model which captures many real-world instances. In this model the vertex set is partitioned randomly into two equal sized sets, and then each edge inside the same part of the partition is chosen with probability p and each edge crossing the partition is chosen with probability q, with p≥q. In our paper we investigate whether Boppana's algorithm works well in the semirandom model introduced by Feige and Kilian. The model generates an initial instance by random selection within the planted bisection model, followed by adversarial decisions. Feige and Kilian posed the question whether Boppana's algorithm works well in the semirandom model, and it has remained open so far. In our paper we answer the question affirmatively. We also show that the algorithm achieves similar performance on graph models which generalize the semirandom model. On the other hand we prove some limitations: we show that if the density difference p−q ≤ o(√(p·ln n)/√n) then the algorithm fails with high probability in the planted bisection model. This bound is sharp, since under the assumption p−q ≥ Ω(√(p·ln n)/√n) Boppana's algorithm works well in the model.
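Not Boppana's certified algorithm, but the generic spectral bisection heuristic it builds on takes only a few lines: split the vertices at the median of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). A sketch on a planted-bisection-style graph, networkx and numpy assumed:

    import numpy as np
    import networkx as nx

    def spectral_bisection(G):
        # split vertices by the median of the Fiedler vector of the Laplacian
        nodes = list(G.nodes)
        L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
        vals, vecs = np.linalg.eigh(L)
        fiedler = vecs[:, 1]                 # eigenvector of 2nd-smallest eigenvalue
        median = np.median(fiedler)
        A = {v for v, f in zip(nodes, fiedler) if f <= median}
        return A, set(nodes) - A

    # planted bisection: two dense 10-vertex blocks joined by a few edges
    G = nx.random_partition_graph([10, 10], 0.8, 0.05, seed=0)
    A, B = spectral_bisection(G)
    cut = sum(1 for u, v in G.edges if (u in A) != (v in A))
    print(len(A), len(B), cut)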
graph-theory  algorithms  bisection  horse-races  approximation  nudge-targets  consider:representation 
february 2017 by Vaguery
[1702.02891] Sparse Approximation by Semidefinite Programming
The problem of sparse approximation and the closely related compressed sensing have received tremendous attention in the past decade. Primarily studied from the viewpoint of applied harmonic analysis and signal processing, there have been two dominant algorithmic approaches to this problem: Greedy methods called the matching pursuit (MP) and the linear programming based approaches called the basis pursuit (BP). The aim of the current paper is to bring a fresh perspective to sparse approximation by treating it as a combinatorial optimization problem and providing an algorithm based on the powerful optimization technique semidefinite programming (SDP). In particular, we show that there is a randomized algorithm based on a semidefinite relaxation of the problem with performance guarantees depending on the coherence and the restricted isometry constant of the dictionary used. We then show a derandomization of the algorithm based on the method of conditional probabilities.
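The matching pursuit (MP) baseline the abstract contrasts with is a greedy loop: pick the dictionary atom most correlated with the current residual, subtract its contribution, repeat. A minimal sketch (not the paper's SDP relaxation), numpy assumed:

    import numpy as np

    def matching_pursuit(D, y, k):
        # greedy matching pursuit over unit-norm dictionary columns
        D = D / np.linalg.norm(D, axis=0)
        x = np.zeros(D.shape[1])
        r = y.astype(float).copy()
        for _ in range(k):
            j = np.argmax(np.abs(D.T @ r))     # best-matching atom
            coef = D[:, j] @ r
            x[j] += coef
            r -= coef * D[:, j]
        return x, r

    rng = np.random.default_rng(0)
    D = rng.normal(size=(50, 200))
    x_true = np.zeros(200)
    x_true[[3, 17, 99]] = [1.0, -2.0, 0.5]
    y = (D / np.linalg.norm(D, axis=0)) @ x_true
    x_hat, r = matching_pursuit(D, y, k=10)
    print(np.linalg.norm(r))                    # residual after 10 greedy steps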
approximation  compressed-sensing  representation  mathematical-programming  numerical-methods  performance-measure  nudge-targets  consider:looking-to-see  consider:feature-discovery 
february 2017 by Vaguery
[1604.04660] Why Artificial Intelligence Needs a Task Theory --- And What It Might Look Like
The concept of "task" is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering theoretical foundations allow thorough evaluation of designs by methodical manipulation of well understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane's performance and stability. No framework exists in AI that allows this kind of methodical manipulation: Performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different. The issue is even more acute with respect to artificial *general* intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A *task theory* would enable addressing tasks at the *class* level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. Even modest improvements in this direction would surpass the current ad-hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks.
artificial-intelligence  philosophy-of-engineering  rather-interesting  representation  approximation  aggregation  nudge-targets  consider:looking-to-see  consider:performance-space-analysis 
february 2017 by Vaguery
Your favorite color makes learning more adaptable and precise | bioRxiv
Learning from reward feedback is essential for survival but can become extremely challenging when choice options have multiple features and feature values (curse of dimensionality). Here, we propose a general framework for learning reward values in dynamic multi-dimensional environments via encoding and updating the average value of individual features. We predicted that this feature-based learning occurs not just because it can reduce dimensionality, but more importantly because it can increase adaptability without compromising precision. We experimentally tested this novel prediction and found that in dynamic environments, human subjects adopted feature-based learning even when this approach does not reduce dimensionality. Even in a static, low-dimensional environment, subjects initially tended to adopt feature-based learning and switched to learning individual option values only when feature values could not accurately predict all object values. Moreover, the behavior of two alternative network models demonstrated that hierarchical decision-making and learning could account for our experimental results and thus provides a plausible mechanism for model adoption during learning in dynamic environments. Our results constrain neural mechanisms underlying learning in dynamic multi-dimensional environments, and highlight the importance of neurons encoding the value of individual features in this learning.
machine-learning  curse-of-dimensionality  feature-construction  rather-interesting  approximation  nudge-targets  consider:feature-discovery 
february 2017 by Vaguery
[1701.01207] A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers
Regularization techniques are widely employed in optimization-based approaches for solving ill-posed inverse problems in data analysis and scientific computing. These methods are based on augmenting the objective with a penalty function, which is specified based on prior domain-specific expertise to induce a desired structure in the solution. We consider the problem of learning suitable regularization functions from data in settings in which precise domain knowledge is not directly available. Previous work under the title of `dictionary learning' or `sparse coding' may be viewed as learning a regularization function that can be computed via linear programming. We describe generalizations of these methods to learn regularizers that can be computed and optimized via semidefinite programming. Our framework for learning such semidefinite regularizers is based on obtaining structured factorizations of data matrices, and our algorithmic approach for computing these factorizations combines recent techniques for rank minimization problems along with an operator analog of Sinkhorn scaling. Under suitable conditions on the input data, our algorithm provides a locally linearly convergent method for identifying the correct regularizer that promotes the type of structure contained in the data. Our analysis is based on the stability properties of Operator Sinkhorn scaling and their relation to geometric aspects of determinantal varieties (in particular tangent spaces with respect to these varieties). The regularizers obtained using our framework can be employed effectively in semidefinite programming relaxations for solving inverse problems.
matrices  inverse-problems  approximation  algorithms  nudge-targets  consider:looking-to-see  to-write-about  consider:representation 
january 2017 by Vaguery
[1612.02483] High Dimensional Consistent Digital Segments
We consider the problem of digitalizing Euclidean line segments from ℝd to ℤd. Christ {\em et al.} (DCG, 2012) showed how to construct a set of {\em consistent digital segments} (CDS) for d=2: a collection of segments connecting any two points in ℤ2 that satisfies the natural extension of the Euclidean axioms to ℤd. In this paper we study the construction of CDSs in higher dimensions.
We show that any total order can be used to create a set of {\em consistent digital rays} CDR in ℤd (a set of rays emanating from a fixed point p that satisfies the extension of the Euclidean axioms). We fully characterize for which total orders the construction holds and study their Hausdorff distance, which in particular positively answers the question posed by Christ {\em et al.}.
approximation  computational-geometry  performance-measure  rather-interesting  mathematics  consistency  models-and-modes  constructive-geometry  nudge-targets  consider:representation  consider:looking-to-see 
january 2017 by Vaguery
[1603.08812] Reconstruction of evolved dynamic networks from degree correlations
We study the importance of local structural properties in networks which have been evolved for a power-law scaling in their Laplacian spectrum. To this end, the degree distribution, two-point degree correlations, and degree-dependent clustering are extracted from the evolved networks and used to construct random networks with the prescribed distributions. In the analysis of these reconstructed networks it turns out that the degree distribution alone is not sufficient to generate the spectral scaling and the degree-dependent clustering has only an indirect influence. The two-point correlations are found to be the dominant characteristic for the power-law scaling over a broader eigenvalue range.
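A small looking-to-see sketch of the degree-distribution-only baseline described above, using networkx's configuration model; the Barabasi-Albert graph is an assumed stand-in for an "evolved" network. Note that this rewiring preserves the degree sequence but not the two-point correlations or the degree-dependent clustering that the paper identifies as dominant.

```python
import networkx as nx

# take the degree sequence of the (stand-in) evolved network and rewire it
G_evolved = nx.barabasi_albert_graph(200, 3, seed=0)
degrees = [d for _, d in G_evolved.degree()]

G_random = nx.configuration_model(degrees, seed=0)
G_random = nx.Graph(G_random)                            # collapse multi-edges
G_random.remove_edges_from(nx.selfloop_edges(G_random))  # drop self-loops

# the two-point correlation structure is generally not preserved
print(nx.degree_assortativity_coefficient(G_evolved),
      nx.degree_assortativity_coefficient(G_random))
```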
approximation  graph-theory  algorithms  statistics  nudge  consider:looking-to-see  feature-extraction 
january 2017 by Vaguery
[1511.05118] Random sampling of bandlimited signals on graphs
We study the problem of sampling k-bandlimited signals on graphs. We propose two sampling strategies that consist in selecting a small subset of nodes at random. The first strategy is non-adaptive, i.e., independent of the graph structure, and its performance depends on a parameter called the graph coherence. In contrast, the second strategy is adaptive but yields optimal results. Indeed, no more than O(k log(k)) measurements are sufficient to ensure an accurate and stable recovery of all k-bandlimited signals. This second strategy is based on a careful choice of the sampling distribution, which can be estimated quickly. Then, we propose a computationally efficient decoder to reconstruct k-bandlimited signals from their samples. We prove that it yields accurate reconstructions and that it is also stable to noise. Finally, we conduct several experiments to test these techniques.
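A minimal sketch of the non-adaptive idea as I read it: sample nodes uniformly at random and reconstruct by least squares in the span of the k lowest Laplacian eigenvectors. The random graph, the sizes, and the noiseless measurements are assumptions; the paper's actual sampling distribution and decoder are more refined.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 200, 5, 40

# random graph and its combinatorial Laplacian
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A

eigvals, U = np.linalg.eigh(L)
Uk = U[:, :k]                               # k lowest-frequency eigenvectors
x = Uk @ rng.standard_normal(k)             # a k-bandlimited signal

idx = rng.choice(n, size=m, replace=False)  # non-adaptive uniform node sampling
y = x[idx]                                  # (noise could be added here)

alpha, *_ = np.linalg.lstsq(Uk[idx, :], y, rcond=None)
x_hat = Uk @ alpha
print(np.linalg.norm(x - x_hat))            # reconstruction error
```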
sampling  graph-theory  network-theory  rather-interesting  approximation  compressed-sensing  nudge-targets  performance-measure  consider:looking-to-see  to-write-about 
january 2017 by Vaguery
[1509.05009] On the Expressive Power of Deep Learning: A Tensor Analysis
It has long been conjectured that hypothesis spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical networks than with shallow ones. Despite the vast empirical evidence supporting this belief, theoretical justifications to date are limited. In particular, they do not account for the locality, sharing and pooling constructs of convolutional networks, the most successful deep learning architecture to date. In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling. An equivalence between the networks and hierarchical tensor factorizations is established. We show that a shallow network corresponds to CP (rank-1) decomposition, whereas a deep network corresponds to Hierarchical Tucker decomposition. Using tools from measure theory and matrix algebra, we prove that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be realized (or even approximated) by a shallow network. Since log-space computation transforms our networks into SimNets, the result applies directly to a deep learning architecture demonstrating promising empirical performance. The construction and theory developed in this paper shed new light on various practices and ideas employed by the deep learning community.
neural-networks  deep-learning  approximation  rather-interesting  representation  feature-construction 
january 2017 by Vaguery
[1604.01175] On the Combinatorial Complexity of Approximating Polytopes
Approximating convex bodies succinctly by convex polytopes is a fundamental problem in discrete geometry. A convex body K of diameter diam(K) is given in Euclidean d-dimensional space, where d is a constant. Given an error parameter ε>0, the objective is to determine a polytope of minimum combinatorial complexity whose Hausdorff distance from K is at most ε⋅diam(K). By combinatorial complexity we mean the total number of faces of all dimensions of the polytope. A well-known result by Dudley implies that O(1/ε^((d−1)/2)) facets suffice, and a dual result by Bronshteyn and Ivanov similarly bounds the number of vertices, but neither result bounds the total combinatorial complexity. We show that there exists an approximating polytope whose total combinatorial complexity is Õ(1/ε^((d−1)/2)), where Õ conceals a polylogarithmic factor in 1/ε. This is a significant improvement upon the best known bound, which is roughly O(1/ε^(d−2)).
Our result is based on a novel combination of both old and new ideas. First, we employ Macbeath regions, a classical structure from the theory of convexity. The construction of our approximating polytope employs a new stratified placement of these regions. Second, in order to analyze the combinatorial complexity of the approximating polytope, we present a tight analysis of a width-based variant of B\'{a}r\'{a}ny and Larman's economical cap covering. Finally, we use a deterministic adaptation of the witness-collector technique (developed recently by Devillers et al.) in the context of our stratified construction.
approximation  computational-geometry  computational-complexity  algorithms  rather-interesting  to-understand  representation  nudge-targets  consider:look 
january 2017 by Vaguery
[1610.08820] Bin Packing Problem: A Linear Constant-Space 3/2-Approximation Algorithm
Since the Bin Packing Problem (BPP) is one of the main NP-hard problems, a lot of approximation algorithms have been suggested for it. It has been proven that, unless P=NP, the best possible algorithm for BPP has an approximation ratio of 3/2 and a time order of O(n). In the current paper, a linear 3/2-approximation algorithm is presented. The suggested algorithm not only has the best possible theoretical factors (approximation ratio, space order, and time order) but also outperforms the other approximation algorithms according to the experimental results; we therefore conclude that it is the best approximation algorithm presented for the problem so far. Key words: Approximation Algorithm, Bin Packing Problem, Approximation Ratio, NP-hard.
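The abstract does not spell out the algorithm itself, so here is the classical First-Fit-Decreasing heuristic as a baseline to compare against, not the paper's method; the capacity of 1.0 and the item sizes are illustrative.

```python
def first_fit_decreasing(items, capacity=1.0):
    """Classical FFD heuristic for bin packing: sort items by decreasing
    size, then place each item into the first bin where it fits."""
    bins = []      # remaining capacity of each open bin
    packing = []   # items placed in each bin
    for size in sorted(items, reverse=True):
        for i, remaining in enumerate(bins):
            if size <= remaining + 1e-12:
                bins[i] -= size
                packing[i].append(size)
                break
        else:
            bins.append(capacity - size)   # open a new bin
            packing.append([size])
    return packing

print(len(first_fit_decreasing([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1])))
```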
bin-packing  operations-research  optimization  approximation  horse-races  rather-interesting  nudge-targets  consider:looking-to-see  consider:representation  consider:robustness 
december 2016 by Vaguery
[1612.00193] Learning Potential Energy Landscapes using Graph Kernels
Recent machine learning methods make it possible to model potential energy of atomic configurations with chemical-level accuracy (as calculated from ab-initio calculations) and at speeds suitable for molecular dynamics simulation. Best performance is achieved when the known physical constraints are encoded in the machine learning models. For example, the atomic energy is invariant under global translations and rotations; it is also invariant to permutations of same-species atoms. Although simple to state, these symmetries are complicated to encode into machine learning algorithms. In this paper, we present a machine learning approach based on graph theory that naturally incorporates translation, rotation, and permutation symmetries. Specifically, we use a random walk graph kernel to measure the similarity of two adjacency matrices, each of which represents a local atomic environment. We show on a standard benchmark that our Graph Approximated Energy (GRAPE) method is competitive with state of the art kernel methods. Furthermore, the GRAPE framework is flexible and admits many possible extensions.
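A small sketch of a geometric random-walk kernel between two adjacency matrices, one common instantiation of the random walk graph kernel mentioned above; the decay parameter lam and the uniform start/stop distributions are assumptions, and GRAPE's exact normalization may differ.

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=0.01):
    """Geometric random-walk graph kernel between graphs with adjacency
    matrices A1 and A2. Requires lam < 1/spectral_radius(A1 kron A2)."""
    W = np.kron(A1, A2)                  # adjacency of the direct product graph
    n = W.shape[0]
    x = np.linalg.solve(np.eye(n) - lam * W, np.ones(n))
    return np.ones(n) @ x / n            # uniform start/stop distributions

# two tiny "local atomic environments" as adjacency matrices
A1 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)  # triangle
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path
print(random_walk_kernel(A1, A2))
```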
energy-landscapes  fitness-landscapes  machine-learning  approximation  kernel-methods  graph-kernels  to-understand  representation  nudge-targets  consider:representation 
december 2016 by Vaguery
[1602.02262] Recovery guarantee of weighted low-rank approximation via alternating minimization
Many applications require recovering a ground-truth low-rank matrix from noisy observations of the entries, which in practice is typically formulated as a weighted low-rank approximation problem and solved by non-convex optimization heuristics such as alternating minimization. In this paper, we provide a provable recovery guarantee for weighted low-rank approximation via a simple alternating minimization algorithm. In particular, for a natural class of matrices and weights and without any assumption on the noise, we bound the spectral norm of the difference between the recovered matrix and the ground truth by the spectral norm of the weighted noise plus an additive error that decreases exponentially with the number of rounds of alternating minimization, from either initialization by SVD or, more importantly, random initialization. These are the first theoretical results for weighted low-rank approximation via alternating minimization with non-binary deterministic weights, significantly generalizing those for matrix completion, the special case with binary weights, since our assumptions are similar to or weaker than those made in existing works. Furthermore, this is achieved by a very simple algorithm that improves the vanilla alternating minimization with a simple clipping step.
The key technical challenge is that under non-binary deterministic weights, na\"ive alternating steps will destroy the incoherence and spectral properties of the intermediate solutions, which are needed for making progress towards the ground truth. We show that the properties only need to hold in an average sense and can be achieved by the clipping step.
We further provide an alternating algorithm that uses a whitening step that keeps the properties via SDP and Rademacher rounding and thus requires weaker assumptions. This technique can potentially be applied in some other applications and is of independent interest.
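A bare-bones sketch of the weighted alternating minimization being analyzed, without the clipping or whitening refinements that the paper's guarantees rely on; the random initialization and the small ridge term are assumptions.

```python
import numpy as np

def weighted_als(M, W, rank, n_iter=50):
    """Alternating minimization for weighted low-rank approximation:
    minimize sum_ij W_ij * (M_ij - (U @ V.T)_ij)^2 over U, V."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    for _ in range(n_iter):
        for i in range(m):   # row-wise weighted least squares for U
            Vw = V * W[i][:, None]
            U[i] = np.linalg.solve(Vw.T @ V + 1e-9 * np.eye(rank), Vw.T @ M[i])
        for j in range(n):   # symmetric update for V
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + 1e-9 * np.eye(rank), Uw.T @ M[:, j])
    return U, V
```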
matrices  approximation  dimension-reduction  compressed-sensing  algorithms  nudge-targets  consider:looking-to-see  consider:representation 
december 2016 by Vaguery
[1612.01605] Zipf's law, unbounded complexity and open-ended evolution
A major problem for evolutionary theory is understanding the so called {\em open-ended} nature of evolutionary change. Open-ended evolution (OEE) refers to the unbounded increase in complexity that seems to characterise evolution on multiple scales. This property seems to be a characteristic feature of biological and technological evolution and is strongly tied to the generative potential associated with combinatorics, which allows such systems to grow and expand their available state spaces. Several theoretical and computational approaches have been developed to properly characterise OEE. Interestingly, many complex systems displaying OEE, from language to proteins, share a common statistical property: the presence of Zipf's law. Given an inventory of basic items required to build more complex structures, Zipf's law tells us that most of these elements are rare whereas a few of them are extremely common. Using Algorithmic Information Theory, in this paper we provide a fundamental definition for open-endedness, which can be understood as {\em postulates}. Its statistical counterpart, based on standard Shannon Information theory, has the structure of a variational problem which is shown to lead to Zipf's law as the expected consequence of evolutionary processes displaying OEE. We further explore the problem of information conservation through an OEE process and we conclude that statistical information (standard Shannon information) is not conserved, resulting in the paradoxical situation in which the increase of information content has the effect of erasing itself. We prove that this paradox is solved if we consider non-statistical forms of information. This last result implies that standard information theory may not be a suitable theoretical framework to explore the persistence and increase of the information content in OEE systems.
information-theory  open-ended-evolution  artificial-life  emergent-design  emergence  approximation  modeling  to-write-about 
december 2016 by Vaguery
[1606.09630] Universality of Resilience Patterns in Generalized Lotka Volterra Dynamics and Beyond
Recently, a theoretical framework aimed at separating the roles of dynamics and topology in multi-dimensional systems has been developed (Gao et al, \textit{Nature}, Vol 530:307 (2016)). It proposes an effective description of the emergent universal resilience patterns in complex networks for a very general class of pair-wise interaction dynamics. The procedure consists in collapsing the multi-dimensional system state variables (e.g. stationary populations) into a 1-dimensional effective equation that is analytically tractable. The validity of the collapse is assumed to hold depending on two main hypotheses: (i) the network determined by the interaction between pairs of nodes has negligible degree correlations; (ii) the node activities are uniform across nodes in both the drift and pair-wise interaction functions. Moreover, the authors consider only positive (mutualistic) interactions. Here we show that the conditions proposed by Gao and collaborators are neither sufficient nor necessary to guarantee that their method works in general, and that the validity of their results is not independent of the model chosen within the class of dynamics they considered. Indeed we find that a new condition poses effective limitations to their framework and we provide quantitative predictions of the quality of the one-dimensional collapse as a function of the properties of the interaction networks and dynamics. At the same time, we show that multi-dimensional reduction may work also beyond the assumption of strictly mutualistic interactions, thus extending the validity of their framework. We prove our results for generalized Lotka-Volterra dynamics through both analytical and numerical evidence.
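For context, a minimal generalized Lotka-Volterra simulation of the kind whose stationary state the one-dimensional collapse is meant to summarize; the interaction matrix, growth rates, and initial conditions are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
n = 10
r = rng.uniform(0.5, 1.0, n)                           # intrinsic growth rates
A = -np.eye(n) + 0.05 * rng.standard_normal((n, n))    # self-limitation + weak coupling

def glv(t, x):
    # dx_i/dt = x_i * (r_i + sum_j A_ij x_j)
    return x * (r + A @ x)

sol = solve_ivp(glv, (0.0, 100.0), rng.uniform(0.1, 1.0, n))
print(sol.y[:, -1])   # state at t = 100, the populations a 1-D collapse would summarize
```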
nonlinear-dynamics  approximation  rather-interesting  dimension-reduction  algorithms  horse-races  performance-measure  to-write-about  nudge-targets  consider:looking-to-see 
december 2016 by Vaguery
A Novel Algorithm for Coarse-Graining of Cellular Automata - Springer
The coarse-graining is an approximation procedure widely used for simplification of mathematical and numerical models of multiscale systems. It reduces superfluous – microscopic – degrees of freedom. Israeli and Goldenfeld demonstrated in [1,2] that coarse-graining can be employed for elementary cellular automata (CA), producing interesting interdependences between them. However, extending their investigation to more complex CA rules appeared to be impossible due to the high computational complexity of the coarse-graining algorithm. We demonstrate here that this complexity can be substantially decreased. It allows for scrutinizing a much broader class of cellular automata in terms of their coarse-graining. By using our algorithm we found that the ratio of the number of elementary CAs having a coarse-grained representation to “degenerate” – irreducible – cellular automata increases strongly with the “grain” size of the approximation procedure. This raises fundamental questions about the formal limits of modeling realistic multiscale systems.
cellular-automata  aggregation  approximation  to-do  to-write-about  via:dan-little 
december 2016 by Vaguery
[1106.3779] Subsum Sets: Intervals, Cantor Sets, and Cantorvals
Given a sequence converging to zero, we consider the set of numbers which are sums of (infinite, finite, or empty) subsequences. When the original sequence is not absolutely summable, the subsum set is an unbounded closed interval which includes zero. When it is absolutely summable the subsum set is one of the following: a finite union of (nontrivial) compact intervals, a Cantor set, or a "symmetric Cantorval".
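A brute-force sketch for looking at finite truncations of the subsum set: with the geometric sequence 1, 1/2, 1/4, ... the limiting subsum set is the interval [0, 2], while with 2/3, 2/9, 2/27, ... it is the middle-thirds Cantor set.

```python
def finite_subsums(seq):
    """All sums of subsequences of a finite prefix of the sequence
    (a brute-force approximation of the subsum set)."""
    sums = {0.0}                       # the empty subsequence contributes 0
    for x in seq:
        sums |= {s + x for s in sums}  # either include x or not
    return sorted(sums)

# prefix of 1, 1/2, 1/4, ...: subsum set fills out the interval [0, 2]
print(finite_subsums([2.0 ** -k for k in range(10)])[:5])
# prefix of 2/3, 2/9, 2/27, ...: subsum set approximates the Cantor set
print(finite_subsums([2.0 / 3 ** k for k in range(1, 10)])[:5])
```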
mathematics  rather-interesting  approximation  representation  to-write-about 
december 2016 by Vaguery
[1605.03135] Adjoint-based Gradient Estimation Using the Space-time Solutions of Unknown Conservation Law Simulations
Many control applications can be formulated as optimization constrained by conservation laws. Such optimization can be efficiently solved by gradient-based methods, where the gradient is obtained through the adjoint method. Traditionally, the adjoint method has not been able to be implemented in "gray-box" conservation law simulations. In gray-box simulations, the analytical and numerical form of the conservation law is unknown, but the space-time solution of relevant flow quantities is available. Without the adjoint gradient, optimization can be challenging for problems with many control variables. However, much information about the gray-box simulation is contained in its space-time solution, which motivates us to estimate the adjoint gradient by leveraging the space-time solution. This article considers a type of gray-box simulations where the flux function is partially unknown. A method is introduced to estimate the adjoint gradient at a cost independent of the number of control variables. The method firstly infers a conservation law, named the twin model, from the space-time solution, and then applies the adjoint method to the inferred twin model to estimate the gradient. The method is demonstrated to achieve good gradient estimation accuracies in several numerical examples. The main contributions of this paper are: a twin model method that enables the adjoint gradient computation for gray-box conservation law simulations; and an adaptive basis construction scheme that fully exploits the information of gray-box solutions.
optimization  algorithms  approximation  constraint-satisfaction  rather-interesting  to-understand  representation 
october 2016 by Vaguery
[1605.06444] Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes
In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme which is straightforward to implement: define a cost-function given by a sum of a finite number of replicas of the original cost-function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
neural-networks  update  machine-learning  approximation  rather-interesting  philosophy-of-engineering  to-write-about 
october 2016 by Vaguery
[1609.05034] What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank
When analyzing discrete data such as binary matrices using matrix factorizations, we often have to make a choice between using expensive combinatorial methods that retain the discrete nature of the data and using continuous methods that can be more efficient but destroy the discrete structure. Alternatively, we can first compute a continuous factorization and subsequently apply a rounding procedure to obtain a discrete representation. But what will we gain by rounding? Will this yield lower reconstruction errors? Is it easy to find a low-rank matrix that rounds to a given binary matrix? Does it matter which threshold we use for rounding? Does it matter if we allow for only non-negative factorizations? In this paper, we approach these and further questions by presenting and studying the concept of rounding rank. We show that rounding rank is related to linear classification, dimensionality reduction, and nested matrices. We also report on an extensive experimental study that compares different algorithms for finding good factorizations under the rounding rank model.
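A quick looking-to-see sketch of the rounding idea: compute a truncated SVD, round against a threshold, and measure how many entries of a binary matrix are reproduced. This is not one of the paper's algorithms for finding good factorizations under the rounding rank model; the test matrix and threshold are assumptions.

```python
import numpy as np

def rounding_error(B, rank, threshold=0.5):
    """Fraction of entries of the binary matrix B that a rank-`rank`
    truncated SVD fails to reproduce after thresholding."""
    U, s, Vt = np.linalg.svd(B.astype(float), full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    rounded = (low_rank >= threshold).astype(int)
    return np.mean(rounded != B)

rng = np.random.default_rng(3)
B = (rng.random((30, 20)) < 0.3).astype(int)
for r in (1, 2, 5, 10):
    print(r, rounding_error(B, r))
```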
approximation  matrices  numerical-methods  algorithms  rather-interesting  nudge-targets  consider:looking-to-see 
october 2016 by Vaguery
[1605.00562] Persistent homology of time-dependent functional networks constructed from coupled time series
We use topological data analysis to study "functional networks" that we construct from time-series data from both experimental and synthetic sources. We use persistent homology in combination with a weight rank clique filtration to gain insights into these functional networks, and we use persistence landscapes to interpret our results. Our first example consists of biological data in the form of functional magnetic resonance imaging (fMRI) data that was acquired from human subjects during a simple motor-learning task. Our second example uses time-series output from networks of coupled Kuramoto oscillators. With these examples, we demonstrate that (1) using persistent homology to study functional networks provides fascinating insights into their properties and (2) the position of the features in a filtration can play a more vital role than persistence in the interpretation of topological features, even though conventionally the latter is used to distinguish between signal and noise. We find that persistent homology can detect differences in synchronization patterns in our data sets over time, giving insight both on changes in community structure in the networks and on increased synchronization between brain regions that form loops in a functional network during motor learning. For the motor-learning data, persistence landscapes also reveal that on average the majority of changes in the network loops takes place on the second of three days of the learning process.
nonlinear-dynamics  time-series  approximation  clustering  rather-interesting  feature-extraction  nudge-targets  consider:performance-measures  consider:representation  reminds-me-of-Wim's-paper 
september 2016 by Vaguery
[1608.08225] Why does deep and cheap learning work so well?
We show how the success of deep learning depends not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can be approximated through "cheap learning" with exponentially fewer parameters than generic ones, because they have simplifying properties tracing back to the laws of physics. The exceptional simplicity of physics-based functions hinges on properties such as symmetry, locality, compositionality and polynomial log-probability, and we explore how these properties translate into exceptionally simple neural networks approximating both natural phenomena such as images and abstract representations thereof such as drawings. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to renormalization group procedures. Various "no-flattening theorems" show when these efficient deep networks cannot be accurately approximated by shallow ones without efficiency loss - even for linear networks.
machine-learning  approximation  information-theory  rather-interesting  to-write-about  via:many 
august 2016 by Vaguery
[1606.01241] A Better Method for Volume Determination of Regularly and Irregularly Shaped Nanoparticles with Enhanced Accuracy
Nanoparticles (NPs) are widely used in diverse application areas, such as medicine, engineering, and cosmetics. The size (or volume) of NPs is one of the most important parameters for their successful application. It is relatively straightforward to determine the volume of regular NPs such as spheres and cubes from a one-dimensional or two-dimensional measurement. However, due to the three-dimensional nature of NPs, it is challenging to determine the proper physical size of many types of regularly and irregularly-shaped NPs (IS-NPs) at high-throughput using a single tool. Here, we present a relatively simple method that statistically determines a better volume estimate of many types of NPs by combining measurements from their top-down projection areas and peak-heights using two tools. The proposed method is significantly faster and more economical than the electron tomography method. We demonstrate the improved accuracy of the combined method over scanning electron microscopy (SEM) and atomic force microscopy (AFM) by using both modeling and measurements. This study also shows that SEM provides a more accurate estimate of size than AFM for most IS-NP size measurements. The method provides a much needed, proper high-throughput volumetric measurement method useful for many applications.
measurement  approximation  rather-interesting  metrology  experiment  nanotechnology  engineering  nudge-targets  consider:looking-to-see  algorithms 
july 2016 by Vaguery
[1605.06396] Soft Covering with High Probability
Wyner's soft-covering lemma is the central analysis step for achievability proofs of information theoretic security, resolvability, and channel synthesis. It can also be used for simple achievability proofs in lossy source coding. This work sharpens the claim of soft-covering by moving away from an expected value analysis. Instead, a random codebook is shown to achieve the soft-covering phenomenon with high probability. The probability of failure is super-exponentially small in the block-length, enabling many applications through the union bound. This work gives bounds for both the exponential decay rate of total variation and the second-order codebook rate for soft covering.
information-theory  looking-to-see  rather-interesting  heuristics  approximation  nudge-targets  consider:looking-to-see  consider:feature-discovery 
july 2016 by Vaguery
[1607.05712] Structure-Blind Signal Recovery
We consider the problem of recovering a signal observed in Gaussian noise. If the set of signals is convex and compact, and can be specified beforehand, one can use classical linear estimators that achieve a risk within a constant factor of the minimax risk. However, when the set is unspecified, designing an estimator that is blind to the hidden structure of the signal remains a challenging problem. We propose a new family of estimators to recover signals observed in Gaussian noise. Instead of specifying the set where the signal lives, we assume the existence of a well-performing linear estimator. Proposed estimators enjoy exact oracle inequalities and can be efficiently computed through convex optimization. We present several numerical illustrations that show the potential of the approach.
signal-processing  image-processing  algorithms  inference  rather-interesting  approximation  nudge-targets  consider:looking-to-see 
july 2016 by Vaguery
[1411.1420] Basis Learning as an Algorithmic Primitive
A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a "Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first order iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms---e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA---all within a broader function-based dynamical systems framework. Our framework also unifies the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain "hidden convexity" property of these functions.
We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.
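The simplest instance of the gradient iteration discussed above is the power method for a symmetric matrix, where the basis elements are the eigenvectors; a minimal sketch (the 2x2 test matrix is an arbitrary example).

```python
import numpy as np

def power_iteration(M, n_iter=200, seed=0):
    """Power method for a symmetric matrix: iterate the (scaled) gradient
    of u -> u^T M u on the unit sphere and renormalize."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(M.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = M @ u                  # gradient of the quadratic form, up to a factor of 2
        u /= np.linalg.norm(u)     # project back onto the unit sphere
    return u, u @ M @ u            # leading eigenvector and eigenvalue

M = np.array([[2.0, 1.0], [1.0, 3.0]])
vec, val = power_iteration(M)
print(val, vec)
```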
machine-learning  approximation  gradient-descent  representation  algorithms  nudge-targets  consider:representation  reasonable-approach 
july 2016 by Vaguery
[1606.09333] Dimension-Free Iteration Complexity of Finite Sum Optimization Problems
Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure. However, whereas much progress has been made in developing faster algorithms for this setting, the inherent limitations of these problems are not satisfactorily addressed by existing lower bounds. Indeed, current bounds focus on first-order optimization algorithms, and only apply in the often unrealistic regime where the number of iterations is less than (d/n) (where d is the dimension and n is the number of samples). In this work, we extend the framework of (Arjevani et al., 2015) to provide new lower bounds, which are dimension-free, and go beyond the assumptions of current bounds, thereby covering standard finite sum optimization methods, e.g., SAG, SAGA, SVRG, SDCA without duality, as well as stochastic coordinate-descent methods, such as SDCA and accelerated proximal SDCA.
machine-learning  approximation  rather-interesting  optimization  gradient-descent  algorithms  computational-complexity  nudge-targets  consider:looking-to-see 
july 2016 by Vaguery
[1606.00901] Approximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery
The approximate message passing (AMP) algorithm shows an advantage over conventional convex optimization methods in recovering under-sampled sparse signals. AMP is analytically tractable and has a much lower complexity. However, it requires that the true parameters of the input and output channels are known. In this paper, we propose an AMP algorithm with built-in parameter estimation that jointly estimates the sparse signals along with the parameters by treating them as unknown random variables with simple priors. Specifically, the maximum a posteriori (MAP) parameter estimation is presented and shown to produce estimates that converge to the true parameter values. Experiments on sparse signal recovery show that the performance of the proposed approach matches that of the oracle AMP algorithm where the true parameter values are known.
compressed-sensing  information-theory  signal-processing  algorithms  inference  nudge-targets  approximation  rather-interesting  consider:feature-discovery  consider:robustness 
july 2016 by Vaguery
Tradict enables high fidelity reconstruction of the eukaryotic transcriptome from 100 marker genes | bioRxiv
Transcript levels are a critical determinant of the proteome and hence cellular function. Because the transcriptome is an outcome of the interactions between genes and their products, we reasoned it might be accurately represented by a subset of transcript abundances. We develop a method, Tradict (transcriptome predict), capable of learning and using the expression measurements of a small subset of 100 marker genes to reconstruct entire transcriptomes. By analyzing over 23,000 publicly available RNA-Seq datasets, we show that Tradict is robust to noise and accurate, especially for predicting the expression of a comprehensive, but interpretable list of transcriptional programs that represent the major biological processes and cellular pathways. Coupled with targeted RNA sequencing, Tradict may therefore enable simultaneous transcriptome-wide screening and mechanistic investigation at large scales. Thus, whether for performing forward genetic, chemogenomic, or agricultural screens or for profiling single-cells, Tradict promises to help accelerate genetic dissection and drug discovery.
bioinformatics  algorithms  statistics  models  rather-interesting  systems-biology  nudge-targets  approximation  consider:feature-discovery 
july 2016 by Vaguery
[1509.07892] Evasion and Hardening of Tree Ensemble Classifiers
Classifier evasion consists in finding for a given instance x the nearest instance x′ such that the classifier predictions of x and x′ are different. We present two novel algorithms for systematically computing evasions for tree ensembles such as boosted trees and random forests. Our first algorithm uses a Mixed Integer Linear Program solver and finds the optimal evading instance under an expressive set of constraints. Our second algorithm trades off optimality for speed by using symbolic prediction, a novel algorithm for fast finite differences on tree ensembles. On a digit recognition task, we demonstrate that both gradient boosted trees and random forests are extremely susceptible to evasions. Finally, we harden a boosted tree model without loss of predictive accuracy by augmenting the training set of each boosting round with evading instances, a technique we call adversarial boosting.
classification  machine-learning  approximation  robustness  rather-interesting  nudge-targets  feature-construction 
july 2016 by Vaguery
[1209.4403] Four Soviets Walk the Dog - with an Application to Alt's Conjecture
Given two polygonal curves in the plane, there are several ways to define a measure of similarity between them. One measure that has been extremely popular in the past is the Frechet distance. Since it was proposed by Alt and Godau in 1992, many variants and extensions have been described. However, even 20 years later, the original O(n^2 log n) algorithm by Alt and Godau for computing the Frechet distance remains the state of the art (here n denotes the number of vertices on each curve). This has led Helmut Alt to conjecture that the associated decision problem is 3SUM-hard.
In recent work, Agarwal et al. show how to break the quadratic barrier for the discrete version of the Frechet distance, where we consider sequences of points instead of polygonal curves. Building on their work, we give an algorithm to compute the Frechet distance between two polygonal curves in time O(n^2 (log n)^(1/2) (log log n)^(3/2)) on a pointer machine and in time O(n^2 (log log n)^2) on a word RAM. Furthermore, we show that there exists an algebraic decision tree for the Frechet problem of depth O(n^(2-epsilon)), for some epsilon > 0. This provides evidence that computing the Frechet distance may not be 3SUM-hard after all and reveals an intriguing new aspect of this well-studied problem.
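For reference, the classical quadratic-time dynamic program for the discrete Fréchet distance between point sequences; the subquadratic algorithms of the paper concern the continuous version, so this is only the simpler variant mentioned in the abstract.

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two curves given as point sequences,
    computed by the standard O(n*m) dynamic program over couplings."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)  # pairwise distances
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(ca[i - 1, j] if i > 0 else np.inf,
                       ca[i, j - 1] if j > 0 else np.inf,
                       ca[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            ca[i, j] = max(prev, d[i, j])   # best coupling must cover this pair
    return ca[-1, -1]

P = np.array([[0, 0], [1, 0], [2, 1]], float)
Q = np.array([[0, 1], [1, 1], [2, 2]], float)
print(discrete_frechet(P, Q))
```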
computational-geometry  algorithms  nudge-targets  approximation  computational-complexity  consider:representation 
june 2016 by Vaguery
[1606.06660] Mapping polygons to the grid with small Hausdorff and Fr'echet distance
We show how to represent a simple polygon P by a grid (pixel-based) polygon Q that is simple and whose Hausdorff or Fr\'echet distance to P is small. For any simple polygon P, a grid polygon exists with constant Hausdorff distance between their boundaries and their interiors. Moreover, we show that with a realistic input assumption we can also realize constant Fr\'echet distance between the boundaries. We present algorithms accompanying these constructions, heuristics to improve their output while keeping the distance bounds, and experiments to assess the output.
computational-geometry  algorithms  approximation  rather-interesting  nudge-targets  consider:rediscovery  consider:feature-discovery 
june 2016 by Vaguery
[1606.06488] Discretized Approaches to Schematization
To produce cartographic maps, simplification is typically used to reduce complexity of the map to a legible level. With schematic maps, however, this simplification is pushed far beyond the legibility threshold and is instead constrained by functional need and resemblance. Moreover, stylistic geometry is often used to convey the schematic nature of the map. In this paper we explore discretized approaches to computing a schematic shape S for a simple polygon P. We do so by overlaying a plane graph G on P as the solution space for the schematic shape. Topological constraints imply that S should describe a simple polygon. We investigate two approaches, simple map matching and connected face selection, based on commonly used similarity metrics.
With the former, S is a simple cycle C in G and we quantify resemblance via the Fr\'echet distance. We prove that it is NP-hard to compute a cycle that approximates the minimal Fr\'echet distance over all simple cycles in a plane graph G. This result holds even if G is a partial grid graph, if area preservation is required and if we assume a given sequence of turns is specified.
With the latter, S is a connected face set in G, quantifying resemblance via the symmetric difference. Though the symmetric difference seems a less strict measure, we prove that it is NP-hard to compute the optimal face set. This result holds even if G is full grid graph or a triangular or hexagonal tiling, and if area preservation is required. Moreover, it is independent of whether we allow the set of faces to have holes or not.
computational-geometry  approximation  visualization  graph-theory  rather-interesting  algorithms  summarization  nudge-targets  consider:feature-discovery 
june 2016 by Vaguery
[1605.01867] Statistical mechanics analysis of thresholding 1-bit compressed sensing
The one-bit compressed sensing framework aims to reconstruct a sparse signal by only using the sign information of its linear measurements. To compensate for the loss of scale information, past studies in the area have proposed recovering the signal by imposing an additional constraint on the L2-norm of the signal. Recently, an alternative strategy that captures scale information by introducing a threshold parameter to the quantization process was advanced. In this paper, we analyze the typical behavior of thresholding 1-bit compressed sensing using the replica method of statistical mechanics, so as to gain insight into properly setting the threshold value. Our result shows that, statistically, fixing the threshold at an optimally tuned constant value yields better performance than varying it randomly. Unfortunately, the optimal threshold value depends on the statistical properties of the target signal, which may not be known in advance. In order to handle this inconvenience, we develop a heuristic that adaptively tunes the threshold parameter based on the frequency of positive (or negative) values in the binary outputs. Numerical experiments show that the heuristic exhibits satisfactory performance while incurring low computational cost.
compressed-sensing  information-theory  algorithms  approximation  nudge-targets  consider:looking-to-see 
may 2016 by Vaguery
[1605.03949] Efficient Similarity Search in Dynamic Data Streams
The Jaccard index is an important similarity measure for item sets and Boolean data. On large datasets, an exact similarity computation is often infeasible for all item pairs both due to time and space constraints, giving rise to faster approximate methods. The algorithm of choice used to quickly compute the Jaccard index |A∩B| / |A∪B| of two item sets A and B is usually a form of min-hashing. Most min-hashing schemes are maintainable in data streams processing only additions, but none are known to work when facing item-wise deletions. In this paper, we investigate scalable approximation algorithms for rational set similarities, a broad class of similarity measures including Jaccard.
Motivated by a result of Chierichetti and Kumar [J. ACM 2015], who showed that any rational set similarity S admits a locality sensitive hashing (LSH) scheme if and only if the corresponding distance 1−S is a metric, we can show that there exists a space-efficient summary maintaining a (1±ε) multiplicative approximation to 1−S in dynamic data streams. This in turn also yields an ε-additive approximation of the similarity.
The existence of these approximations hints at, but does not directly imply a LSH scheme in dynamic data streams. Our second and main contribution now lies in the design of such a LSH scheme maintainable in dynamic data streams. The scheme is space efficient, easy to implement and to the best of our knowledge the first of its kind able to process deletions.
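A toy MinHash sketch of the classic, additions-only kind the abstract contrasts with; it uses Python's built-in hash over salted tuples as a stand-in for a proper hash family, an assumption made purely for brevity.

```python
import random

def minhash_signature(items, n_hashes=128, seed=42):
    """MinHash signature of a set: for each of n_hashes salted hash functions,
    keep the minimum hash value over the set's items."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(n_hashes)]
    return [min(hash((salt, x)) for x in items) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates |A∩B| / |A∪B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

A = set(range(0, 80))
B = set(range(40, 120))
print(estimated_jaccard(minhash_signature(A), minhash_signature(B)))  # true Jaccard = 1/3
```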
algorithms  metrics  distance-measure  rather-interesting  approximation  nudge-targets  consider:rediscovery 
may 2016 by Vaguery