**feature-construction** (188)

[1801.01922] Vectorization of Line Drawings via PolyVector Fields

3 days ago by Vaguery

Image tracing is a foundational component of the workflow in graphic design, engineering, and computer animation, linking hand-drawn concept images to collections of smooth curves needed for geometry processing and editing. Even for clean line drawings, modern algorithms often fail to faithfully vectorize junctions, or points at which curves meet; this produces vector drawings with incorrect connectivity. This subtle issue undermines the practical application of vectorization tools and accounts for hesitance among artists and engineers to use automatic vectorization software. To address this issue, we propose a novel image vectorization method based on state-of-the-art mathematical algorithms for frame field processing. Our algorithm is tailored specifically to disambiguate junctions without sacrificing quality.

graphics
algorithms
inference
rather-interesting
feature-construction
nudge-targets
consider:representation
consider:looking-to-see
consider:performance-measures
3 days ago by Vaguery

[1606.05381] Deep Image Set Hashing

10 days ago by Vaguery

In applications involving matching of image sets, the information from multiple images must be effectively exploited to represent each set. State-of-the-art methods use a probabilistic distribution or a subspace to model a set, together with a specific distance measure to compare two sets. These methods are slow to compute and not compact enough for large-scale use. Learning-based hashing is often used in large-scale image retrieval because it provides a compact representation of each sample, and the Hamming distance can be used to efficiently compare two samples. However, most hashing methods encode each image separately and discard the knowledge that multiple images in the same set represent the same object or person. We investigate the set hashing problem by combining both set representation and hashing in a single deep neural network. An image set is first passed to a CNN module to extract image features; these features are then aggregated using two types of set feature to capture both set-specific and database-wide distribution information. The computed set feature is then fed into a multilayer perceptron to learn a compact binary embedding. Triplet loss is used to train the network by forming set similarity relations using class labels. We extensively evaluate our approach on datasets used for image matching and show highly competitive performance compared to state-of-the-art methods.
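A minimal sketch of the pipeline the abstract describes (per-image CNN features, set-level aggregation, an MLP producing a binary embedding), assuming PyTorch; the mean/max pooling pair stands in for the paper's two set features, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class SetHashNet(nn.Module):
    """Sketch: aggregate per-image CNN features into one compact set code."""
    def __init__(self, feat_dim=512, code_bits=64):
        super().__init__()
        # Two set features (here simply mean- and max-pooling) are concatenated;
        # the paper's aggregation also uses database-wide distribution info.
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, code_bits),
            nn.Tanh(),  # smooth surrogate for the sign() binarization
        )

    def forward(self, feats):
        # feats: (set_size, feat_dim) tensor from a CNN backbone
        pooled = torch.cat([feats.mean(dim=0), feats.max(dim=0).values])
        return self.mlp(pooled)

net = SetHashNet()
codes = net(torch.randn(10, 512))   # one set of 10 image feature vectors
bits = (codes > 0).int()            # binary embedding; train with triplet loss
```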

algorithms
machine-learning
rather-interesting
feature-construction
nudge-targets
consider:looking-at-code
consider:performance-measures
10 days ago by Vaguery

[1709.08119] Analogies between the crossing number and the tangle crossing number

6 weeks ago by Vaguery

Tanglegrams are special graphs that consist of a pair of rooted binary trees with the same number of leaves, and a perfect matching between the two leaf-sets. These objects are of use in phylogenetics and are represented with straight-line drawings where the leaves of the two plane binary trees are on two parallel lines and only the matching edges can cross. The tangle crossing number of a tanglegram is the minimum crossing number over all such drawings and is related to biologically relevant quantities, such as the number of times a parasite switched hosts.

Our main results for tanglegrams, which parallel known theorems for crossing numbers, are as follows. The removal of a single matching edge in a tanglegram with n leaves decreases the tangle crossing number by at most n-3, and this is sharp. Additionally, if γ(n) is the maximum tangle crossing number of a tanglegram with n leaves, we prove $\frac{1}{2}\binom{n}{2}(1-o(1)) \le \gamma(n) < \frac{1}{2}\binom{n}{2}$. Further, we provide an algorithm for computing non-trivial lower bounds on the tangle crossing number in $O(n^4)$ time. This lower bound may be tight, even for tanglegrams with tangle crossing number $\Theta(n^2)$.

graph-theory
feature-construction
phylogenetics
metrics
distance-measures
algorithms
rather-interesting
computational-complexity
nudge-targets
6 weeks ago by Vaguery

[1710.02034] Weight = nonlinearity for all small weight Boolean functions

7 weeks ago by Vaguery

Given a Boolean function f, the (Hamming) weight wt(f) and the nonlinearity N(f) are well known to be important in designing functions that are useful in cryptography. The nonlinearity is expensive to compute, in general, so any shortcuts for doing that for particular functions f are significant. It is clear from the definitions that $N(f) \le wt(f)$. We give a useful and practical sufficient condition for the weight and nonlinearity to be equal, namely $wt(f) \le 2^{n-2}$ if f has n variables. It is shown that this inequality cannot be improved, in general. This condition can also be used to actually compute the nonlinearity in some cases. As an application, we give a simple proof for the value of the nonlinearity of the well-known majority function. Previous proofs relied on many detailed results for the Krawtchouk polynomials.
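The condition is easy to check empirically. A minimal sketch using the standard fast Walsh-Hadamard transform, where $N(f) = 2^{n-1} - \frac{1}{2}\max_a |W_f(a)|$; everything here is textbook material, not code from the paper:

```python
import random

def walsh_hadamard(signs):
    """Fast Walsh-Hadamard transform of a +/-1 truth table."""
    w = list(signs)
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w

def nonlinearity(truth_table):
    """N(f) = 2^(n-1) - max|W_f|/2: distance to the nearest affine function."""
    n = (len(truth_table) - 1).bit_length()
    w = walsh_hadamard([(-1) ** bit for bit in truth_table])
    return 2 ** (n - 1) - max(abs(v) for v in w) // 2

n = 5
for _ in range(1000):
    weight = random.randint(0, 2 ** (n - 2))           # the sufficient condition
    ones = random.sample(range(2 ** n), weight)
    f = [1 if x in ones else 0 for x in range(2 ** n)]
    assert nonlinearity(f) == weight                   # weight = nonlinearity
```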

information-theory
Boolean-functions
representation
distance
cryptography
feature-construction
rather-interesting
to-write-about
consider:genetic-programming
7 weeks ago by Vaguery

[1705.00441] Learning Topic-Sensitive Word Representations

7 weeks ago by Vaguery

Distributed word representations are widely used for modeling words in NLP tasks. Most existing models generate one representation per word and do not consider different meanings of a word. We present two approaches to learn multiple topic-sensitive representations per word by using a Hierarchical Dirichlet Process. We observe that by modeling topics and integrating topic distributions for each document we obtain representations that are able to distinguish between different meanings of a given word. Our models yield statistically significant improvements for the lexical substitution task, indicating that commonly used single word representations, even when combined with contextual information, are insufficient for this task.

natural-language-processing
feature-construction
neural-networks
algorithms
nudge-targets
consider:looking-to-see
consider:performance-measures
7 weeks ago by Vaguery

[1710.03568] 321-avoiding affine permutations and their many heaps

9 weeks ago by Vaguery

We study 321-avoiding affine permutations, and prove a formula for their enumeration with respect to the inversion number by using a combinatorial approach. This is done in two different ways, both related to Viennot's theory of heaps. First, we encode these permutations using certain heaps of monomers and dimers. This method specializes to the case of affine involutions. For the second proof, we introduce periodic parallelogram polyominoes, which are new combinatorial objects of independent interest. We enumerate them by extending the approach of Bousquet-Mélou and Viennot used for classical parallelogram polyominoes. We finally establish a connection between these new objects and 321-avoiding affine permutations.

permutations
combinatorics
to-understand
representation
feature-construction
9 weeks ago by Vaguery

[1703.07856] Testing for the Equality of two Distributions on High Dimensional Object Spaces

november 2017 by Vaguery

Energy statistics are estimators of the energy distance that depend on the distances between observations. The idea behind energy statistics is to consider a statistical potential energy that parallels Newton's gravitational potential energy. This statistical potential energy is zero if and only if a certain null hypothesis relating two distributions holds true. In Szekely and Rizzo (2004), a nonparametric test for equality of two multivariate distributions was given, based on the Euclidean distance between observations. This test was shown to be effective for high-dimensional multivariate data, and was implemented by an appropriate distribution-free permutation test. As an extension of Szekely and Rizzo (2013), here we consider the energy distance between two independent random objects X and Y on an object space M that admits an embedding into a Euclidean space. In the case of a Kendall shape space, we can use its VW-embedding into a Euclidean space of matrices and define the extrinsic distance between two shapes as their associated VW distance. The corresponding energy distance between two distributions of Kendall shapes of k-ads will be called the VW-energy distance. We test our methodology by comparing the distributions of Kendall shapes of the contour of the midsagittal section of the Corpus Callosum in normal vs. ADHD-diagnosed individuals, using the VW distance between the shapes of two children's CC midsections. Using the CC data coming originally from 1000.projects.nitrc.org/indi/adhd200/, it appears that the two Kendall shape distributions are not significantly different.
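A minimal sketch of the underlying two-sample energy test (the standard Szekely-Rizzo statistic with a permutation p-value, not the paper's VW-embedding step); any embedded shape vectors could be substituted for the Euclidean samples below:

```python
import numpy as np

def energy_distance(x, y):
    """Two-sample energy statistic: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    d = lambda a, b: np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return 2 * d(x, y).mean() - d(x, x).mean() - d(y, y).mean()

def permutation_test(x, y, n_perm=999, rng=np.random.default_rng(0)):
    """Distribution-free p-value by permuting set membership in the pooled sample."""
    observed = energy_distance(x, y)
    pooled = np.vstack([x, y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        xs, ys = pooled[idx[:len(x)]], pooled[idx[len(x):]]
        count += energy_distance(xs, ys) >= observed
    return (count + 1) / (n_perm + 1)

x = np.random.default_rng(1).normal(size=(40, 5))
y = np.random.default_rng(2).normal(size=(50, 5))
print(permutation_test(x, y))   # large p-value here: same distribution
```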

classification
feature-construction
rather-interesting
computer-vision
representation
algorithms
nudge-targets
consider:rediscovery
consider:looking-to-see
november 2017 by Vaguery

[1710.06993] Improved Search in Hamming Space using Deep Multi-Index Hashing

november 2017 by Vaguery

Similarity-preserving hashing is a widely used method for nearest-neighbour search in large-scale image retrieval tasks. There has been considerable research on generating efficient image representations via deep-network-based hashing methods. However, the issue of efficient searching in the deep representation space remains largely unsolved. To this end, we propose a simple yet efficient deep-network-based multi-index hashing method for simultaneously learning a powerful image representation and efficient searching. To achieve these two goals, we introduce the multi-index hashing (MIH) mechanism into the proposed deep architecture, which divides the binary codes into multiple substrings. Because non-uniformly distributed codes result in inefficient searching, we add two balance constraints, at the feature level and the instance level respectively. Extensive evaluations on several benchmark image retrieval datasets show that the learned balanced binary codes bring dramatic speedups and achieve comparable performance over the existing baselines.
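For reference, a toy version of the multi-index hashing mechanism the abstract builds on (Norouzi et al.'s MIH, in its simplest exact-substring-match form; class and parameter names are made up):

```python
from collections import defaultdict

BITS, M = 64, 4                 # 64-bit codes split into M = 4 substrings
CHUNK = BITS // M

def substrings(code):
    return [(code >> (i * CHUNK)) & ((1 << CHUNK) - 1) for i in range(M)]

class MultiIndexHash:
    def __init__(self):
        self.tables = [defaultdict(list) for _ in range(M)]

    def add(self, code):
        for table, s in zip(self.tables, substrings(code)):
            table[s].append(code)

    def query(self, q, radius):
        # Pigeonhole: if dist(q, c) <= radius < M, some substring matches exactly.
        # (The full scheme also probes nearby buckets for larger radii.)
        candidates = set()
        for table, s in zip(self.tables, substrings(q)):
            candidates.update(table[s])
        return [c for c in candidates if bin(c ^ q).count("1") <= radius]
```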

metrics
image-processing
deep-learning
performance-space-analysis
feature-construction
machine-learning
rather-interesting
similarity-measures
to-write-about
november 2017 by Vaguery

[1709.09130] Output Range Analysis for Deep Neural Networks

november 2017 by Vaguery

Deep neural networks (NNs) are extensively used for machine learning tasks such as image classification, perception, and control of autonomous systems. Increasingly, these deep NNs are also being deployed in high-assurance applications. Thus, there is a pressing need for techniques to verify neural networks and check whether certain user-expected properties are satisfied. In this paper, we study a specific verification problem: computing a guaranteed range for the output of a deep neural network given a set of inputs represented as a convex polyhedron. Range estimation is a key primitive for verifying deep NNs. We present an efficient range estimation algorithm that uses a combination of local search and linear programming problems to efficiently find the maximum and minimum values taken by the outputs of the NN over the given input set. In contrast to recently proposed "monolithic" optimization approaches, we use local gradient descent to repeatedly find and eliminate local minima of the function. The final global optimum is certified using a mixed integer programming instance. We implement our approach and compare it with Reluplex, a recently proposed solver for deep neural networks. We demonstrate the effectiveness of the proposed approach for verification of NNs used in automated control as well as those used in classification.
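A sketch of the local-search half of the method: multistart projected gradient descent/ascent over a box (a special case of the convex polyhedron), keeping the best output values found. The MILP certification step is omitted, and the tiny random ReLU network is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def net(x):
    """Tiny ReLU network, stand-in for the deep NN under verification."""
    return (W2 @ np.maximum(W1 @ x + b1, 0) + b2)[0]

def grad(x):
    """(Sub)gradient of net at x: active ReLUs pass their weights through."""
    h = W1 @ x + b1
    return ((W2 * (h > 0)) @ W1).ravel()

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # input box
best_min, best_max = np.inf, -np.inf
for _ in range(50):                 # multistart local search from random points
    for sign in (+1, -1):           # +1 descends (min), -1 ascends (max)
        x = rng.uniform(lo, hi)
        for _ in range(200):
            x = np.clip(x - sign * 0.01 * grad(x), lo, hi)
        val = net(x)
        best_min, best_max = min(best_min, val), max(best_max, val)
print(best_min, best_max)           # inner estimate of the true output range
```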

neural-networks
benchmarking
rather-interesting
feature-construction
performance-measure
modeling-is-not-mathematics
algorithms
nudge-targets
consider:looking-to-see
consider:generalizing
november 2017 by Vaguery

[1504.00212] New results on the stopping time behaviour of the Collatz 3x + 1 function

november 2017 by Vaguery

Let $\sigma_n = \lfloor 1 + n \cdot \log_2 3 \rfloor$. For the Collatz 3x + 1 function there exists for each $n \in \mathbb{N}$ a set of different residue classes (mod $2^{\sigma_n}$) of starting numbers s with finite stopping time $\sigma(s) = \sigma_n$. Let $z_n$ be the number of these residue classes for each $n \ge 0$, as listed in the OEIS as A100982. It is conjectured that for each $n \ge 4$ the value of $z_n$ is given by the formula

$$z_n = \binom{\lfloor 5(n-2)/3 \rfloor}{n-2} - \sum_{i=2}^{n-1} \binom{\lfloor (3(n-i)+\delta)/2 \rfloor}{n-i} \cdot z_i,$$

where $\delta \in \mathbb{Z}$ assumes different values within the sum at intervals of 5 or 6 terms. This allows us to create an iterative algorithm which generates $z_n$ for each $n > 12$. This has been proved for each $n \le 10000$. The number $z_{10000}$ has 4527 digits.
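The δ schedule is not pinned down by the abstract, so a sketch has to take the empirical route: compute stopping times directly and tally residue classes. This assumes the "shortcut" Collatz map T(x) = (3x+1)/2 for odd x, the convention under which σ_1 = 2 matches the formula; the bounds below are illustrative:

```python
import math

def stopping_time(s, cap=10_000):
    """Steps of T(x) = (3x+1)/2 (x odd), x/2 (x even), until the
    trajectory first drops below the starting value s."""
    x, steps = s, 0
    while x >= s:
        x = (3 * x + 1) // 2 if x % 2 else x // 2
        steps += 1
        if steps > cap:               # guard; termination is only conjectural
            return None
    return steps

n = 3
sigma_n = math.floor(1 + n * math.log2(3))            # sigma_3 = 5
classes = {s % 2 ** sigma_n for s in range(2, 200_000)
           if stopping_time(s) == sigma_n}
print(sigma_n, len(classes))   # residue classes mod 2^sigma_n (cf. z_3)
```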

number-theory
mathematical-recreations
rather-interesting
to-write-about
nudge-targets
consider:looking-to-see
feature-construction
november 2017 by Vaguery

[1701.07681] Fast and Accurate Time Series Classification with WEASEL

november 2017 by Vaguery

Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices based on their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods for time series classification (TSC) cannot cope with such data volumes at acceptable accuracy; they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes.

In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method which is both scalable and accurate. Like other state-of-the-art TSC methods, WEASEL transforms time series into feature vectors, using a sliding-window approach, which are then analyzed through a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms at orders-of-magnitude lower classification and training times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even for mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it out-of-the-box achieves almost the same accuracy as highly tuned, domain-specific methods.
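The transform is easy to picture in miniature: slide a window, turn each window into a short symbolic word, and classify the word histogram. The sketch below uses a plain SAX-style discretization as a stand-in for WEASEL's supervised, Fourier-based word extraction:

```python
import numpy as np
from collections import Counter

def windows_to_words(ts, window=16, word_len=4, alphabet=4):
    """Bag of symbolic words from sliding windows (SAX-style stand-in)."""
    words = []
    for i in range(len(ts) - window + 1):
        w = ts[i:i + window]
        w = (w - w.mean()) / (w.std() + 1e-9)        # z-normalize the window
        segs = w.reshape(word_len, -1).mean(axis=1)  # piecewise means
        # discretize each segment into one of `alphabet` symbols
        bins = np.quantile(segs, np.linspace(0, 1, alphabet + 1)[1:-1])
        words.append(tuple(np.digitize(segs, bins)))
    return Counter(words)

ts = np.sin(np.linspace(0, 20, 256)) + 0.1 * np.random.default_rng(0).normal(size=256)
features = windows_to_words(ts)    # sparse histogram, fed to a linear classifier
print(features.most_common(3))
```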

time-series
classification
feature-construction
rather-interesting
modeling
to-write-about
to-do
consider:robustness
november 2017 by Vaguery

[1611.07087] Connectivity in Hypergraphs

october 2017 by Vaguery

In this paper we consider two natural notions of connectivity for hypergraphs: weak and strong. We prove that the strong vertex connectivity of a connected hypergraph is bounded by its weak edge connectivity, thereby extending a theorem of Whitney from graphs to hypergraphs. We find that while determining a minimum weak vertex cut can be done in polynomial time and is equivalent to finding a minimum vertex cut in the 2-section of the hypergraph in question, determining a minimum strong vertex cut is NP-hard for general hypergraphs. Moreover, the problem of finding minimum strong vertex cuts remains NP-hard when restricted to hypergraphs with maximum edge size at most 3. We also discuss the relationship between strong vertex connectivity and the minimum transversal problem for hypergraphs, showing that there are classes of hypergraphs for which one of the problems is NP-hard while the other can be solved in polynomial time.
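The weak-cut reduction mentioned above is short in code: build the 2-section (replace every hyperedge with a clique) and take a minimum vertex cut there, e.g. with networkx; the edge-list encoding of the hypergraph is an assumption:

```python
import itertools
import networkx as nx

def two_section(hyperedges):
    """2-section: replace every hyperedge by a clique on its vertices."""
    g = nx.Graph()
    for e in hyperedges:
        g.add_nodes_from(e)
        g.add_edges_from(itertools.combinations(e, 2))
    return g

hyperedges = [{1, 2, 3}, {3, 4, 5}, {5, 6, 1}]
g = two_section(hyperedges)
print(nx.minimum_node_cut(g))   # a minimum weak vertex cut of the hypergraph
```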

hypergraphs
combinatorics
define-your-terms
feature-construction
algorithms
computational-complexity
to-write-about
october 2017 by Vaguery

[1710.00228] Physics-based Motion Planning: Evaluation Criteria and Benchmarking

october 2017 by Vaguery

Motion planning has evolved from coping with purely geometric problems to physics-based ones that incorporate the kinodynamic and physical constraints imposed by the robot and the physical world. Therefore, the criteria for evaluating physics-based motion planners go beyond the computational complexity (e.g. in terms of planning time) usually used as a measure for evaluating geometric planners, in order to also consider the quality of the solution in terms of dynamical parameters. This study proposes evaluation criteria and analyzes the performance of several kinodynamic planners, which are at the core of physics-based motion planning, using different scenarios with fixed and manipulatable objects. RRT, EST, KPIECE and SyCLoP are used for the benchmarking. The results show that KPIECE computes the time-optimal solution with the highest success rate, whereas SyCLoP computes the most power-optimal solution among the planners used.

simulation
planning
algorithms
rather-interesting
feature-construction
approximation
nudge-targets
consider:representation
october 2017 by Vaguery

[1610.00331] Comparing 1D and 2D Real Time on Cellular Automata

october 2017 by Vaguery

We study the influence of the dimension of cellular automata (CA) for real time language recognition of one-dimensional languages with parallel input. Specifically, we focus on the question of determining whether every language that can be recognized in real time on a 2-dimensional CA working on the Moore neighborhood can also be recognized in real time by a 1-dimensional CA working on the standard two-way neighborhood.

We show that 2-dimensional CA in real time can perform a linear number of simulations of a 1-dimensional real time CA. If the two classes are equal then the number of simulated instances can be polynomial.

cellular-automata
rather-interesting
alternative-computational-models
distributed-processing
computer-science
universality
representation
feature-construction
engineering-design
to-write-about
october 2017 by Vaguery

[math/0103224] On the Minimum Ropelength of Knots and Links

october 2017 by Vaguery

The ropelength of a knot is the quotient of its length and its thickness, the radius of the largest embedded normal tube around the knot. We prove existence and regularity for ropelength minimizers in any knot or link type; these are $C^{1,1}$ curves, but need not be smoother. We improve the lower bound for the ropelength of a nontrivial knot, and establish new ropelength bounds for small knots and links, including some which are sharp.

knot-theory
optimization
rather-interesting
feature-construction
geometry
engineering-design
nudge-targets
consider:approximation
october 2017 by Vaguery

[math/0204106] The Second Hull of a Knotted Curve

october 2017 by Vaguery

The convex hull of a set K in space consists of points which are, in a certain sense, "surrounded" by K. When K is a closed curve, we define its higher hulls, consisting of points which are "multiply surrounded" by the curve. Our main theorem shows that if a curve is knotted then it has a nonempty second hull. This provides a new proof of the Fáry/Milnor theorem that every knotted curve has total curvature at least $4\pi$.

knot-theory
geometry
topology
rather-interesting
feature-construction
proof
nudge-targets
consider:representation
consider:looking-to-see
october 2017 by Vaguery

[1509.07588] Fractional coverings, greedy coverings, and rectifier networks

october 2017 by Vaguery

A rectifier network is a directed acyclic graph with distinguished sources and sinks; it is said to compute a Boolean matrix M that has a 1 in the entry (i,j) iff there is a path from the jth source to the ith sink. The smallest number of edges in a rectifier network computing M is a classic complexity measure on matrices, which has been studied for more than half a century.

We explore two well-known techniques that have hitherto found little to no applications in this theory. Both of them build on a basic fact that depth-2 rectifier networks are essentially weighted coverings of Boolean matrices with rectangles. We obtain new results by using fractional and greedy coverings (defined in the standard way).

First, we show that all fractional coverings of the so-called full triangular matrix have cost at least $n \log n$. This provides (a fortiori) a new proof of the tight lower bound on its depth-2 complexity (the exact value has been known since 1965, but previous proofs are based on different arguments). Second, we show that the greedy heuristic is instrumental in tightening the upper bound on the depth-2 complexity of the Kneser-Sierpiński (disjointness) matrix. The previous upper bound is $O(n^{1.28})$, and we improve it to $O(n^{1.17})$, while the best known lower bound is $\Omega(n^{1.16})$. Third, using fractional coverings, we obtain a form of direct product theorem that gives a lower bound on unbounded-depth complexity of Kronecker (tensor) products of matrices. In this case, the greedy heuristic shows (by an argument due to Lovász) that our result is only a logarithmic factor away from the "full" direct product theorem. Our second and third results constitute progress on open problem 7.3 and resolve, up to a logarithmic factor, open problem 7.5 from a recent book by Jukna and Sergeev (in Foundations and Trends in Theoretical Computer Science (2013)).
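Since a depth-2 rectifier network computing M is a weighted covering of the 1-entries of M by combinatorial rectangles (with cost |A| + |B| for a rectangle A x B), the greedy heuristic is straightforward to sketch. The candidate-rectangle family below (one candidate per row) is a simplification for illustration, not the paper's construction:

```python
import numpy as np

def greedy_cover(M):
    """Greedy weighted cover of the 1-entries of M by all-ones rectangles.
    Candidates: for each row, its 1-columns x all rows containing those columns.
    Cost of a rectangle A x B is |A| + |B| (edge count in a depth-2 network)."""
    M = np.asarray(M, dtype=bool)
    uncovered = M.copy()
    total_cost = 0
    while uncovered.any():
        best, best_ratio = None, 0.0
        for i in range(M.shape[0]):
            cols = np.flatnonzero(M[i])
            if cols.size == 0:
                continue
            rows = np.flatnonzero(M[:, cols].all(axis=1))
            gain = uncovered[np.ix_(rows, cols)].sum()
            ratio = gain / (len(rows) + len(cols))
            if ratio > best_ratio:
                best, best_ratio = (rows, cols), ratio
        rows, cols = best
        uncovered[np.ix_(rows, cols)] = False
        total_cost += len(rows) + len(cols)
    return total_cost

# full triangular matrix: the abstract's n log n lower bound concerns this matrix
n = 8
T = np.tril(np.ones((n, n), dtype=int))
print(greedy_cover(T))
```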

network-theory
feature-construction
rather-interesting
graph-theory
representation
nudge-targets
consider:rediscovery
october 2017 by Vaguery

[1707.03894] The Generalized Nagell-Ljunggren Problem: Powers with Repetitive Representations

october 2017 by Vaguery

We consider a natural generalization of the Nagell-Ljunggren equation to the case where the qth power of an integer y, for q >= 2, has a base-b representation that consists of a length-l block of digits repeated n times, where n >= 2. Assuming the abc conjecture of Masser and Oesterlé, we completely characterize those triples (q, n, l) for which there are infinitely many solutions b. In all cases predicted by the abc conjecture, we are able (without any assumptions) to prove there are indeed infinitely many solutions.
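The objects are easy to search for directly: a length-l block of value B repeated n times in base b equals B(b^{nl} - 1)/(b^l - 1), so small cases can be scanned for perfect powers. A naive sketch for q = 2, n = 2 (all bounds arbitrary):

```python
from math import isqrt

def repeated_block_value(B, b, l, n):
    """Value of the base-b number made of the length-l block B repeated n times."""
    return B * (b ** (n * l) - 1) // (b ** l - 1)

def is_square(v):
    return isqrt(v) ** 2 == v

# scan q=2, n=2: which (base, block) pairs give y^2 with a twice-repeated block?
for b in range(2, 50):
    for l in range(1, 3):
        for B in range(b ** (l - 1), b ** l):   # block must not start with digit 0
            v = repeated_block_value(B, b, l, 2)
            if is_square(v):
                print(f"base {b}: block {B} repeated twice = {isqrt(v)}^2")
```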

number-theory
feature-construction
rather-interesting
mathematical-recreations
nudge-targets
consider:looking-to-see
consider:classification
october 2017 by Vaguery

[1407.5841] Mechanical Proofs of Properties of the Tribonacci Word

october 2017 by Vaguery

We implement a decision procedure for answering questions about a class of infinite words that might be called (for lack of a better name) "Tribonacci-automatic". This class includes, for example, the famous Tribonacci word T = 0102010010201 ..., the fixed point of the morphism 0 -> 01, 1 -> 02, 2 -> 0. We use it to reprove some old results about the Tribonacci word from the literature, such as assertions about the occurrences in T of squares, cubes, palindromes, and so forth. We also obtain some new results.
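The word itself is a two-liner to generate, which makes it easy to look at occurrences of squares and palindromes by hand; a minimal sketch of the morphism iteration:

```python
MORPHISM = {"0": "01", "1": "02", "2": "0"}

def tribonacci_word(iterations=10):
    """Iterate 0 -> 01, 1 -> 02, 2 -> 0 starting from '0'."""
    w = "0"
    for _ in range(iterations):
        w = "".join(MORPHISM[c] for c in w)
    return w

T = tribonacci_word()
print(T[:13])                      # 0102010010201
assert T.startswith("0102010010201")
```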

formal-languages
strings
automata
representation
feature-construction
rather-interesting
nudge-targets
consider:looking-to-see
consider:rediscovery
october 2017 by Vaguery

[1311.2318] Counting the Palstars

october 2017 by Vaguery

A palstar (after Knuth, Morris, and Pratt) is a concatenation of even-length palindromes. We show that, asymptotically, there are $\Theta(\alpha_k^n)$ palstars of length 2n over a k-letter alphabet, where $\alpha_k$ is a constant such that $2k-1 < \alpha_k < 2k - \frac{1}{2}$. In particular, $\alpha_2 \doteq 3.33513193$.
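A brute-force check of the growth rate for small n; the DP palstar test below is standard (not from the paper), and the successive-count ratios should creep toward alpha_2 ≈ 3.335:

```python
from itertools import product

def is_palstar(s):
    """s is a palstar iff it factors into even-length palindromes (simple DP)."""
    n = len(s)
    ok = [False] * (n + 1)       # ok[i]: suffix s[i:] is a palstar
    ok[n] = True                 # the empty word is (vacuously) a palstar
    for i in range(n - 2, -1, -1):
        for j in range(i + 2, n + 1, 2):
            piece = s[i:j]
            if piece == piece[::-1] and ok[j]:
                ok[i] = True
                break
    return ok[0]

counts = []
for n in range(1, 7):            # binary palstars of length 2n
    counts.append(sum(is_palstar("".join(w)) for w in product("ab", repeat=2 * n)))
print(counts)
print([counts[i + 1] / counts[i] for i in range(len(counts) - 1)])
```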

strings
feature-construction
classification
nudge-targets
consider:looking-to-see
october 2017 by Vaguery
