**consider:representation**386

[1806.07366] Neural Ordinary Differential Equations

12 days ago by Vaguery

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.

neural-networks
machine-learning
representation
rather-interesting
deep-learning
to-understand
consider:representation
to-write-about
12 days ago by Vaguery

[1602.06208v2] Every positive integer is a sum of three palindromes

12 weeks ago by Vaguery

For integer g≥5, we prove that any positive integer can be written as a sum of three palindromes in base g.

mathematical-recreations
number-theory
nudge-targets
consider:rediscovery
consider:representation
to-write-about
benchmarks
reveal-intent
12 weeks ago by Vaguery

[1802.03676] Differentiable Dynamic Programming for Structured Prediction and Attention

may 2018 by Vaguery

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.

representation
time-series
numerical-methods
deep-learning
rather-interesting
nudge-targets
consider:representation
may 2018 by Vaguery

[1709.04109] Empower Sequence Labeling with Task-Aware Neural Language Model

march 2018 by Vaguery

Linguistic sequence labeling is a general modeling approach that encompasses a variety of problems, such as part-of-speech tagging and named entity recognition. Recent advances in neural networks (NNs) make it possible to build reliable models without handcrafted features. However, in many cases, it is hard to obtain sufficient annotations to train these models. In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task. Besides word-level knowledge contained in pre-trained word embeddings, character-aware neural language models are incorporated to extract character-level knowledge. Transfer learning techniques are further adopted to mediate different components and guide the language model towards the key knowledge. Comparing to previous methods, these task-specific knowledge allows us to adopt a more concise model and conduct more efficient training. Different from most transfer learning methods, the proposed framework does not rely on any additional supervision. It extracts knowledge from self-contained order information of training sequences. Extensive experiments on benchmark datasets demonstrate the effectiveness of leveraging character-level knowledge and the efficiency of co-training. For example, on the CoNLL03 NER task, model training completes in about 6 hours on a single GPU, reaching F1 score of 91.71±0.10 without using any extra annotation.

natural-language-processing
deep-learning
neural-networks
nudge-targets
consider:feature-discovery
consider:representation
to-write-about
march 2018 by Vaguery

A hybrid placement strategy for the three-dimensional strip packing problem - ScienceDirect

march 2018 by Vaguery

This paper presents a hybrid placement strategy for the three-dimensional strip packing problem which involves packing a set of cuboids (‘boxes’) into a three-dimensional bin (parallelepiped) of fixed width and height but unconstrained length (the ‘container’). The goal is to pack all of the boxes into the container, minimising its resulting length. This problem has potential industry application in stock cutting (wood, polystyrene, etc. – minimising wastage) and also cargo loading, as well as other applications in areas such as multi-dimensional resource scheduling. In addition to the proposed strategy a number of test results on available literature benchmark problems are presented and analysed. The results of empirical testing of the algorithm show that it out-performs other methods from the literature, consistently in terms of speed and solution quality-producing 28 best known results from 35 test cases.

operations-research
hyperheuristics
metaheuristics
optimization
constraint-satisfaction
nudge-targets
consider:representation
consider:looking-to-see
to-write-about
march 2018 by Vaguery

[1709.08359] On the expressive power of query languages for matrices

march 2018 by Vaguery

We investigate the expressive power of 𝖬𝖠𝖳𝖫𝖠𝖭𝖦, a formal language for matrix manipulation based on common matrix operations and linear algebra. The language can be extended with the operation 𝗂𝗇𝗏 of inverting a matrix. In 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝗂𝗇𝗏 we can compute the transitive closure of directed graphs, whereas we show that this is not possible without inversion. Indeed we show that the basic language can be simulated in the relational algebra with arithmetic operations, grouping, and summation. We also consider an operation 𝖾𝗂𝗀𝖾𝗇 for diagonalizing a matrix, which is defined so that different eigenvectors returned for a same eigenvalue are orthogonal. We show that 𝗂𝗇𝗏 can be expressed in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝖾𝗂𝗀𝖾𝗇. We put forward the open question whether there are boolean queries about matrices, or generic queries about graphs, expressible in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝖾𝗂𝗀𝖾𝗇 but not in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝗂𝗇𝗏. The evaluation problem for 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝖾𝗂𝗀𝖾𝗇 is shown to be complete for the complexity class ∃R.

matrices
programming-language
representation
rather-interesting
nudge
consider:representation
to--do
march 2018 by Vaguery

[1801.01922] Vectorization of Line Drawings via PolyVector Fields

march 2018 by Vaguery

Image tracing is a foundational component of the workflow in graphic design, engineering, and computer animation, linking hand-drawn concept images to collections of smooth curves needed for geometry processing and editing. Even for clean line drawings, modern algorithms often fail to faithfully vectorize junctions, or points at which curves meet; this produces vector drawings with incorrect connectivity. This subtle issue undermines the practical application of vectorization tools and accounts for hesitance among artists and engineers to use automatic vectorization software. To address this issue, we propose a novel image vectorization method based on state-of-the-art mathematical algorithms for frame field processing. Our algorithm is tailored specifically to disambiguate junctions without sacrificing quality.

graphics
algorithms
inference
rather-interesting
feature-construction
nudge-targets
consider:representation
consider:looking-to-see
consider:performance-measures
march 2018 by Vaguery

[1802.06415] Solving Large-Scale Minimum-Weight Triangulation Instances to Provable Optimality

march 2018 by Vaguery

We consider practical methods for the problem of finding a minimum-weight triangulation (MWT) of a planar point set, a classic problem of computational geometry with many applications. While Mulzer and Rote proved in 2006 that computing an MWT is NP-hard, Beirouti and Snoeyink showed in 1998 that computing provably optimal solutions for MWT instances of up to 80,000 uniformly distributed points is possible, making use of clever heuristics that are based on geometric insights. We show that these techniques can be refined and extended to instances of much bigger size and different type, based on an array of modifications and parallelizations in combination with more efficient geometric encodings and data structures. As a result, we are able to solve MWT instances with up to 30,000,000 uniformly distributed points in less than 4 minutes to provable optimality. Moreover, we can compute optimal solutions for a vast array of other benchmark instances that are not uniformly distributed, including normally distributed instances (up to 30,000,000 points), all point sets in the TSPLIB (up to 85,900 points), and VLSI instances with up to 744,710 points. This demonstrates that from a practical point of view, MWT instances can be handled quite well, despite their theoretical difficulty.

computational-geometry
algorithms
plane-geometry
optimization
to-write-about
nudge-targets
consider:representation
march 2018 by Vaguery

[1709.08004] Slow-scale split-step tau-leap method for stiff stochastic chemical systems

february 2018 by Vaguery

Tau-leaping is a family of algorithms for the approximate simulation of discrete state continuous time Markov chains. The motivation for the development of such methods can be found, for instance, in the fields of chemical kinetics and systems biology. It is well known that the dynamical behavior of biochemical systems is often intrinsically stiff representing a serious challenge for their numerical approximation. The naive extension of stiff deterministic solvers to stochastic integration usually yields numerical solutions with either impractically large relaxation times or incorrectly resolved covariance. In this paper, we propose a novel splitting heuristic which allows to resolve these issues. The proposed numerical integrator takes advantage of the special structure of the linear systems with explicitly available equations for the mean and the covariance which we use to calibrate the parameters of the scheme. It is shown that the method is able to reproduce the exact mean and variance of the linear scalar test equation and has very good accuracy for the arbitrarily stiff systems at least in linear case. The numerical examples for both linear and nonlinear systems are also provided and the obtained results confirm the efficiency of the considered splitting approach.

numerical-methods
diffy-Qs
approximation
systems-biology
algorithms
to-write-about
consider:representation
rather-interesting
nudge
february 2018 by Vaguery

Discovering Types for Entity Disambiguation

february 2018 by Vaguery

Our system uses the following steps:

Extract every Wikipedia-internal link to determine, for each word, the set of conceivable entities it can refer to. For example, when encountering the link [jaguar](https://en.wikipedia.org/wiki/Jaguar) in a Wikipedia page, we conclude that https://en.wikipedia.org/wiki/Jaguar is one of the meanings of jaguar.

Walk the Wikipedia category tree (using the Wikidata knowledge graph) to determine, for each entity, the set of categories it belongs to. For example, at the bottom of https://en.wikipedia.org/wiki/Jaguar_Cars’s Wikipedia page, are the following categories (which themselves have their own categories, such as Automobiles):

Pick a list of ~100 categories to be your “type” system, and optimize over this choice of categories so that they compactly express any entity. We know the mapping of entities to categories, so given a type system, we can represent each entity as a ~100-dimensional binary vector indicating membership in each category.

Using every Wikipedia-internal link and its surrounding context, produce training data mapping a word plus context to the ~100-dimensional binary representation of the corresponding entity, and train a neural network to predict this mapping. This chains together the previous steps: Wikipedia links map a word to an entity, we know the categories for each entity from step 2, and step 3 picked the categories in our type system.

At test time, given a word and surrounding context, our neural network’s output can be interpreted as the probability that the word belongs to each category. If we knew the exact set of category memberships, we would narrow down to one entity (assuming well-chosen categories). But instead, we must play a probabilistic 20 questions: use Bayes’ theorem to calculate the chance of the word disambiguating to each of its possible entities.

machine-learning
disambiguation
data-fusion
to-write-about
consider:representation
Extract every Wikipedia-internal link to determine, for each word, the set of conceivable entities it can refer to. For example, when encountering the link [jaguar](https://en.wikipedia.org/wiki/Jaguar) in a Wikipedia page, we conclude that https://en.wikipedia.org/wiki/Jaguar is one of the meanings of jaguar.

Walk the Wikipedia category tree (using the Wikidata knowledge graph) to determine, for each entity, the set of categories it belongs to. For example, at the bottom of https://en.wikipedia.org/wiki/Jaguar_Cars’s Wikipedia page, are the following categories (which themselves have their own categories, such as Automobiles):

Pick a list of ~100 categories to be your “type” system, and optimize over this choice of categories so that they compactly express any entity. We know the mapping of entities to categories, so given a type system, we can represent each entity as a ~100-dimensional binary vector indicating membership in each category.

Using every Wikipedia-internal link and its surrounding context, produce training data mapping a word plus context to the ~100-dimensional binary representation of the corresponding entity, and train a neural network to predict this mapping. This chains together the previous steps: Wikipedia links map a word to an entity, we know the categories for each entity from step 2, and step 3 picked the categories in our type system.

At test time, given a word and surrounding context, our neural network’s output can be interpreted as the probability that the word belongs to each category. If we knew the exact set of category memberships, we would narrow down to one entity (assuming well-chosen categories). But instead, we must play a probabilistic 20 questions: use Bayes’ theorem to calculate the chance of the word disambiguating to each of its possible entities.

february 2018 by Vaguery

[1802.01021] DeepType: Multilingual Entity Linking by Neural Type System Evolution

february 2018 by Vaguery

The wealth of structured (e.g. Wikidata) and unstructured data about the world available today presents an incredible opportunity for tomorrow's Artificial Intelligence. So far, integration of these two different modalities is a difficult process, involving many decisions concerning how best to represent the information so that it will be captured or useful, and hand-labeling large amounts of data. DeepType overcomes this challenge by explicitly integrating symbolic information into the reasoning process of a neural network with a type system. First we construct a type system, and second, we use it to constrain the outputs of a neural network to respect the symbolic structure. We achieve this by reformulating the design problem into a mixed integer problem: create a type system and subsequently train a neural network with it. In this reformulation discrete variables select which parent-child relations from an ontology are types within the type system, while continuous variables control a classifier fit to the type system. The original problem cannot be solved exactly, so we propose a 2-step algorithm: 1) heuristic search or stochastic optimization over discrete variables that define a type system informed by an Oracle and a Learnability heuristic, 2) gradient descent to fit classifier parameters. We apply DeepType to the problem of Entity Linking on three standard datasets (i.e. WikiDisamb30, CoNLL (YAGO), TAC KBP 2010) and find that it outperforms all existing solutions by a wide margin, including approaches that rely on a human-designed type system or recent deep learning-based entity embeddings, while explicitly using symbolic information lets it integrate new entities without retraining.

data-fusion
machine-learning
deep-learning
rather-interesting
inference
classification
to-write-about
consider:representation
february 2018 by Vaguery

[1607.01985] A Common Derivation for Markov Chain Monte Carlo Algorithms with Tractable and Intractable Targets

february 2018 by Vaguery

Markov chain Monte Carlo is a class of algorithms for drawing Markovian samples from high-dimensional target densities to approximate the numerical integration associated with computing statistical expectation, especially in Bayesian statistics. However, many Markov chain Monte Carlo algorithms do not seem to share the same theoretical support and each algorithm is proven in a different way. This incurs many terminologies and ancillary concepts, which makes Markov chain Monte Carlo literature seems to be scattered and intimidating to researchers from many other fields, including new researchers of Bayesian statistics.

A generalised version of the Metropolis-Hastings algorithm is constructed with a random number generator and a self-reverse mapping. This formulation admits many other Markov chain Monte Carlo algorithms as special cases. A common derivation for many Markov chain Monte Carlo algorithms is useful in drawing connections and comparisons between these algorithms. As a result, we now can construct many novel combinations of multiple Markov chain Monte Carlo algorithms that amplify the efficiency of each individual algorithm. Specifically, we propose two novel sampling schemes that combine slice sampling with directional or Hamiltonian sampling. Our Hamiltonian slice sampling scheme is also applicable in the pseudo-marginal context where the target density is intractable but can be unbiasedly estimated, e.g. using particle filtering.

sampling
algorithms
representation
rather-interesting
nudge-targets
consider:representation
define-your-terms
A generalised version of the Metropolis-Hastings algorithm is constructed with a random number generator and a self-reverse mapping. This formulation admits many other Markov chain Monte Carlo algorithms as special cases. A common derivation for many Markov chain Monte Carlo algorithms is useful in drawing connections and comparisons between these algorithms. As a result, we now can construct many novel combinations of multiple Markov chain Monte Carlo algorithms that amplify the efficiency of each individual algorithm. Specifically, we propose two novel sampling schemes that combine slice sampling with directional or Hamiltonian sampling. Our Hamiltonian slice sampling scheme is also applicable in the pseudo-marginal context where the target density is intractable but can be unbiasedly estimated, e.g. using particle filtering.

february 2018 by Vaguery

Battleship

february 2018 by Vaguery

I’m going to continue my analysis of classic card and board games by looking at the game of Battleship. (See early postings for analysis of Chutes & Ladders, Candyland and Risk).

dammit-arthegall
games
game-theory
probability-theory
simulation
nudge-targets
to-write-about
consider:representation
february 2018 by Vaguery

[1802.04196] Universal quantum computing and three-manifolds

february 2018 by Vaguery

A single qubit may be represented on the Bloch sphere or similarly on the 3-sphere S3. Our goal is to dress this correspondence by converting the language of universal quantum computing (uqc) to that of 3-manifolds. A magic state and the Pauli group acting on it define a model of uqc as a POVM that one recognizes to be a 3-manifold M3. E. g., the d-dimensional POVMs defined from subgroups of finite index of the modular group PSL(2,ℤ) correspond to d-fold M3- coverings over the trefoil knot. In this paper, one also investigates quantum information on a few \lq universal' knots and links such as the figure-of-eight knot, the Whitehead link and Borromean rings, making use of the catalog of platonic manifolds available on SnapPy. Further connections between POVMs based uqc and M3's obtained from Dehn fillings are explored.

quantums
quantum-computing
representation
rather-interesting
to-write-about
nudge
consider:representation
february 2018 by Vaguery

[1709.04851] Factor Analysis of Interval Data

february 2018 by Vaguery

This paper presents a factor analysis model for symbolic data, focusing on the particular case of interval-valued variables. The proposed method describes the correlation structure among the measured interval-valued variables in terms of a few underlying, but unobservable, uncorrelated interval-valued variables, called \textit{common factors}. Uniform and Triangular distributions are considered within each observed interval. We obtain the corresponding sample mean, variance and covariance assuming a general Triangular distribution.

In our proposal, factors are extracted either by Principal Component or by Principal Axis Factoring, performed on the interval-valued variables correlation matrix. To estimate the values of the common factors, usually called \textit{factor scores}, two approaches are considered, which are inspired in methods for real-valued data: the Bartlett and the Anderson-Rubin methods. In both cases, the estimated values are obtained solving an optimization problem that minimizes a function of the weighted squared Mallows distance between quantile functions. Explicit expressions for the quantile function and the squared Mallows distance are derived assuming a general Triangular distribution.

The applicability of the method is illustrated using two sets of data: temperature and precipitation in cities of the United States of America between the years 1971 and 2000 and measures of car characteristics of different makes and models. Moreover, the method is evaluated on synthetic data with predefined correlation structures.

representation
statistics
algorithms
rather-interesting
to-write-about
nudge
consider:representation
In our proposal, factors are extracted either by Principal Component or by Principal Axis Factoring, performed on the interval-valued variables correlation matrix. To estimate the values of the common factors, usually called \textit{factor scores}, two approaches are considered, which are inspired in methods for real-valued data: the Bartlett and the Anderson-Rubin methods. In both cases, the estimated values are obtained solving an optimization problem that minimizes a function of the weighted squared Mallows distance between quantile functions. Explicit expressions for the quantile function and the squared Mallows distance are derived assuming a general Triangular distribution.

The applicability of the method is illustrated using two sets of data: temperature and precipitation in cities of the United States of America between the years 1971 and 2000 and measures of car characteristics of different makes and models. Moreover, the method is evaluated on synthetic data with predefined correlation structures.

february 2018 by Vaguery

Creating Intricate Art with Neural Style Transfer – Becoming Human: Artificial Intelligence Magazine

february 2018 by Vaguery

The architecture is based on Gatys’ style transfer algorithm with a few minor modifications. In this case, the content image is a silhouette and style image can be any pattern (ranging from simple black and white doodle to more complex color mosaics). The code also contains a module to invert and create a mask based on the content image, which will eventually be applied to the generated pattern.

generative-art
generative-models
style-transfer
computational-recreations
to-write-about
to-do
nudge-targets
consider:representation
february 2018 by Vaguery

Maverick Solitaire

february 2018 by Vaguery

An early form of Poker solitaire is actually a puzzle of sorts, one which has been called Maverick solitaire (after its appearance on the 1950's/1960's Western T.V. show Maverick, in the first season episode Rope of Cards). Twenty five cards are dealt from a shuffled 52-card deck. The object is to divide the 25 cards into five groups of five cards so that each is a pat hand in draw poker (a hand which does not need to be drawn to). Maverick Solitaire is well-known as a sucker bet, as the probability of success with a random group of 25 cards would seem low, but is actually quite high: the eponymous author of Maverick's Guide to Poker estimates the odds to be at least 98 percent (he disallows four of a kind). This is remarkably accurate: Mark Masten's computer solver, allowing four of a kind, solved about 98.1 percent of a random set of 1000 deals. Deals with unique solutions are even less common than impossible ones: the sample above had 19 impossible deals and only 8 with unique solutions.

mathematical-recreations
cards
nudge-targets
consider:representation
consider:looking-to-see
february 2018 by Vaguery

[1707.05994] Computing Tutte Paths

january 2018 by Vaguery

Tutte paths are one of the most successful tools for attacking Hamiltonicity problems in planar graphs. Unfortunately, results based on them are non-constructive, as their proofs inherently use an induction on overlapping subgraphs and these overlaps hinder to bound the running time to a polynomial. For special cases however, computational results of Tutte paths are known: For 4-connected planar graphs, Tutte paths are in fact Hamiltonian paths and Chiba and Nishizeki showed how to compute such paths in linear time. For 3-connected planar graphs, Tutte paths have a more complicated structure, and it has only recently been shown that they can be computed in polynomial time. However, Tutte paths are defined for general 2-connected planar graphs and this is what most applications need. Unfortunately, no computational results are known. We give the first efficient algorithm that computes a Tutte path (for the general case of 2-connected planar graphs). One of the strongest existence results about such Tutte paths is due to Sanders, which allows to prescribe the end vertices and an intermediate edge of the desired path. Encompassing and strengthening all previous computational results on Tutte paths, we show how to compute this special Tutte path efficiently. Our method refines both, the results of Thomassen and Sanders, and avoids overlapping subgraphs by using a novel iterative decomposition along 2-separators. Finally, we show that our algorithm runs in quadratic time.

graph-theory
algorithms
representation
rather-interesting
to-understand
nudge-targets
consider:representation
consider:feature-discovery
january 2018 by Vaguery

[1709.08878] Generating Sentences by Editing Prototypes

january 2018 by Vaguery

We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.

machine-learning
natural-language-processing
rather-interesting
representation
to-write-about
nudge-targets
consider:representation
consider:performance-measures
generative-models
january 2018 by Vaguery

Tanya Khovanova's Math Blog » Blog Archive » 2015 Coin Problem Solution

january 2018 by Vaguery

A while ago I posted my second favorite problem from the 2015 All-Russian Math Olympiad:

mathematical-recreations
puzzles
nudge-targets
consider:looking-to-see
consider:representation
to-write-about
january 2018 by Vaguery

**related tags**

Copy this bookmark: