consider:representation   386

« earlier    

[1806.07366] Neural Ordinary Differential Equations
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
neural-networks  machine-learning  representation  rather-interesting  deep-learning  to-understand  consider:representation  to-write-about 
12 days ago by Vaguery
[1802.03676] Differentiable Dynamic Programming for Structured Prediction and Attention
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
representation  time-series  numerical-methods  deep-learning  rather-interesting  nudge-targets  consider:representation 
may 2018 by Vaguery
[1709.04109] Empower Sequence Labeling with Task-Aware Neural Language Model
Linguistic sequence labeling is a general modeling approach that encompasses a variety of problems, such as part-of-speech tagging and named entity recognition. Recent advances in neural networks (NNs) make it possible to build reliable models without handcrafted features. However, in many cases, it is hard to obtain sufficient annotations to train these models. In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task. Besides word-level knowledge contained in pre-trained word embeddings, character-aware neural language models are incorporated to extract character-level knowledge. Transfer learning techniques are further adopted to mediate different components and guide the language model towards the key knowledge. Comparing to previous methods, these task-specific knowledge allows us to adopt a more concise model and conduct more efficient training. Different from most transfer learning methods, the proposed framework does not rely on any additional supervision. It extracts knowledge from self-contained order information of training sequences. Extensive experiments on benchmark datasets demonstrate the effectiveness of leveraging character-level knowledge and the efficiency of co-training. For example, on the CoNLL03 NER task, model training completes in about 6 hours on a single GPU, reaching F1 score of 91.71±0.10 without using any extra annotation.
natural-language-processing  deep-learning  neural-networks  nudge-targets  consider:feature-discovery  consider:representation  to-write-about 
march 2018 by Vaguery
A hybrid placement strategy for the three-dimensional strip packing problem - ScienceDirect
This paper presents a hybrid placement strategy for the three-dimensional strip packing problem which involves packing a set of cuboids (‘boxes’) into a three-dimensional bin (parallelepiped) of fixed width and height but unconstrained length (the ‘container’). The goal is to pack all of the boxes into the container, minimising its resulting length. This problem has potential industry application in stock cutting (wood, polystyrene, etc. – minimising wastage) and also cargo loading, as well as other applications in areas such as multi-dimensional resource scheduling. In addition to the proposed strategy a number of test results on available literature benchmark problems are presented and analysed. The results of empirical testing of the algorithm show that it out-performs other methods from the literature, consistently in terms of speed and solution quality-producing 28 best known results from 35 test cases.
operations-research  hyperheuristics  metaheuristics  optimization  constraint-satisfaction  nudge-targets  consider:representation  consider:looking-to-see  to-write-about 
march 2018 by Vaguery
[1709.08359] On the expressive power of query languages for matrices
We investigate the expressive power of 𝖬𝖠𝖳𝖫𝖠𝖭𝖦, a formal language for matrix manipulation based on common matrix operations and linear algebra. The language can be extended with the operation 𝗂𝗇𝗏 of inverting a matrix. In 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝗂𝗇𝗏 we can compute the transitive closure of directed graphs, whereas we show that this is not possible without inversion. Indeed we show that the basic language can be simulated in the relational algebra with arithmetic operations, grouping, and summation. We also consider an operation 𝖾𝗂𝗀𝖾𝗇 for diagonalizing a matrix, which is defined so that different eigenvectors returned for a same eigenvalue are orthogonal. We show that 𝗂𝗇𝗏 can be expressed in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝖾𝗂𝗀𝖾𝗇. We put forward the open question whether there are boolean queries about matrices, or generic queries about graphs, expressible in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝖾𝗂𝗀𝖾𝗇 but not in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝗂𝗇𝗏. The evaluation problem for 𝖬𝖠𝖳𝖫𝖠𝖭𝖦+𝖾𝗂𝗀𝖾𝗇 is shown to be complete for the complexity class ∃R.
matrices  programming-language  representation  rather-interesting  nudge  consider:representation  to--do 
march 2018 by Vaguery
[1801.01922] Vectorization of Line Drawings via PolyVector Fields
Image tracing is a foundational component of the workflow in graphic design, engineering, and computer animation, linking hand-drawn concept images to collections of smooth curves needed for geometry processing and editing. Even for clean line drawings, modern algorithms often fail to faithfully vectorize junctions, or points at which curves meet; this produces vector drawings with incorrect connectivity. This subtle issue undermines the practical application of vectorization tools and accounts for hesitance among artists and engineers to use automatic vectorization software. To address this issue, we propose a novel image vectorization method based on state-of-the-art mathematical algorithms for frame field processing. Our algorithm is tailored specifically to disambiguate junctions without sacrificing quality.
graphics  algorithms  inference  rather-interesting  feature-construction  nudge-targets  consider:representation  consider:looking-to-see  consider:performance-measures 
march 2018 by Vaguery
[1802.06415] Solving Large-Scale Minimum-Weight Triangulation Instances to Provable Optimality
We consider practical methods for the problem of finding a minimum-weight triangulation (MWT) of a planar point set, a classic problem of computational geometry with many applications. While Mulzer and Rote proved in 2006 that computing an MWT is NP-hard, Beirouti and Snoeyink showed in 1998 that computing provably optimal solutions for MWT instances of up to 80,000 uniformly distributed points is possible, making use of clever heuristics that are based on geometric insights. We show that these techniques can be refined and extended to instances of much bigger size and different type, based on an array of modifications and parallelizations in combination with more efficient geometric encodings and data structures. As a result, we are able to solve MWT instances with up to 30,000,000 uniformly distributed points in less than 4 minutes to provable optimality. Moreover, we can compute optimal solutions for a vast array of other benchmark instances that are not uniformly distributed, including normally distributed instances (up to 30,000,000 points), all point sets in the TSPLIB (up to 85,900 points), and VLSI instances with up to 744,710 points. This demonstrates that from a practical point of view, MWT instances can be handled quite well, despite their theoretical difficulty.
computational-geometry  algorithms  plane-geometry  optimization  to-write-about  nudge-targets  consider:representation 
march 2018 by Vaguery
[1709.08004] Slow-scale split-step tau-leap method for stiff stochastic chemical systems
Tau-leaping is a family of algorithms for the approximate simulation of discrete state continuous time Markov chains. The motivation for the development of such methods can be found, for instance, in the fields of chemical kinetics and systems biology. It is well known that the dynamical behavior of biochemical systems is often intrinsically stiff representing a serious challenge for their numerical approximation. The naive extension of stiff deterministic solvers to stochastic integration usually yields numerical solutions with either impractically large relaxation times or incorrectly resolved covariance. In this paper, we propose a novel splitting heuristic which allows to resolve these issues. The proposed numerical integrator takes advantage of the special structure of the linear systems with explicitly available equations for the mean and the covariance which we use to calibrate the parameters of the scheme. It is shown that the method is able to reproduce the exact mean and variance of the linear scalar test equation and has very good accuracy for the arbitrarily stiff systems at least in linear case. The numerical examples for both linear and nonlinear systems are also provided and the obtained results confirm the efficiency of the considered splitting approach.
numerical-methods  diffy-Qs  approximation  systems-biology  algorithms  to-write-about  consider:representation  rather-interesting  nudge 
february 2018 by Vaguery
Discovering Types for Entity Disambiguation
Our system uses the following steps:

Extract every Wikipedia-internal link to determine, for each word, the set of conceivable entities it can refer to. For example, when encountering the link [jaguar]( in a Wikipedia page, we conclude that is one of the meanings of jaguar.
Walk the Wikipedia category tree (using the Wikidata knowledge graph) to determine, for each entity, the set of categories it belongs to. For example, at the bottom of’s Wikipedia page, are the following categories (which themselves have their own categories, such as Automobiles):
Pick a list of ~100 categories to be your “type” system, and optimize over this choice of categories so that they compactly express any entity. We know the mapping of entities to categories, so given a type system, we can represent each entity as a ~100-dimensional binary vector indicating membership in each category.
Using every Wikipedia-internal link and its surrounding context, produce training data mapping a word plus context to the ~100-dimensional binary representation of the corresponding entity, and train a neural network to predict this mapping. This chains together the previous steps: Wikipedia links map a word to an entity, we know the categories for each entity from step 2, and step 3 picked the categories in our type system.
At test time, given a word and surrounding context, our neural network’s output can be interpreted as the probability that the word belongs to each category. If we knew the exact set of category memberships, we would narrow down to one entity (assuming well-chosen categories). But instead, we must play a probabilistic 20 questions: use Bayes’ theorem to calculate the chance of the word disambiguating to each of its possible entities.
machine-learning  disambiguation  data-fusion  to-write-about  consider:representation 
february 2018 by Vaguery
[1802.01021] DeepType: Multilingual Entity Linking by Neural Type System Evolution
The wealth of structured (e.g. Wikidata) and unstructured data about the world available today presents an incredible opportunity for tomorrow's Artificial Intelligence. So far, integration of these two different modalities is a difficult process, involving many decisions concerning how best to represent the information so that it will be captured or useful, and hand-labeling large amounts of data. DeepType overcomes this challenge by explicitly integrating symbolic information into the reasoning process of a neural network with a type system. First we construct a type system, and second, we use it to constrain the outputs of a neural network to respect the symbolic structure. We achieve this by reformulating the design problem into a mixed integer problem: create a type system and subsequently train a neural network with it. In this reformulation discrete variables select which parent-child relations from an ontology are types within the type system, while continuous variables control a classifier fit to the type system. The original problem cannot be solved exactly, so we propose a 2-step algorithm: 1) heuristic search or stochastic optimization over discrete variables that define a type system informed by an Oracle and a Learnability heuristic, 2) gradient descent to fit classifier parameters. We apply DeepType to the problem of Entity Linking on three standard datasets (i.e. WikiDisamb30, CoNLL (YAGO), TAC KBP 2010) and find that it outperforms all existing solutions by a wide margin, including approaches that rely on a human-designed type system or recent deep learning-based entity embeddings, while explicitly using symbolic information lets it integrate new entities without retraining.
data-fusion  machine-learning  deep-learning  rather-interesting  inference  classification  to-write-about  consider:representation 
february 2018 by Vaguery
[1607.01985] A Common Derivation for Markov Chain Monte Carlo Algorithms with Tractable and Intractable Targets
Markov chain Monte Carlo is a class of algorithms for drawing Markovian samples from high-dimensional target densities to approximate the numerical integration associated with computing statistical expectation, especially in Bayesian statistics. However, many Markov chain Monte Carlo algorithms do not seem to share the same theoretical support and each algorithm is proven in a different way. This incurs many terminologies and ancillary concepts, which makes Markov chain Monte Carlo literature seems to be scattered and intimidating to researchers from many other fields, including new researchers of Bayesian statistics.
A generalised version of the Metropolis-Hastings algorithm is constructed with a random number generator and a self-reverse mapping. This formulation admits many other Markov chain Monte Carlo algorithms as special cases. A common derivation for many Markov chain Monte Carlo algorithms is useful in drawing connections and comparisons between these algorithms. As a result, we now can construct many novel combinations of multiple Markov chain Monte Carlo algorithms that amplify the efficiency of each individual algorithm. Specifically, we propose two novel sampling schemes that combine slice sampling with directional or Hamiltonian sampling. Our Hamiltonian slice sampling scheme is also applicable in the pseudo-marginal context where the target density is intractable but can be unbiasedly estimated, e.g. using particle filtering.
sampling  algorithms  representation  rather-interesting  nudge-targets  consider:representation  define-your-terms 
february 2018 by Vaguery
I’m going to continue my analysis of classic card and board games by looking at the game of Battleship. (See early postings for analysis of Chutes & Ladders, Candyland and Risk).
dammit-arthegall  games  game-theory  probability-theory  simulation  nudge-targets  to-write-about  consider:representation 
february 2018 by Vaguery
[1802.04196] Universal quantum computing and three-manifolds
A single qubit may be represented on the Bloch sphere or similarly on the 3-sphere S3. Our goal is to dress this correspondence by converting the language of universal quantum computing (uqc) to that of 3-manifolds. A magic state and the Pauli group acting on it define a model of uqc as a POVM that one recognizes to be a 3-manifold M3. E. g., the d-dimensional POVMs defined from subgroups of finite index of the modular group PSL(2,ℤ) correspond to d-fold M3- coverings over the trefoil knot. In this paper, one also investigates quantum information on a few \lq universal' knots and links such as the figure-of-eight knot, the Whitehead link and Borromean rings, making use of the catalog of platonic manifolds available on SnapPy. Further connections between POVMs based uqc and M3's obtained from Dehn fillings are explored.
quantums  quantum-computing  representation  rather-interesting  to-write-about  nudge  consider:representation 
february 2018 by Vaguery
[1709.04851] Factor Analysis of Interval Data
This paper presents a factor analysis model for symbolic data, focusing on the particular case of interval-valued variables. The proposed method describes the correlation structure among the measured interval-valued variables in terms of a few underlying, but unobservable, uncorrelated interval-valued variables, called \textit{common factors}. Uniform and Triangular distributions are considered within each observed interval. We obtain the corresponding sample mean, variance and covariance assuming a general Triangular distribution.
In our proposal, factors are extracted either by Principal Component or by Principal Axis Factoring, performed on the interval-valued variables correlation matrix. To estimate the values of the common factors, usually called \textit{factor scores}, two approaches are considered, which are inspired in methods for real-valued data: the Bartlett and the Anderson-Rubin methods. In both cases, the estimated values are obtained solving an optimization problem that minimizes a function of the weighted squared Mallows distance between quantile functions. Explicit expressions for the quantile function and the squared Mallows distance are derived assuming a general Triangular distribution.
The applicability of the method is illustrated using two sets of data: temperature and precipitation in cities of the United States of America between the years 1971 and 2000 and measures of car characteristics of different makes and models. Moreover, the method is evaluated on synthetic data with predefined correlation structures.
representation  statistics  algorithms  rather-interesting  to-write-about  nudge  consider:representation 
february 2018 by Vaguery
Creating Intricate Art with Neural Style Transfer – Becoming Human: Artificial Intelligence Magazine
The architecture is based on Gatys’ style transfer algorithm with a few minor modifications. In this case, the content image is a silhouette and style image can be any pattern (ranging from simple black and white doodle to more complex color mosaics). The code also contains a module to invert and create a mask based on the content image, which will eventually be applied to the generated pattern.
generative-art  generative-models  style-transfer  computational-recreations  to-write-about  to-do  nudge-targets  consider:representation 
february 2018 by Vaguery
Maverick Solitaire
An early form of Poker solitaire is actually a puzzle of sorts, one which has been called Maverick solitaire (after its appearance on the 1950's/1960's Western T.V. show Maverick, in the first season episode Rope of Cards).  Twenty five cards are dealt from a shuffled 52-card deck.  The object is to divide the 25 cards into five groups of five cards so that each is a pat hand in draw poker (a hand which does not need to be drawn to).  Maverick Solitaire is well-known as a sucker bet, as the probability of success with a random group of 25 cards would seem low, but is actually quite high: the eponymous author of Maverick's Guide to Poker estimates the odds to be at least 98 percent (he disallows four of a kind).  This is remarkably accurate: Mark Masten's computer solver, allowing four of a kind, solved about 98.1 percent of a random set of 1000 deals.  Deals with unique solutions are even less common than impossible ones: the sample above had 19 impossible deals and only 8 with unique solutions.
mathematical-recreations  cards  nudge-targets  consider:representation  consider:looking-to-see 
february 2018 by Vaguery
[1707.05994] Computing Tutte Paths
Tutte paths are one of the most successful tools for attacking Hamiltonicity problems in planar graphs. Unfortunately, results based on them are non-constructive, as their proofs inherently use an induction on overlapping subgraphs and these overlaps hinder to bound the running time to a polynomial. For special cases however, computational results of Tutte paths are known: For 4-connected planar graphs, Tutte paths are in fact Hamiltonian paths and Chiba and Nishizeki showed how to compute such paths in linear time. For 3-connected planar graphs, Tutte paths have a more complicated structure, and it has only recently been shown that they can be computed in polynomial time. However, Tutte paths are defined for general 2-connected planar graphs and this is what most applications need. Unfortunately, no computational results are known. We give the first efficient algorithm that computes a Tutte path (for the general case of 2-connected planar graphs). One of the strongest existence results about such Tutte paths is due to Sanders, which allows to prescribe the end vertices and an intermediate edge of the desired path. Encompassing and strengthening all previous computational results on Tutte paths, we show how to compute this special Tutte path efficiently. Our method refines both, the results of Thomassen and Sanders, and avoids overlapping subgraphs by using a novel iterative decomposition along 2-separators. Finally, we show that our algorithm runs in quadratic time.
graph-theory  algorithms  representation  rather-interesting  to-understand  nudge-targets  consider:representation  consider:feature-discovery 
january 2018 by Vaguery
[1709.08878] Generating Sentences by Editing Prototypes
We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.
machine-learning  natural-language-processing  rather-interesting  representation  to-write-about  nudge-targets  consider:representation  consider:performance-measures  generative-models 
january 2018 by Vaguery

« earlier    

related tags

"reward-hacking"  algebra  algorithms  amusing-because-no-refs  approximation  architecture  attention  benchmarking  benchmarks  bioinformatics  boolean-networks  cards  cellular-automata  classification  clustering  collective-behavior  combinatorics  compressed-sensing  computational-complexity  computational-geometry  computational-recreations  computer-vision  consider:algorithmic-approach  consider:algorithms  consider:approximation  consider:classification-of-rules  consider:classification  consider:feature-discovery  consider:higher-order-problems  consider:libraries  consider:looking-to-see  consider:multiobjective-optimization  consider:other-applications  consider:performance-measures  consider:planning  consider:prediction  consider:rediscovery  consider:robustness  consider:splinegons  constraint-satisfaction  construction  continued-fractions  control-theory  convex-hulls  dammit-arthegall  data-analysis  data-fusion  data-mining  deep-learning  define-your-terms  diffy-qs  disambiguation  distance  distributed-processing  dynamic-optimization  dynamical-systems  emergent-design  engineering-design  exotic-math  experimental-design  face-recognition  feature-construction  feature-extraction  fractals  game-theory  games  generalization  generative-art  generative-models  geometry  grammar  graph-layout  graph-theory  graphics  group-theory  handwriting-recognition  heuristics  hey-i-know-this-guy  history  hypergraphs  hyperheuristics  image-processing  inference  information-theory  integer-programming  inverse-problems  kauffmania  knot-theory  l-systems  learning-from-data  limits  linear-algebra  lovely-little-problems  lovely  machine-learning  magic-squares  markets  mathematical-programming  mathematical-recreations  mathematics  matrices  mechanism-design  metaheuristics  metrics  modeling-is-not-mathematics  modeling  models  natural-language-processing  neural-networks  nudge-targets  nudge  number-theory  numerical-methods  ocr  online-learning  operations-research  optimization  out-of-the-box  packing  parsing  pattern-discovery  performance-measure  philosophy  plane-geometry  planning  polyominoes  prediction  probabilistic-languages  probability-theory  program-synthesis  programmable-matter  programming-language  proof  purdy-pitchers  puzzles  quantum-computing  quantums  rather-interesting  recurrent-networks  representation  reveal-intent  rewriting-systems  robotics  robustness  rostering  rotor-router  sampling  satisfiability  scheduling  see-also:lee's-book  self-organization  sensors  sentiment-analysis  sequences  signal-processing  simulation  software  sound-recognition  spectroscopy  statistics  strings  style-transfer  superresolution  system-of-professions  systems-biology  tangrams  time-series  to--do  to-do  to-explore  to-understand  to-write-about  to-write  topology  what-gets-measured-gets-fudged  zero-shot-learning 

Copy this bookmark: