dynamical-systems   366
[1809.03019] Stochastic Gradient Descent Learns State Equations with Nonlinear Activations
We study discrete time dynamical systems governed by the state equation $h_{t+1}=\phi(Ah_t+Bu_t)$. Here $A,B$ are weight matrices, $\phi$ is an activation function, and $u_t$ is the input data. This relation is the backbone of recurrent neural networks (e.g. LSTMs) which have broad applications in sequential learning tasks. We utilize stochastic gradient descent to learn the weight matrices from a finite input/state trajectory $(u_t,h_t)_{t=0}^N$. We prove that SGD estimate linearly converges to the ground truth weights while using near-optimal sample size. Our results apply to increasing activations whose derivatives are bounded away from zero. The analysis is based on i) a novel SGD convergence result with nonlinear activations and ii) careful statistical characterization of the state vector. Numerical experiments verify the fast convergence of SGD on ReLU and leaky ReLU in consistence with our theory.
papers  to-read  neural-networks  dynamical-systems  system-identification  optimization
8 days ago by mraginsky
Generalized neural network for nonsmooth nonlinear programming problems - IEEE Journals & Magazine
In 1988 Kennedy and Chua introduced the dynamical canonical nonlinear programming circuit (NPC) to solve in real time nonlinear programming problems where the objective function and the constraints are smooth (twice continuously differentiable) functions. In this paper, a generalized circuit is introduced (G-NPC), which is aimed at solving in real time a much wider class of nonsmooth nonlinear programming problems where the objective function and the constraints are assumed to satisfy only the weak condition of being regular functions. G-NPC, which derives from a natural extension of NPC, has a neural-like architecture and also features the presence of constraint neurons modeled by ideal diodes with infinite slope in the conducting region. By using the Clarke's generalized gradient of the involved functions, G-NPC is shown to obey a gradient system of differential inclusions, and its dynamical behavior and optimization capabilities, both for convex and nonconvex problems, are rigorously analyzed in the framework of nonsmooth analysis and the theory of differential inclusions. In the special important case of linear and quadratic programming problems, salient dynamical features of G-NPC, namely the presence of sliding modes , trajectory convergence in finite time, and the ability to compute the exact optimal solution of the problem being modeled, are uncovered and explained in the developed analytical framework.
papers  to-read  neural-networks  optimization  dynamical-systems  control-theory  stability
5 weeks ago by mraginsky
[1806.07366] Neural Ordinary Differential Equations
"We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models."
neural-net  ode  dynamical-systems
6 weeks ago by arsyed
[1705.04353] On the records
World record setting has long attracted public interest and scientific investigation. Extremal records summarize the limits of the space explored by a process, and the historical progression of a record sheds light on the underlying dynamics of the process. Existing analyses of prediction, statistical properties, and ultimate limits of record progressions have focused on particular domains. However, a broad perspective on how record progressions vary across different spheres of activity needs further development. Here we employ cross-cutting metrics to compare records across a variety of domains, including sports, games, biological evolution, and technological development. We find that these domains exhibit characteristic statistical signatures in terms of rates of improvement, "burstiness" of record-breaking time series, and the acceleration of the record breaking process. Specifically, sports and games exhibit the slowest rate of improvement and a wide range of rates of "burstiness." Technology improves at a much faster rate and, unlike other domains, tends to show acceleration in records. Many biological and technological processes are characterized by constant rates of improvement, showing less burstiness than sports and games. It is important to understand how these statistical properties of record progression emerge from the underlying dynamics. Towards this end, we conduct a detailed analysis of a particular record-setting event: elite marathon running. In this domain, we find that studying record-setting data alone can obscure many of the structural properties of the underlying process. The marathon study also illustrates how some of the standard statistical assumptions underlying record progression models may be inappropriate or commonly violated in real-world datasets.
extreme-values  time-series  feature-extraction  rather-interesting  dynamical-systems  to-understand
june 2018 by Vaguery
[1802.03905] How to Match when All Vertices Arrive Online
We introduce a fully online model of maximum cardinality matching in which all vertices arrive online. On the arrival of a vertex, its incident edges to previously-arrived vertices are revealed. Each vertex has a deadline that is after all its neighbors' arrivals. If a vertex remains unmatched until its deadline, the algorithm must then irrevocably either match it to an unmatched neighbor, or leave it unmatched. The model generalizes the existing one-sided online model and is motivated by applications including ride-sharing platforms, real-estate agency, etc.
We show that the Ranking algorithm by Karp et al. (STOC 1990) is 0.5211-competitive in our fully online model for general graphs. Our analysis brings a novel charging mechanic into the randomized primal dual technique by Devanur et al. (SODA 2013), allowing a vertex other than the two endpoints of a matched edge to share the gain. To our knowledge, this is the first analysis of Ranking that beats 0.5 on general graphs in an online matching problem, a first step towards solving the open problem by Karp et al. (STOC 1990) about the optimality of Ranking on general graphs. If the graph is bipartite, we show that the competitive ratio of Ranking is between 0.5541 and 0.5671. Finally, we prove that the fully online model is strictly harder than the previous model as no online algorithm can be 0.6317<1−1e-competitive in our model even for bipartite graphs.
algorithms  dynamical-systems  graph-theory  rather-interesting  computational-complexity  to-write-about  to-simulate
june 2018 by Vaguery
The meaning of model equivalence: Network models, latent variables, and the theoretical space in between | Psych Networks
Recently, an important set of equivalent representations of the Ising model was published by Joost Kruis and Gunter Maris in Scientific Reports. The paper constructs elegant representations of the Ising model probability distribution in terms of a network model (which consists of direct relations between observables), a latent variable model (which consists of relations between a latent variable and observables, in which the latent variable acts as a common cause), and a common effect model (which also consists of relations between a latent variable and observables, but here the latent variable acts as a common effect). The latter equivalence is a novel contribution to the literature and a quite surprising finding, because it means that a formative model can be statistically equivalent to a reflective model, which one may not immediately expect (do note that this equivalence need not maintain dimensionality, so a model with a single common effect may translate in a higher-dimensional latent variable model).

However, the equivalence between the ordinary (reflective) latent variable models and network models has been with us for a long time, and I therefore was rather surprised at some people’s reaction to the paper and the blog post that accompanies it. Namely, it appears that some think that (a) the fact that network structures can mimic reflective latent variables and vice versa is a recent discovery, that (b) somehow spells trouble for the network approach itself (because, well, what’s the difference?). The first of these claims is sufficiently wrong to go through the trouble of refuting it, if only to set straight the historical record; the second is sufficiently interesting to investigate it a little more deeply. Hence the following notes.
dynamical-systems  models  models-and-modes  representation  philosophy-of-science  (in-practice)  to-write-about  via:several
april 2018 by Vaguery
[1803.09574] Long short-term memory and Learning-to-learn in networks of spiking neurons
Networks of spiking neurons (SNNs) are frequently studied as models for networks of neurons in the brain, but also as paradigm for novel energy efficient computing hardware. In principle they are especially suitable for computations in the temporal domain, such as speech processing, because their computations are carried out via events in time and space. But so far they have been lacking the capability to preserve information for longer time spans during a computation, until it is updated or needed - like a register of a digital computer. This function is provided to artificial neural networks through Long Short-Term Memory (LSTM) units. We show here that SNNs attain similar capabilities if one includes adapting neurons in the network. Adaptation denotes an increase of the firing threshold of a neuron after preceding firing. A substantial fraction of neurons in the neocortex of rodents and humans has been found to be adapting. It turns out that if adapting neurons are integrated in a suitable manner into the architecture of SNNs, the performance of these enhanced SNNs, which we call LSNNs, for computation in the temporal domain approaches that of artificial neural networks with LSTM-units. In addition, the computing and learning capabilities of LSNNs can be substantially enhanced through learning-to-learn (L2L) methods from machine learning, that have so far been applied primarily to LSTM networks and apparently never to SSNs.
This preliminary report on arXiv will be replaced by a more detailed version in about a month.
neural-networks  biological-engineering  dynamical-systems  rather-interesting  architecture  machine-learning  simulation  learning-by-watching  to-write-about
april 2018 by Vaguery
[1706.05621] Double jump phase transition in a random soliton cellular automaton
In this paper, we consider the soliton cellular automaton introduced in [Takahashi 1990] with a random initial configuration. We give multiple constructions of a Young diagram describing various statistics of the system in terms of familiar objects like birth-and-death chains and Galton-Watson forests. Using these ideas, we establish limit theorems showing that if the first n boxes are occupied independently with probability p∈(0,1), then the number of solitons is of order n for all p, and the length of the longest soliton is of order logn for p<1/2, order n‾√ for p=1/2, and order n for p>1/2. Additionally, we uncover a condensation phenomenon in the supercritical regime: For each fixed j≥1, the top j soliton lengths have the same order as the longest for p≤1/2, whereas all but the longest have order at most logn for p>1/2. As an application, we obtain scaling limits for the lengths of the kth longest increasing and decreasing subsequences in a random stack-sortable permutation of length n in terms of random walks and Brownian excursions.
cellular-automata  self-organization  rather-interesting  dynamical-systems  phase-transitions  to-write-about
february 2018 by Vaguery
[1709.05601] Markov Brains: A Technical Introduction
Markov Brains are a class of evolvable artificial neural networks (ANN). They differ from conventional ANNs in many aspects, but the key difference is that instead of a layered architecture, with each node performing the same function, Markov Brains are networks built from individual computational components. These computational components interact with each other, receive inputs from sensors, and control motor outputs. The function of the computational components, their connections to each other, as well as connections to sensors and motors are all subject to evolutionary optimization. Here we describe in detail how a Markov Brain works, what techniques can be used to study them, and how they can be evolved.
representation  evolutionary-algorithms  dynamical-systems  emergent-design  to-write-about  to-invite-to-speak  hey-I-know-these-folks
february 2018 by Vaguery
Metrical properties for a class of continued fractions with increasing digits - ScienceDirect
In 2002, Hartono, Kraaikamp and Schweiger introduced the Engel continued fractions (ECF), whose partial quotients are increasing. Later, Schweiger generalized it into a class of continued fractions with increasing digits and a parameter ϵ, called generalized continued fractions (GCF). In this paper, we will give some arithmetic properties of such an expansion, and show that the GCF holds similar metric properties with ECF under the condition that
. But when it comes to the condition that
, it does not.
continued-fractions  number-theory  to-understand  algorithms  dynamical-systems  to-write-about
january 2018 by Vaguery
[1606.05099] Invariant measures for continued fraction algorithms with finitely many digits
In this paper we consider continued fraction (CF) expansions on intervals different from [0,1]. For every x in such interval we find a CF expansion with a finite number of possible digits. Using the natural extension, the density of the invariant measure is obtained in a number of examples. In case this method does not work, a Gauss-Kuzmin-L\'evy based approximation method is used. Finally, a subfamily of the N-expansions is studied. In particular, the entropy as a function of a parameter α is estimated for N=2 and N=36. Interesting behavior can be observed from numerical results.