dynamical-systems   372

« earlier    

[1809.03019] Stochastic Gradient Descent Learns State Equations with Nonlinear Activations
We study discrete time dynamical systems governed by the state equation $h_{t+1}=\phi(Ah_t+Bu_t)$. Here $A,B$ are weight matrices, $\phi$ is an activation function, and $u_t$ is the input data. This relation is the backbone of recurrent neural networks (e.g. LSTMs) which have broad applications in sequential learning tasks. We utilize stochastic gradient descent to learn the weight matrices from a finite input/state trajectory $(u_t,h_t)_{t=0}^N$. We prove that SGD estimate linearly converges to the ground truth weights while using near-optimal sample size. Our results apply to increasing activations whose derivatives are bounded away from zero. The analysis is based on i) a novel SGD convergence result with nonlinear activations and ii) careful statistical characterization of the state vector. Numerical experiments verify the fast convergence of SGD on ReLU and leaky ReLU in consistence with our theory.
papers  to-read  neural-networks  dynamical-systems  system-identification  optimization 
september 2018 by mraginsky
Generalized neural network for nonsmooth nonlinear programming problems - IEEE Journals & Magazine
In 1988 Kennedy and Chua introduced the dynamical canonical nonlinear programming circuit (NPC) to solve in real time nonlinear programming problems where the objective function and the constraints are smooth (twice continuously differentiable) functions. In this paper, a generalized circuit is introduced (G-NPC), which is aimed at solving in real time a much wider class of nonsmooth nonlinear programming problems where the objective function and the constraints are assumed to satisfy only the weak condition of being regular functions. G-NPC, which derives from a natural extension of NPC, has a neural-like architecture and also features the presence of constraint neurons modeled by ideal diodes with infinite slope in the conducting region. By using the Clarke's generalized gradient of the involved functions, G-NPC is shown to obey a gradient system of differential inclusions, and its dynamical behavior and optimization capabilities, both for convex and nonconvex problems, are rigorously analyzed in the framework of nonsmooth analysis and the theory of differential inclusions. In the special important case of linear and quadratic programming problems, salient dynamical features of G-NPC, namely the presence of sliding modes , trajectory convergence in finite time, and the ability to compute the exact optimal solution of the problem being modeled, are uncovered and explained in the developed analytical framework.
papers  to-read  neural-networks  optimization  dynamical-systems  control-theory  stability 
august 2018 by mraginsky
[1806.07366] Neural Ordinary Differential Equations
"We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models."
neural-net  ode  dynamical-systems 
august 2018 by arsyed
[1705.04353] On the records
World record setting has long attracted public interest and scientific investigation. Extremal records summarize the limits of the space explored by a process, and the historical progression of a record sheds light on the underlying dynamics of the process. Existing analyses of prediction, statistical properties, and ultimate limits of record progressions have focused on particular domains. However, a broad perspective on how record progressions vary across different spheres of activity needs further development. Here we employ cross-cutting metrics to compare records across a variety of domains, including sports, games, biological evolution, and technological development. We find that these domains exhibit characteristic statistical signatures in terms of rates of improvement, "burstiness" of record-breaking time series, and the acceleration of the record breaking process. Specifically, sports and games exhibit the slowest rate of improvement and a wide range of rates of "burstiness." Technology improves at a much faster rate and, unlike other domains, tends to show acceleration in records. Many biological and technological processes are characterized by constant rates of improvement, showing less burstiness than sports and games. It is important to understand how these statistical properties of record progression emerge from the underlying dynamics. Towards this end, we conduct a detailed analysis of a particular record-setting event: elite marathon running. In this domain, we find that studying record-setting data alone can obscure many of the structural properties of the underlying process. The marathon study also illustrates how some of the standard statistical assumptions underlying record progression models may be inappropriate or commonly violated in real-world datasets.
extreme-values  time-series  feature-extraction  rather-interesting  dynamical-systems  to-understand 
june 2018 by Vaguery
[1802.03905] How to Match when All Vertices Arrive Online
We introduce a fully online model of maximum cardinality matching in which all vertices arrive online. On the arrival of a vertex, its incident edges to previously-arrived vertices are revealed. Each vertex has a deadline that is after all its neighbors' arrivals. If a vertex remains unmatched until its deadline, the algorithm must then irrevocably either match it to an unmatched neighbor, or leave it unmatched. The model generalizes the existing one-sided online model and is motivated by applications including ride-sharing platforms, real-estate agency, etc.
We show that the Ranking algorithm by Karp et al. (STOC 1990) is 0.5211-competitive in our fully online model for general graphs. Our analysis brings a novel charging mechanic into the randomized primal dual technique by Devanur et al. (SODA 2013), allowing a vertex other than the two endpoints of a matched edge to share the gain. To our knowledge, this is the first analysis of Ranking that beats 0.5 on general graphs in an online matching problem, a first step towards solving the open problem by Karp et al. (STOC 1990) about the optimality of Ranking on general graphs. If the graph is bipartite, we show that the competitive ratio of Ranking is between 0.5541 and 0.5671. Finally, we prove that the fully online model is strictly harder than the previous model as no online algorithm can be 0.6317<1−1e-competitive in our model even for bipartite graphs.
algorithms  dynamical-systems  graph-theory  rather-interesting  computational-complexity  to-write-about  to-simulate 
june 2018 by Vaguery
The meaning of model equivalence: Network models, latent variables, and the theoretical space in between | Psych Networks
Recently, an important set of equivalent representations of the Ising model was published by Joost Kruis and Gunter Maris in Scientific Reports. The paper constructs elegant representations of the Ising model probability distribution in terms of a network model (which consists of direct relations between observables), a latent variable model (which consists of relations between a latent variable and observables, in which the latent variable acts as a common cause), and a common effect model (which also consists of relations between a latent variable and observables, but here the latent variable acts as a common effect). The latter equivalence is a novel contribution to the literature and a quite surprising finding, because it means that a formative model can be statistically equivalent to a reflective model, which one may not immediately expect (do note that this equivalence need not maintain dimensionality, so a model with a single common effect may translate in a higher-dimensional latent variable model).

However, the equivalence between the ordinary (reflective) latent variable models and network models has been with us for a long time, and I therefore was rather surprised at some people’s reaction to the paper and the blog post that accompanies it. Namely, it appears that some think that (a) the fact that network structures can mimic reflective latent variables and vice versa is a recent discovery, that (b) somehow spells trouble for the network approach itself (because, well, what’s the difference?). The first of these claims is sufficiently wrong to go through the trouble of refuting it, if only to set straight the historical record; the second is sufficiently interesting to investigate it a little more deeply. Hence the following notes.
dynamical-systems  models  models-and-modes  representation  philosophy-of-science  (in-practice)  to-write-about  via:several 
april 2018 by Vaguery
[1803.09574] Long short-term memory and Learning-to-learn in networks of spiking neurons
Networks of spiking neurons (SNNs) are frequently studied as models for networks of neurons in the brain, but also as paradigm for novel energy efficient computing hardware. In principle they are especially suitable for computations in the temporal domain, such as speech processing, because their computations are carried out via events in time and space. But so far they have been lacking the capability to preserve information for longer time spans during a computation, until it is updated or needed - like a register of a digital computer. This function is provided to artificial neural networks through Long Short-Term Memory (LSTM) units. We show here that SNNs attain similar capabilities if one includes adapting neurons in the network. Adaptation denotes an increase of the firing threshold of a neuron after preceding firing. A substantial fraction of neurons in the neocortex of rodents and humans has been found to be adapting. It turns out that if adapting neurons are integrated in a suitable manner into the architecture of SNNs, the performance of these enhanced SNNs, which we call LSNNs, for computation in the temporal domain approaches that of artificial neural networks with LSTM-units. In addition, the computing and learning capabilities of LSNNs can be substantially enhanced through learning-to-learn (L2L) methods from machine learning, that have so far been applied primarily to LSTM networks and apparently never to SSNs.
This preliminary report on arXiv will be replaced by a more detailed version in about a month.
neural-networks  biological-engineering  dynamical-systems  rather-interesting  architecture  machine-learning  simulation  learning-by-watching  to-write-about 
april 2018 by Vaguery
[1706.05621] Double jump phase transition in a random soliton cellular automaton
In this paper, we consider the soliton cellular automaton introduced in [Takahashi 1990] with a random initial configuration. We give multiple constructions of a Young diagram describing various statistics of the system in terms of familiar objects like birth-and-death chains and Galton-Watson forests. Using these ideas, we establish limit theorems showing that if the first n boxes are occupied independently with probability p∈(0,1), then the number of solitons is of order n for all p, and the length of the longest soliton is of order logn for p<1/2, order n‾√ for p=1/2, and order n for p>1/2. Additionally, we uncover a condensation phenomenon in the supercritical regime: For each fixed j≥1, the top j soliton lengths have the same order as the longest for p≤1/2, whereas all but the longest have order at most logn for p>1/2. As an application, we obtain scaling limits for the lengths of the kth longest increasing and decreasing subsequences in a random stack-sortable permutation of length n in terms of random walks and Brownian excursions.
cellular-automata  self-organization  rather-interesting  dynamical-systems  phase-transitions  to-write-about 
february 2018 by Vaguery

« earlier    

related tags

(in-practice)  active-matter  agent-based  algebra  algorithms  analysis  aperiodic-tiling  architecture  asynchronous  automata  billiards  biological-engineering  biology  biophysics  books  boolean-networks  brueckner-networks  category-theory  cellular-automata  chip-firing  cnns  collective-behavior  collective-intelligence  combinatorics  community-detection  complexology  computational-complexity  computational-geometry  consider:classification  consider:control-theory  consider:feature-discovery  consider:fractel-type-in-klapaucius  consider:looking-to-see  consider:req-like-systems  consider:review  consider:robustness  consider:simple-examples  consider:simulation  consider:variant-overlays  constraint-satisfaction  continued-fractions  control-theory  control  coupled-oscillators  courses  deep-learning  define-your-terms  differential-geometry  dynamic-optimization  ecology  economics  electronics  emergence  emergent-design  emulation  engineering-design  engineering  ergodic-theory  evolution  evolutionary-algorithms  explainer  extreme-values  feature-construction  feature-extraction  fractals  game-theory  generative-models  generative-processes  geometry  gradient-descent  graph-layout  graph-theory  heuristics  hey-i-know-these-folks  ifs  inference  inverse-problems  kauffmania  lattice-polymers  learning-by-watching  linear-systems  looking-to-see  machine-learning  macroeconomics  mathematical-model  mathematical-recreations  mathematics  models-and-modes  models  multiobjective-optimization  network-theory  neural-net  neural-networks  neurology  nonlinear-dynamics  nudge-targets  nudge  number-theory  numerics  ode  odes  optimization  out-of-the-box  papers  parameter-scanning  parametrization  pdes  phase-transitions  philosophy-of-science  physics!  physics  plane-geometry  pnas  politics  probability-theory  probability  programming  python  random-walks  rather-interesting  reaction-networks  recurrent-neural-nets  representation  robustness  sandpiles  science  self-organization  self-similarity  signal-processing  simulation  sorting  stability  statistical-mechanics  system-identification  systems-biology  temporal-models  theoretical-biology  tiling  time-series  to-cartoon-about  to-draw  to-invite-to-speak  to-read  to-simulate  to-understand  to-write-about  to-write  visualization  wikipedia 

Copy this bookmark: