**Vaguery + looking-to-see**
75

Evolution of metazoan morphological disparity | PNAS

2 days ago by Vaguery

We attempt to quantify animal “bodyplans” and their variation within Metazoa. Our results challenge the view that maximum variation was achieved early in animal evolutionary history by nonuniformitarian mechanisms. Rather, they are compatible with the view that the capacity for fundamental innovation is not limited to the early evolutionary history of clades. We perform quantitative tests of the principal hypotheses of the molecular mechanisms underpinning the establishment of animal bodyplans and corroborate the hypothesis that animal evolution has been permitted or driven by gene regulatory evolution.

developmental-biology
biology
rather-interesting
clustering
looking-to-see
to-write-about
consider:feature-discovery
2 days ago by Vaguery

[1804.10962] Stress anisotropy in shear-jammed packings of frictionless disks

8 days ago by Vaguery

We perform computational studies of repulsive, frictionless disks to investigate the development of stress anisotropy in mechanically stable (MS) packings. We focus on two protocols for generating MS packings: 1) isotropic compression and 2) applied simple or pure shear strain γ at fixed packing fraction ϕ. MS packings of frictionless disks occur as geometric families (i.e. parabolic segments with positive curvature) in the ϕ-γ plane. MS packings from protocol 1 populate parabolic segments with both signs of the slope, dϕ/dγ>0 and dϕ/dγ<0. In contrast, MS packings from protocol 2 populate segments with dϕ/dγ<0 only. For both simple and pure shear, we derive a relationship between the stress anisotropy and dilatancy dϕ/dγ obeyed by MS packings along geometrical families. We show that for MS packings prepared using isotropic compression, the stress anisotropy distribution is Gaussian centered at zero with a standard deviation that decreases with increasing system size. For shear jammed MS packings, the stress anisotropy distribution is a convolution of Weibull distributions that depend on strain, which has a nonzero average and standard deviation in the large-system limit. We also develop a framework to calculate the stress anisotropy distribution for packings generated via protocol 2 in terms of the stress anisotropy distribution for packings generated via protocol 1. These results emphasize that for repulsive frictionless disks, different packing-generation protocols give rise to different MS packing probabilities, which lead to differences in macroscopic properties of MS packings.

physics!
sandpiles
materials-science
simulation
rather-interesting
condensed-matter
phase-transitions
looking-to-see
8 days ago by Vaguery

Coder-Physicists Are Simulating the Universe to Unlock Its Secrets | Quanta Magazine

9 days ago by Vaguery

These small, faint galaxies have always presented problems. The “missing satellite problem,” for instance, is the expectation, based on standard cold dark matter models, that hundreds of satellite galaxies should orbit every spiral galaxy. But the Milky Way has just dozens. This has caused some physicists to contemplate more complicated models of dark matter. However, when Hopkins and colleagues incorporated realistic superbubbles into their simulations, they saw many of those excess satellite galaxies go away. Hopkins has also found potential resolutions to two other problems, called “cusp-core” and “too-big-to-fail,” that have troubled the cold dark matter paradigm.

simulation
looking-to-see
astronomy
rather-interesting
to-write-about
the-mangle-in-practice
(totally)
9 days ago by Vaguery

Orthogonal polygons | The Math Less Traveled

may 2018 by Vaguery

Quite a few commenters figured out what was going on, and mentioned several nice (equivalent) ways to think about it. Primarily, the idea is to draw all possible orthogonal polygons, that is, polygons with only right angles, organized by the total number of vertices. (So, for example, the picture above shows all orthogonal polygons with exactly ten vertices.) However, we have to be careful what we mean by the phrase “all possible”: there would be an infinite number of such polygons if we think about things like the precise lengths of edges. So we have to say when two polygons are considered the same, and when they are distinct. My rules are as follows:

mathematical-recreations
looking-to-see
geometry
polyominoes
to-write-about
may 2018 by Vaguery

Cooking the books – Almost looks like work

may 2018 by Vaguery

Since Christmas, at my house we’ve been cooking with 5 ingredients or fewer thanks to the acquisition of Jamie Oliver’s new book, the contents of which are mostly available online here. The recipes are unanimously very tasty, but that’s besides the point. The real mark of culinary excellence (in my humble opinion) is how efficiently one can buy ingredients to make as many of the recipes as possible in one shopping trip. Let’s investigate while the lamb is on.

Each of the 135 recipes in the book consists of 5 ingredients, some of which overlap. It is therefore not necessary to purchase 675 ingredients, there are actually only 239 unique ones. (Yes, I did spend a Sunday morning typing 675 individual ingredients into a spreadsheet.)

The question is then this:

In which order should I buy my ingredients to maximise the number of possible recipes as a function of number of ingredients?

Let’s start simply, and look at the frequency of occurrence of the ingredients.

mathematical-recreations
looking-to-see
cooking
data-analysis
leading-questions
rather-interesting
Each of the 135 recipes in the book consists of 5 ingredients, some of which overlap. It is therefore not necessary to purchase 675 ingredients, there are actually only 239 unique ones. (Yes, I did spend a Sunday morning typing 675 individual ingredients into a spreadsheet.)

The question is then this:

In which order should I buy my ingredients to maximise the number of possible recipes as a function of number of ingredients?

Let’s start simply, and look at the frequency of occurrence of the ingredients.

may 2018 by Vaguery

mathrecreation: Tigers and Treasure

april 2018 by Vaguery

The graph below shows all 196 possible puzzles. The puzzles that lead to contradictions are coloured as white squares, the puzzles that do not lead to contradictions are coloured in black. We can see a white strip across the bottom: these are all the puzzles where door 2 is labelled with statement 1. Can you find the statement that always leads to contradictions for door 1? There are 131 puzzles that are "good" in the sense that they do not lead to contradictions.

mathematical-recreations
logic
looking-to-see
to-write-about
consider:robustness
nudge-targets
april 2018 by Vaguery

[1802.01396] To understand deep learning we need to understand kernel learning

april 2018 by Vaguery

Generalization performance of classifiers in deep learning has recently become a subject of intense study. Heavily over-parametrized deep models tend to fit training data exactly. Despite overfitting, they perform well on test data, a phenomenon not yet fully understood.

The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using real-world and synthetic datasets, we establish that kernel classifiers trained to have zero classification error (overfitting) or even zero regression error (interpolation) perform very well on test data.

We proceed to prove lower bounds on the norm of overfitted solutions for smooth kernels, showing that they increase nearly exponentially with the data size. Since most generalization bounds depend polynomially on the norm of the solution, this result implies that they diverge as data increases. Furthermore, the existing bounds do not apply to interpolated classifiers.

We also show experimentally that (non-smooth) Laplacian kernels easily fit random labels using a version of SGD, a finding that parallels results reported for ReLU neural networks. In contrast, fitting noisy data requires many more epochs for smooth Gaussian kernels. The observation that the performance of overfitted Laplacian and Gaussian classifiers on the test is quite similar, suggests that generalization is tied to the properties of the kernel function rather than the optimization process.

We see that some key phenomena of deep learning are manifested similarly in kernel methods in the overfitted regime. We argue that progress on understanding deep learning will be difficult, until more analytically tractable "shallow" kernel methods are better understood. The experimental and theoretical results presented in this paper indicate a need for new theoretical ideas for understanding classical kernel methods.

machine-learning
philosophy-of-engineering
looking-to-see
statistics
algorithms
explanation
to-write-about
explanatory-power-and-narrative-discovery
The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using real-world and synthetic datasets, we establish that kernel classifiers trained to have zero classification error (overfitting) or even zero regression error (interpolation) perform very well on test data.

We proceed to prove lower bounds on the norm of overfitted solutions for smooth kernels, showing that they increase nearly exponentially with the data size. Since most generalization bounds depend polynomially on the norm of the solution, this result implies that they diverge as data increases. Furthermore, the existing bounds do not apply to interpolated classifiers.

We also show experimentally that (non-smooth) Laplacian kernels easily fit random labels using a version of SGD, a finding that parallels results reported for ReLU neural networks. In contrast, fitting noisy data requires many more epochs for smooth Gaussian kernels. The observation that the performance of overfitted Laplacian and Gaussian classifiers on the test is quite similar, suggests that generalization is tied to the properties of the kernel function rather than the optimization process.

We see that some key phenomena of deep learning are manifested similarly in kernel methods in the overfitted regime. We argue that progress on understanding deep learning will be difficult, until more analytically tractable "shallow" kernel methods are better understood. The experimental and theoretical results presented in this paper indicate a need for new theoretical ideas for understanding classical kernel methods.

april 2018 by Vaguery

[1709.04594] Revisiting Spectral Graph Clustering with Generative Community Models

march 2018 by Vaguery

The methodology of community detection can be divided into two principles: imposing a network model on a given graph, or optimizing a designed objective function. The former provides guarantees on theoretical detectability but falls short when the graph is inconsistent with the underlying model. The latter is model-free but fails to provide quality assurance for the detected communities. In this paper, we propose a novel unified framework to combine the advantages of these two principles. The presented method, SGC-GEN, not only considers the detection error caused by the corresponding model mismatch to a given graph, but also yields a theoretical guarantee on community detectability by analyzing Spectral Graph Clustering (SGC) under GENerative community models (GCMs). SGC-GEN incorporates the predictability on correct community detection with a measure of community fitness to GCMs. It resembles the formulation of supervised learning problems by enabling various community detection loss functions and model mismatch metrics. We further establish a theoretical condition for correct community detection using the normalized graph Laplacian matrix under a GCM, which provides a novel data-driven loss function for SGC-GEN. In addition, we present an effective algorithm to implement SGC-GEN, and show that the computational complexity of SGC-GEN is comparable to the baseline methods. Our experiments on 18 real-world datasets demonstrate that SGC-GEN possesses superior and robust performance compared to 6 baseline methods under 7 representative clustering metrics.

community-detection
network-theory
algorithms
looking-to-see
clustering
to-write-about
the-pragmatics-of-the-thing
march 2018 by Vaguery

[1801.00548] A Machine Learning Approach to Adaptive Covariance Localization

march 2018 by Vaguery

Data assimilation plays a key role in large-scale atmospheric weather forecasting, where the state of the physical system is estimated from model outputs and observations, and is then used as initial condition to produce accurate future forecasts. The Ensemble Kalman Filter (EnKF) provides a practical implementation of the statistical solution of the data assimilation problem and has gained wide popularity as. This success can be attributed to its simple formulation and ease of implementation. EnKF is a Monte-Carlo algorithm that solves the data assimilation problem by sampling the probability distributions involved in Bayes theorem. Because of this, all flavors of EnKF are fundamentally prone to sampling errors when the ensemble size is small. In typical weather forecasting applications, the model state space has dimension 109−1012, while the ensemble size typically ranges between 30−100 members. Sampling errors manifest themselves as long-range spurious correlations and have been shown to cause filter divergence. To alleviate this effect covariance localization dampens spurious correlations between state variables located at a large distance in the physical space, via an empirical distance-dependent function. The quality of the resulting analysis and forecast is greatly influenced by the choice of the localization function parameters, e.g., the radius of influence. The localization radius is generally tuned empirically to yield desirable results.This work, proposes two adaptive algorithms for covariance localization in the EnKF framework, both based on a machine learning approach. The first algorithm adapts the localization radius in time, while the second algorithm tunes the localization radius in both time and space. Numerical experiments carried out with the Lorenz-96 model, and a quasi-geostrophic model, reveal the potential of the proposed machine learning approaches.

modeling
machine-learning
prediction
rather-interesting
looking-to-see
approximation
algorithms
to-write-about
march 2018 by Vaguery

[1704.03636] Energy Propagation in Deep Convolutional Neural Networks

march 2018 by Vaguery

Many practical machine learning tasks employ very deep convolutional neural networks. Such large depths pose formidable computational challenges in training and operating the network. It is therefore important to understand how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers. In addition, it is desirable that the feature extractor generated by the network be informative in the sense of the only signal mapping to the all-zeros feature vector being the zero input signal. This "trivial null-set" property can be accomplished by asking for "energy conservation" in the sense of the energy in the feature vector being proportional to that of the corresponding input signal. This paper establishes conditions for energy conservation (and thus for a trivial null-set) for a wide class of deep convolutional neural network-based feature extractors and characterizes corresponding feature map energy decay rates. Specifically, we consider general scattering networks employing the modulus non-linearity and we find that under mild analyticity and high-pass conditions on the filters (which encompass, inter alia, various constructions of Weyl-Heisenberg filters, wavelets, ridgelets, (α)-curvelets, and shearlets) the feature map energy decays at least polynomially fast. For broad families of wavelets and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be exponential. Moreover, we provide handy estimates of the number of layers needed to have at least ((1−ε)⋅100)% of the input signal energy be contained in the feature vector.

deep-learning
signal-processing
rather-interesting
information-theory
to-understand
looking-to-see
machine-learning
march 2018 by Vaguery

[1802.09979] The Emergence of Spectral Universality in Deep Networks

march 2018 by Vaguery

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. Therefore, to guide important design choices, it is important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.

deep-learning
looking-to-see
machine-learning
neural-networks
to-understand
march 2018 by Vaguery

[1606.05099] Invariant measures for continued fraction algorithms with finitely many digits

january 2018 by Vaguery

In this paper we consider continued fraction (CF) expansions on intervals different from [0,1]. For every x in such interval we find a CF expansion with a finite number of possible digits. Using the natural extension, the density of the invariant measure is obtained in a number of examples. In case this method does not work, a Gauss-Kuzmin-L\'evy based approximation method is used. Finally, a subfamily of the N-expansions is studied. In particular, the entropy as a function of a parameter α is estimated for N=2 and N=36. Interesting behavior can be observed from numerical results.

continued-fractions
number-theory
looking-to-see
dynamical-systems
to-write-about
january 2018 by Vaguery

[1711.01711] Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability

january 2018 by Vaguery

Previously referred to as 'miraculous' because of its surprisingly powerful properties and its application as the optimal theoretical solution to induction/inference, (approximations to) Algorithmic Probability (AP) and the Universal Distribution are of the greatest importance in computer science and science in general. Here we investigate the emergence, the rates of emergence and convergence, and the Coding-theorem like behaviour of AP in subuniversal models of computation. We investigate empirical distributions of computer programs of weaker computational power according to the Chomsky hierarchy. We introduce measures of algorithmic probability and algorithmic complexity based upon resource-bounded computation, in contrast to previously thoroughly investigated distributions produced from the output distribution of Turing machines. This approach allows for numerical approximations to algorithmic (Kolmogorov-Chaitin) complexity-based estimations at each of the levels of a computational hierarchy. We demonstrate that all these estimations are correlated in rank and that they converge both in rank and values as a function of computational power, despite the fundamental differences of each computational model. In the context of natural processes that may operate below the Turing universal level due to the constraint of resources and physical degradation, the investigation of natural biases coming from algorithmic laws is highly relevant. We show that the simplicity/complexity bias in distributions produced even by the weakest of the computational models can be accounted up to 60% by Algorithmic Probability in its approximation to the Universal Distribution.

information-theory
computational-complexity
to-understand
to-write-about
looking-to-see
january 2018 by Vaguery

CTK Insights » Blog Archive A Discovery of Hirotaka Ebisui And Thanos Kalogerakis - CTK Insights

november 2017 by Vaguery

Hirotaka Ebisui has found that in the case of the right triangle,

A

′

+

B

′

=

5

C

′

A

′

+B

′

=5C

′

. On informing me of that result, Thanos added an identity for the next layer

A

′

′

+

B

′

′

=

C

′

′

.

A

′′

+B

′′

=C

′′

.

That was exciting enough for me to investigate. I can happily and proudly report that, for the next layer,

A

′

′

′

+

B

′

′

′

=

5

C

′

′

′

A

′′′

+B

′′′

=5C

′′′

.

plane-geometry
rather-interesting
mathematical-recreations
looking-to-see
to-write-about
A

′

+

B

′

=

5

C

′

A

′

+B

′

=5C

′

. On informing me of that result, Thanos added an identity for the next layer

A

′

′

+

B

′

′

=

C

′

′

.

A

′′

+B

′′

=C

′′

.

That was exciting enough for me to investigate. I can happily and proudly report that, for the next layer,

A

′

′

′

+

B

′

′

′

=

5

C

′

′

′

A

′′′

+B

′′′

=5C

′′′

.

november 2017 by Vaguery

[1711.00867] The (Un)reliability of saliency methods

november 2017 by Vaguery

Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step ---adding a constant shift to the input data--- to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.

via:?
machine-learning
looking-to-see
saliency
define-your-terms
algorithms
criticism
to-write-about
november 2017 by Vaguery

Knots, Links, & Learning | arbitrarilyclose

november 2017 by Vaguery

If any of this is compelling to you, I would love it if you wanted to join me in this investigation. I don’t want someone to show up and say, “here’s the final solution, dummy, it’s already all been solved,” because I’m sure that this is already a chapter in a textbook somewhere. I don’t care about that. I care about investigating for myself and with friends what’s happening. Feel free to share ideas below, and I would love it if you did some of your own drawings and joined in on the party.

learning-in-public
mathematical-recreations
looking-to-see
lovely
november 2017 by Vaguery

[1710.00479] Factor selection by permutation

october 2017 by Vaguery

Researchers often have data measuring features xij of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many factors affect the data? Many approaches have been developed for this factor selection problem. The popular Parallel Analysis method randomly permutes each feature of the data. It selects factors if their singular values are larger than those of the permuted data. It is used by leading applied statisticians, including T Hastie, M Stephens, J Storey, R Tibshirani and WH Wong. Despite empirical evidence for its accuracy, there is currently no theoretical justification. This prevents us from knowing when it will work in the future.

In this paper, we show that parallel analysis consistently selects the significant factors in certain high-dimensional factor models. The intuition is that permutations keep the noise invariant, while "destroying" the low-rank signal. This provides justification for permutation methods in PCA and factor models under some conditions. A key requirement is that the factors must load on several variables. Our work points to improvements of permutation methods.

modeling
statistics
rather-interesting
information-theory
signal-processing
feature-extraction
looking-to-see
to-write-about
consider:lexicase
In this paper, we show that parallel analysis consistently selects the significant factors in certain high-dimensional factor models. The intuition is that permutations keep the noise invariant, while "destroying" the low-rank signal. This provides justification for permutation methods in PCA and factor models under some conditions. A key requirement is that the factors must load on several variables. Our work points to improvements of permutation methods.

october 2017 by Vaguery

[1512.03421] The extended 1-perfect trades in small hypercubes

october 2017 by Vaguery

An extended 1-perfect trade is a pair (T0,T1) of two disjoint binary distance-4 even-weight codes such that the set of words at distance 1 from T0 coincides with the set of words at distance 1 from T1. Such trade is called primary if any pair of proper subsets of T0 and T1 is not a trade. Using a computer-aided approach, we classify nonequivalent primary extended 1-perfect trades of length 10, constant-weight extended 1-perfect trades of length 12, and Steiner trades derived from them. In particular, all Steiner trades with parameters (5,6,12) are classified.

combinatorics
to-understand
graph-theory
strings
optimization
constraint-satisfaction
looking-to-see
nudge-targets
consider:looking-to-see
consider:classification
consider:feature-discovery
october 2017 by Vaguery

[1502.04644] Beyond the Runs Theorem

october 2017 by Vaguery

Recently, a short and elegant proof was presented showing that a binary word of length n contains at most n−3 runs. Here we show, using the same technique and a computer search, that the number of runs in a binary word of length n is at most 2223n<0.957n.

strings
proof
permutations
looking-to-see
to-write-about
nudge-targets
october 2017 by Vaguery

[1605.04550] Finite-size effects and percolation properties of Poisson geometries

october 2017 by Vaguery

Random tessellations of the space represent a class of prototype models of heterogeneous media, which are central in several applications in physics, engineering and life sciences. In this work, we investigate the statistical properties of d-dimensional isotropic Poisson geometries by resorting to Monte Carlo simulation, with special emphasis on the case d=3. We first analyse the behaviour of the key features of these stochastic geometries as a function of the dimension d and the linear size L of the domain. Then, we consider the case of Poisson binary mixtures, where the polyhedra are assigned two `labels' with complementary probabilities. For this latter class of random geometries, we numerically characterize the percolation threshold, the strength of the percolating cluster and the average cluster size.

probability-theory
computational-geometry
statistical-mechanics
looking-to-see
simulation
rather-interesting
feature-extraction
to-write-about
october 2017 by Vaguery

[1608.08225] Why does deep and cheap learning work so well?

october 2017 by Vaguery

We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through "cheap learning" with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various "no-flattening theorems" showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss, for example, we show that n variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.

deep-learning
machine-learning
approximation
looking-to-see
algorithms
representation
rather-interesting
learning-algorithms
information-theory
to-write-about
october 2017 by Vaguery

[1608.08087] Equisectional equivalence of triangles

october 2017 by Vaguery

We study equivalence relation of the set of triangles generated by similarity and operation on a triangle to get a new one by joining division points of three edges with the same ratio. Using the moduli space of similarity classes of triangles introduced by Nakamura and Oguiso, we give characterization of equivalent triangles in terms of circles of Apollonius (or hyperbolic pencil of circles) and properties of special equivalent triangles. We also study rationality problem and constructibility problem.

plane-geometry
compass-and-straightedge
looking-to-see
rather-interesting
algebra
nudge-targets
consider:feature-discovery
october 2017 by Vaguery

[1412.1913] A Portfolio Approach to Algorithm Selection for Discrete Time-Cost Trade-off Problem

september 2017 by Vaguery

It is a known fact that the performance of optimization algorithms for NP-Hard problems vary from instance to instance. We observed the same trend when we comprehensively studied multi-objective evolutionary algorithms (MOEAs) on a six benchmark instances of discrete time-cost trade-off problem (DTCTP) in a construction project. In this paper, instead of using a single algorithm to solve DTCTP, we use a portfolio approach that takes multiple algorithms as its constituent. We proposed portfolio comprising of four MOEAs, Non-dominated Sorting Genetic Algorithm II (NSGA-II), the strength Pareto Evolutionary Algorithm II (SPEA-II), Pareto archive evolutionary strategy (PAES) and Niched Pareto Genetic Algorithm II (NPGA-II) to solve DTCTP. The result shows that the portfolio approach is computationally fast and qualitatively superior to its constituent algorithms for all benchmark instances. Moreover, portfolio approach provides an insight in selecting the best algorithm for all benchmark instances of DTCTP.

multiobjective-optimization
benchmarking
trade-offs
looking-to-see
computational-complexity
define-your-terms
rather-interesting
to-write-about
september 2017 by Vaguery

the power of doing things just for fun – Navigating Complexity

september 2017 by Vaguery

I recently watched a documentary called The Pleasure of Finding Things Out (1981) which focussed on the life of Richard Feynman, a physicist who has widely been called a scientific genius. I was delighted to hear him describe how valuable ‘doing things just for fun’ was to his work. He told of how, at a certain moment, he let go of trying to make his work ‘useful’ and started doing things ‘just for the fun of it’. It was then that the insights started to flow which eventually led him to winning the Nobel Prize.

via:?
ludic-impulse
looking-to-see
cultural-norms
philosophy-of-science
philosophy
psychology
social-norms
september 2017 by Vaguery

[1708.03046] When Does the First Spurious Variable Get Selected by Sequential Regression Procedures?

august 2017 by Vaguery

Applied statisticians use sequential regression procedures to produce a ranking of explanatory variables and, in settings of low correlations between variables and strong true effect sizes, expect that variables at the very top of this ranking are true. In a regime of certain sparsity levels, however, three examples of sequential procedures---forward stepwise, the lasso, and least angle regression---are shown to include the first spurious variable unexpectedly early. We derive a rigorous, sharp prediction of the rank of the first spurious variable for the three procedures, demonstrating that the first spurious variable occurs earlier and earlier as the regression coefficients get denser. This counterintuitive phenomenon persists for independent Gaussian random designs and an arbitrarily large magnitude of the true effects. We further gain a better understanding of the phenomenon by identifying the underlying cause and then leverage the insights to introduce a simple visualization tool termed the "double-ranking diagram" to improve on sequential methods.

As a byproduct of these findings, we obtain the first provable result certifying the exact equivalence between the lasso and least angle regression in the early stages of solution paths beyond orthogonal designs. This equivalence can seamlessly carry over many important model selection results concerning the lasso to least angle regression.

statistics
variable-selection
rather-interesting
looking-to-see
algorithms
to-write-about
performance-measure
feature-construction
As a byproduct of these findings, we obtain the first provable result certifying the exact equivalence between the lasso and least angle regression in the early stages of solution paths beyond orthogonal designs. This equivalence can seamlessly carry over many important model selection results concerning the lasso to least angle regression.

august 2017 by Vaguery

Heesch Numbers, Part 2: Polyforms – Isohedral

august 2017 by Vaguery

In the first post in this series, I introduced the concept of a shape’s Heesch number. In brief, if a shape doesn’t tile the plane, its Heesch number is a measure of the maximum number of times you can surround the shape with layers of copies of itself. (Shapes that do tile are defined to have a Heesch number of infinity.) Shapes with positive, finite Heesch numbers are entertaining mathematical curiosities. Far more mysterious—and infuriating—is the fact that we know of examples of Heesch numbers only up to five, and nothing higher. Learning more about shapes with high Heesch numbers could offer insights into deep unsolved problems in tiling theory.

tiling
combinatorics
mathematical-recreations
rather-interesting
number-theory
looking-to-see
nudge-targets
consider:feature-discovery
to-write-about
august 2017 by Vaguery

[1704.01565] Charging changes contact composition in binary sphere packings

august 2017 by Vaguery

Equal volume mixtures of small and large polytetrafluorethylene (PTFE) spheres are shaken in an atmosphere of controlled humidity which allows to also control their tribo-charging. We find that the contact numbers are charge-dependent: as the charge density of the beads increases, the number of same-type contacts decreases and the number of opposite-type contacts increases. This change is not caused by a global segregation of the sample. Hence, tribo-charging can be a way to tune the local composition of a granular material.

packing
condensed-matter
looking-to-see
experiment
rather-interesting
granular-materials
to-write-about
it's-more-complicated-than-you-think
august 2017 by Vaguery

[1607.00363] Using smartphone pressure sensors to measure vertical velocities of elevators, stairways, and drones

august 2017 by Vaguery

We measure the vertical velocities of elevators, pedestrians climbing stairs, and drones (flying unmanned aerial vehicles), by means of smartphone pressure sensors. The barometric pressure obtained with the smartphone is related to the altitude of the device via the hydrostatic approximation. From the altitude values, vertical velocities are derived. The approximation considered is valid in the first hundred meters of the inner layers of the atmosphere. In addition to pressure, acceleration values were also recorded using the built-in accelerometer. Numerical integration was performed, obtaining both vertical velocity and altitude. We show that data obtained using the pressure sensor is significantly less noisy than that obtained using the accelerometer. Error accumulation is also evident in the numerical integration of the acceleration values. In the proposed experiments, the pressure sensor also outperforms GPS, because this sensor does not receive satellite signals indoors and, in general, the operating frequency is considerably lower than that of the pressure sensor. In the cases in which it is possible, comparison with reference values taken from the architectural plans of buildings validates the results obtained using the pressure sensor. This proposal is ideally performed as an external or outreach activity with students to gain insight about fundamental questions in mechanics, fluids, and thermodynamics.

physics
looking-to-see
experiment
rather-interesting
want
to-write-about
to-do
august 2017 by Vaguery

[1706.04791] Obstacle-shape effect in a two-dimensional granular silo flow field

august 2017 by Vaguery

We conducted simple experiment and numerical simulation of two-dimensional granular discharge flow driven by gravity under the influence of an obstacle. According to the previous work (Zuriguel {\it et al.,\,Phys.\,Rev.\,Lett.}\,{\bf 107}: 278001, 2011), the clogging of granular discharge flow can be suppressed by putting a circular obstacle at a proper position. In order to investigate the details of obstacle effect in granular flow, we focused on particle dynamics in this study. From the experimental and numerical data, we found that the obstacle remarkably affects the horizontal-velocity distribution and packing fraction at the vicinity of the exit. In addition to the circular obstacle, we utilized triangular, inverted-triangular, and horizontal-bar obstacles to discuss the obstacle-shape effect in granular discharge flow. Based on the investigation of dynamical quantities such as velocity distributions, granular temperature, and volume fraction, we found that the triangular obstacle or horizontal bar could be very effective to prevent the clogging. From the obtained result, we consider that the detouring of particles around the obstacle and resultant low packing fraction at the exit region effectively prevent the clogging in a certain class of granular discharge flow.

granular-materials
looking-to-see
experiment
physics
rather-interesting
to-write-about
august 2017 by Vaguery

Extractor attractor – Almost looks like work

august 2017 by Vaguery

Recently the extractor fan in my bathroom has started malfunctioning, occasionally grinding and stalling. The infuriating thing is that the grinding noise isn’t perfectly periodic – it is approximately so, but there are occasionally long gaps and the short gaps vary slightly. This lack of predictability makes the noise incredibly annoying, and hard to tune out. Before getting it fixed, I decided to investigate it a bit further.

The terminally curious may listen to the sound here:

https://www.dropbox.com/s/4xh1gmrjry10eky/FanSound.ts?dl=0

This was recorded from my phone, you can also hear me puttering around in the background.

After dumping the audio data, I looked at the waveform and realised it was quite difficult to extract the temporal locations of the grinding noises from the volume alone. As a good physicist I therefore had another look in the frequency domain, making a spectrogram.

mathematical-recreations
looking-to-see
data-analysis
visualization
physics
nonlinear-dynamics
amusing
The terminally curious may listen to the sound here:

https://www.dropbox.com/s/4xh1gmrjry10eky/FanSound.ts?dl=0

This was recorded from my phone, you can also hear me puttering around in the background.

After dumping the audio data, I looked at the waveform and realised it was quite difficult to extract the temporal locations of the grinding noises from the volume alone. As a good physicist I therefore had another look in the frequency domain, making a spectrogram.

august 2017 by Vaguery

Beyond subjective and objective in statistics [PDF]

july 2017 by Vaguery

Decisions in statistical data analysis are often justified, criticized, or avoided using concepts of objectivity and subjectivity. We argue that the words “objective” and “subjective” in statis- tics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity replaced by transparency, consensus, im- partiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. Together with stability, these make up a collection of virtues that we think is helpful in discussions of statistical foundations and practice. The advantage of these reformulations is that the replacement terms do not oppose each other and that they give more specific guidance about what statistical science strives to achieve. Instead of debating over whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can rec- ognize desirable attributes such as transparency and acknowledgment of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling, and socioeconomic stratification. The aim of this paper is to push users and developers of statistical methods toward more effective use of diverse sources of information and more open acknowledgement of assumptions and goals.

statistics
philosophy-of-science
data-analysis
looking-to-see
hypothesis-testing
learning
to-read
july 2017 by Vaguery

[1706.08224] Do GANs actually learn the distribution? An empirical study

july 2017 by Vaguery

Do GANS (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of (Goodfellow et al 2014) suggested they do, if they were given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis in Arora et al (to appear at ICML 2017) raised doubts whether the same holds when discriminator has finite size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support ---in other words, the training objective is unable to prevent mode collapse. The current note reports experiments suggesting that such problems are not merely theoretical. It presents empirical evidence that well-known GANs approaches do learn distributions of fairly low support, and thus presumably are not learning the target distribution. The main technical contribution is a new proposed test, based upon the famous birthday paradox, for estimating the support size of the generated distribution.

deep-learning
neural-networks
looking-to-see
probability-theory
algorithms
experiment
rather-interesting
to-write-about
july 2017 by Vaguery

The zombie robot argument lurches on: There is no evidence that automation leads to joblessness or inequality | Economic Policy Institute

june 2017 by Vaguery

What is remarkable about this media narrative is that there is a strong desire to believe it despite so little evidence to support these claims. There clearly are serious problems in the labor market that have suppressed job and wage growth for far too long; but these problems have their roots in intentional policy decisions regarding globalization, collective bargaining, labor standards, and unemployment levels, not technology.

This report highlights the paucity of the evidence behind the alleged robot apocalypse, particularly as mischaracterized in the media coverage of the 2017 Acemoglu and Restrepo (A&R) report. Yes, automation has led to job displacements in particular occupations and industries in the past, but there is no basis for claiming that automation has led—or will lead—to increased joblessness, unemployment, or wage stagnation overall. We argue that the current excessive media attention to robots and automation destroying the jobs of the past and leaving us jobless in the future is a distraction from the main issues that need to be addressed: the poor wage growth and inequality caused by policies that have shifted economic power away from low- and moderate-wage workers. It is also the case that, as Atkinson and Wu (2017) argue, our productivity growth is too low, not too high.

Rather than debating possible problems that are more than a decade way, policymakers need to focus on addressing the decades-long crisis of wage stagnation by creating good jobs and supporting wage growth. And as it turns out, policies to expand good jobs and increase wages are the same measures needed to ensure that workers potentially displaced by automation have good jobs to transition to. For these workers, the education and training touted as solutions in the mainstream robot narrative will be inadequate, just as they were not adequate to help displaced manufacturing workers over the last few decades.

public-policy
looking-to-see
economics
automation
This report highlights the paucity of the evidence behind the alleged robot apocalypse, particularly as mischaracterized in the media coverage of the 2017 Acemoglu and Restrepo (A&R) report. Yes, automation has led to job displacements in particular occupations and industries in the past, but there is no basis for claiming that automation has led—or will lead—to increased joblessness, unemployment, or wage stagnation overall. We argue that the current excessive media attention to robots and automation destroying the jobs of the past and leaving us jobless in the future is a distraction from the main issues that need to be addressed: the poor wage growth and inequality caused by policies that have shifted economic power away from low- and moderate-wage workers. It is also the case that, as Atkinson and Wu (2017) argue, our productivity growth is too low, not too high.

Rather than debating possible problems that are more than a decade way, policymakers need to focus on addressing the decades-long crisis of wage stagnation by creating good jobs and supporting wage growth. And as it turns out, policies to expand good jobs and increase wages are the same measures needed to ensure that workers potentially displaced by automation have good jobs to transition to. For these workers, the education and training touted as solutions in the mainstream robot narrative will be inadequate, just as they were not adequate to help displaced manufacturing workers over the last few decades.

june 2017 by Vaguery

[1606.03686] Does Having More Options Mean Harder to Reach Consensus?

may 2017 by Vaguery

We generalize a binary majority-vote model on adaptive networks to a plurality-vote counterpart. When opinions are uniformly distributed in the population of voters in the initial state, it is found that having more available opinions in the initial state actually accelerate the time to consensus. In particular, we investigate the three-state plurality-vote model. While time to consensus in two state model scales exponentially with population size N, for finite-size system, there is a non-zero probability that either the population reaches the consensus state in a time that is very short and independent of N (in the heterophily regime), or in a time that scales exponentially with N but is still much faster than two-state model.

voting-models
agent-based
evolutionary-economics
diversity
rather-interesting
to-write-about
looking-to-see
may 2017 by Vaguery

[1409.4178] The frustrated brain: From dynamics on motifs to communities and networks

may 2017 by Vaguery

Cognitive function depends on an adaptive balance between flexible dynamics and integrative processes in distributed cortical networks. Patterns of zero-lag synchrony likely underpin numerous perceptual and cognitive functions. Synchronization fulfils integration by reducing entropy, whilst adaptive function mandates that a broad variety of stable states be readily accessible. Here, we elucidate two complementary influences on patterns of zero-lag synchrony that derive from basic properties of brain networks. First, mutually coupled pairs of neuronal subsystems -- resonance pairs -- promote stable zero-lag synchrony amongst the small motifs in which they are embedded, and whose effects can propagate along connected chains. Second, frustrated closed-loop motifs disrupt synchronous dynamics, enabling metastable configurations of zero-lag synchrony to coexist. We document these two complementary influences in small motifs and illustrate how these effects underpin stable versus metastable phase-synchronization patterns in prototypical modular networks and in large-scale cortical networks of the macaque (CoCoMac). We find that the variability of synchronization patterns depends on the inter-node time delay, increases with the network size, and is maximized for intermediate coupling strengths. We hypothesize that the dialectic influences of resonance versus frustration may form a dynamic substrate for flexible neuronal integration, an essential platform across diverse cognitive processes.

dynamical-systems
network-theory
coupled-oscillators
emergent-design
looking-to-see
systems-biology
nudge-targets
consider:robustness
consider:looking-to-see
may 2017 by Vaguery

[1304.5008] Mechanisms of Zero-Lag Synchronization in Cortical Motifs

may 2017 by Vaguery

Zero-lag synchronization between distant cortical areas has been observed in a diversity of experimental data sets and between many different regions of the brain. Several computational mechanisms have been proposed to account for such isochronous synchronization in the presence of long conduction delays: Of these, the phenomenon of "dynamical relaying" - a mechanism that relies on a specific network motif - has proven to be the most robust with respect to parameter mismatch and system noise. Surprisingly, despite a contrary belief in the community, the common driving motif is an unreliable means of establishing zero-lag synchrony. Although dynamical relaying has been validated in empirical and computational studies, the deeper dynamical mechanisms and comparison to dynamics on other motifs is lacking. By systematically comparing synchronization on a variety of small motifs, we establish that the presence of a single reciprocally connected pair - a "resonance pair" - plays a crucial role in disambiguating those motifs that foster zero-lag synchrony in the presence of conduction delays (such as dynamical relaying) from those that do not (such as the common driving triad). Remarkably, minor structural changes to the common driving motif that incorporate a reciprocal pair recover robust zero-lag synchrony. The findings are observed in computational models of spiking neurons, populations of spiking neurons and neural mass models, and arise whether the oscillatory systems are periodic, chaotic, noise-free or driven by stochastic inputs. The influence of the resonance pair is also robust to parameter mismatch and asymmetrical time delays amongst the elements of the motif. We call this manner of facilitating zero-lag synchrony resonance-induced synchronization, outline the conditions for its occurrence, and propose that it may be a general mechanism to promote zero-lag synchrony in the brain.

systems-biology
dynamical-systems
coupled-oscillators
engineering-design
emergent-design
looking-to-see
nudge-targets
consider:looking-to-see
consider:robustness
consider:reQ-like-systems
may 2017 by Vaguery

[1704.06304] k-Majority Digraphs and the Hardness of Voting with a Constant Number of Voters

may 2017 by Vaguery

Many hardness results in computational social choice make use of the fact that every directed graph may be induced as the pairwise majority relation of some preference profile. However, this fact requires a number of voters that is almost linear in the number of alternatives. It is therefore unclear whether these results remain intact when the number of voters is bounded, as is, for example, typically the case in search engine aggregation settings. In this paper, we provide a systematic study of majority digraphs for a constant number of voters resulting in analytical, experimental, and complexity-theoretic insights. First, we characterize the set of digraphs that can be induced by two and three voters, respectively, and give sufficient conditions for larger numbers of voters. Second, we present a surprisingly efficient implementation via SAT solving for computing the minimal number of voters that is required to induce a given digraph and experimentally evaluate how many voters are required to induce the majority digraphs of real-world and generated preference profiles. Finally, we leverage our sufficient conditions to show that the winner determination problem of various well-known voting rules remains hard even when there is only a small constant number of voters. In particular, we show that Kemeny's rule is hard to evaluate for 7 voters, while previous methods could only establish such a result for constant even numbers of voters.

game-theory
yeah-but-what-if-papers
existence-is-not-expectation
rather-interesting
looking-to-see
probability-theory
nudge-targets
consider:stress-testing
graph-theory
Arrow's-theorem-too
to-write-about
may 2017 by Vaguery

[1704.08798] Word Affect Intensities

may 2017 by Vaguery

Words often convey affect -- emotions, feelings, and attitudes. Lexicons of word-affect association have applications in automatic emotion analysis and natural language generation. However, existing lexicons indicate only coarse categories of affect association. Here, for the first time, we create an affect intensity lexicon with real-valued scores of association. We use a technique called best-worst scaling that improves annotation consistency and obtains reliable fine-grained scores. The lexicon includes terms common from both general English and terms specific to social media communications. It has close to 6,000 entries for four basic emotions. We will be adding entries for other affect dimensions shortly.

natural-language-processing
looking-to-see
affect
rather-interesting
to-write-about
consider:feature-discovery
may 2017 by Vaguery

[1703.01143] Why is it hard to beat $O(n^2)$ for Longest Common Weakly Increasing Subsequence?

may 2017 by Vaguery

The Longest Common Weakly Increasing Subsequence problem (LCWIS) is a variant of the classic Longest Common Subsequence problem (LCS). Both problems can be solved with simple quadratic time algorithms. A recent line of research led to a number of matching conditional lower bounds for LCS and other related problems. However, the status of LCWIS remained open.

In this paper we show that LCWIS cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis (SETH) is false.

The ideas which we developed can also be used to obtain a lower bound based on a safer assumption of NC-SETH, i.e. a version of SETH which talks about NC circuits instead of less expressive CNF formulas.

computational-complexity
robustness
algorithms
looking-to-see
rather-interesting
to-write-about
extreme-values
outliers
hard-problems
feature-construction
nudge-targets
consider:feature-discovery
In this paper we show that LCWIS cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis (SETH) is false.

The ideas which we developed can also be used to obtain a lower bound based on a safer assumption of NC-SETH, i.e. a version of SETH which talks about NC circuits instead of less expressive CNF formulas.

may 2017 by Vaguery

[1307.0587] Self-Assembly on a Cylinder: A Model System for Understanding the Constraint of Commensurability

april 2017 by Vaguery

A crystal lattice, when confined to the surface of a cylinder, must have a periodic structure that is commensurate with the cylinder circumference. This constraint can frustrate the system, leading to oblique crystal lattices or to structures with a chiral seam known as a "line slip" phase, neither of which are stable for isotropic particles in equilibrium on flat surfaces. In this study, we use molecular dynamics simulations to find the steady-state structure of spherical particles with short-range repulsion and long-range attraction far below the melting temperature. We vary the range of attraction using the Lennard-Jones and Morse potentials and find that a shorter-range attraction favors the line-slip. We develop a simple model based only on geometry and bond energy to predict when the crystal or line-slip phases should appear, and find reasonable agreement with the simulations. The simplicity of this model allows us to understand the influence of the commensurability constraint, an understanding that might be extended into the more general problem of self-assembling particles in strongly confined spaces.

packing
tiling
approximation
looking-to-see
out-of-the-box
physics!
simulation
frustrated-systems
self-organization
april 2017 by Vaguery

[1008.4633] Carryless Arithmetic Mod 10

april 2017 by Vaguery

We investigate what arithmetic would look like if carry digits into other digit position were ignored, so that 9 + 4 = 3, 5 + 5 = 0, 9 X 4 = 6, 5 X 4 = 0, and so on. For example, the primes are now 21, 23, 25, 27, 29, 41, 43, 45, 47, ... .

number-theory
mathematical-recreations
looking-to-see
Martin-Gardner
nudge-targets
consider:looking-to-see
rather-interesting
to-write-about
april 2017 by Vaguery

[0805.2128] Eight Hateful Sequences

april 2017 by Vaguery

In his July 1974 Scientific American column, Martin Gardner mentioned the Handbook of Integer Sequences, which then contained 2372 sequences. Today the On-Line Encyclopedia of Integer Sequences (the OEIS) contains 140000 sequences. This paper discusses eight of them, suggested by the theme of the Eighth Gathering For Gardner: they are all infinite, and all 'ateful in one way or another. Each one is connected with an unsolved problem. The sequences are related to: hateful numbers, Angelini's 1995 puzzle, the persistence of a number, Alekseyev's 123 sequence, the curling number conjecture, Quet's prime-generating recurrence, the traveling salesman's problem, and the Riemann Hypothesis.

number-theory
Martin-Gardner
mathematical-recreations
sequences
looking-to-see
nudge-targets
consider:looking-to-see
april 2017 by Vaguery

[math/0305308] Numerical Analogues of Aronson's Sequence

april 2017 by Vaguery

Aronson's sequence 1, 4, 11, 16, ... is defined by the English sentence ``t is the first, fourth, eleventh, sixteenth, ... letter of this sentence.'' This paper introduces some numerical analogues, such as: a(n) is taken to be the smallest positive integer greater than a(n-1) which is consistent with the condition ``n is a member of the sequence if and only if a(n) is odd.'' This sequence can also be characterized by its ``square'', the sequence a^(2)(n) = a(a(n)), which equals 2n+3 for n >= 1. There are many generalizations of this sequence, some of which are new, while others throw new light on previously known sequences.

number-theory
sequences
looking-to-see
define-your-terms
nudge-targets
consider:feature-discovery
consider:performance-measures
april 2017 by Vaguery

[math/0505295] Sloping Binary Numbers: A New Sequence Related to the Binary Numbers

april 2017 by Vaguery

If the list of binary numbers is read by upward-sloping diagonals, the resulting ``sloping binary numbers'' 0, 11, 110, 101, 100, 1111, 1010, ... (or 0, 3, 6, 5, 4, 15, 10, ...) have some surprising properties. We give formulae for the n-th term and the n-th missing term, and discuss a number of related sequences.

number-theory
looking-to-see
to-write-about
nudge-targets
consider:feature-discovery
april 2017 by Vaguery

[1106.5531] Balls in Boxes: Variations on a Theme of Warren Ewens and Herbert Wilf

april 2017 by Vaguery

We comment on, elaborate, and extend the work of Warren Ewens and Herbert Wilf, described in their this http URL about the maximum in balls-and-boxes problem. In particular we meta-apply their ingenious method to show that it is not really needed, and that one is better off using the so-called Poisson Approximation, at least in applications to the real world, because extremely unlikely events mever happen in real life. This article is accompanied by the Maple package this http URL">BallsInBoxes.

probability-theory
statistics
hey-I-know-this-guy
combinatorics
looking-to-see
define-your-terms
to-write-about
mathematical-recreations
nudge-targets
consider:rediscovery
april 2017 by Vaguery

[1304.1226] On Euler's "Misleading Induction", Andrews' "Fix", and How to Fully Automate them

april 2017 by Vaguery

One of the greatest experimental mathematicians of all time was also one of the greatest mathematicians of all time, the great Leonhard Euler. Usually he had an uncanny intuition on how many "special cases" one needs before one can formulate a plausible conjecture, but one time he was "almost fooled", only to find out that his conjecture was premature.

In 1990, George Andrews found a way to "correct" Euler. Here we show how to generate, AUTOMATICALLY, rigorously-proved Euler-Andrews Style formulas, that enables one to generate Euler-style "cautionary tales" about the "danger" of using naive empirical induction. Ironically, the way we prove the Andrews-style corrections is empirical! But in order to turn the empirical proof into a full-fledged rigorous proof, we must make sure that we check sufficiently many (but still not that many!) special cases.

combinatorics
looking-to-see
induction
philosophy-of-science
pattern-discovery
nudge-targets
consider:looking-to-see
to-write-about
mathematical-recreations
In 1990, George Andrews found a way to "correct" Euler. Here we show how to generate, AUTOMATICALLY, rigorously-proved Euler-Andrews Style formulas, that enables one to generate Euler-style "cautionary tales" about the "danger" of using naive empirical induction. Ironically, the way we prove the Andrews-style corrections is empirical! But in order to turn the empirical proof into a full-fledged rigorous proof, we must make sure that we check sufficiently many (but still not that many!) special cases.

april 2017 by Vaguery

[1502.04377] The Method(!) of "Guess and Check"

april 2017 by Vaguery

The problems of enumerating lattice walks, with an arbitrary finite set of allowed steps, both in one and two dimensions, where one must always stay in the non-negative half-line and quarter-plane respectively, are used, as case studies, to illustrate the `naive' methodology of guess-and-check, where rigorous proofs are possible, but not worth the trouble. We argue that this is a metaphor for future math.

philosophy-of-science
mathematical-recreations
rather-interesting
to-write-about
looking-to-see
april 2017 by Vaguery

[1703.04792] Frustration and thermalisation in an artificial magnetic quasicrystal

april 2017 by Vaguery

We have created and studied artificial magnetic quasicrystals based on Penrose tiling patterns of interacting nanomagnets that lack the translational symmetry of spatially periodic artificial spin ices. Vertex-level degeneracy and frustration induced by the network topology of the Penrose pattern leads to a low energy configuration that we propose as a ground state. It has two parts, a quasi-one-dimensional rigid "skeleton" that spans the entire pattern and is capable of long-range order, and clusters of macrospins within it that are degenerate in a nearest neighbour model, and so are "flippable". These lead to macroscopic degeneracy for the array as a whole. Magnetic force microscopy imaging of Penrose tiling arrays revealed superdomains that are larger for more strongly coupled arrays. The superdomain size is larger after AC-demagnetisation and especially after annealing the array above its blocking temperature.

experiment
frustrated-networks
complexology
quasicrystals
looking-to-see
rather-interesting
april 2017 by Vaguery

[1704.00828] A Probabilistic Linear Genetic Programming with Stochastic Context-Free Grammar for solving Symbolic Regression problems

april 2017 by Vaguery

Traditional Linear Genetic Programming (LGP) algorithms are based only on the selection mechanism to guide the search. Genetic operators combine or mutate random portions of the individuals, without knowing if the result will lead to a fitter individual. Probabilistic Model Building Genetic Programming (PMB-GP) methods were proposed to overcome this issue through a probability model that captures the structure of the fit individuals and use it to sample new individuals. This work proposes the use of LGP with a Stochastic Context-Free Grammar (SCFG), that has a probability distribution that is updated according to selected individuals. We proposed a method for adapting the grammar into the linear representation of LGP. Tests performed with the proposed probabilistic method, and with two hybrid approaches, on several symbolic regression benchmark problems show that the results are statistically better than the obtained by the traditional LGP.

genetic-programming
search-operators
generative-models
looking-to-see
rather-interesting
algorithms
metaheuristics
april 2017 by Vaguery

The GRIM test — a method for evaluating published research. – Medium

march 2017 by Vaguery

Ever had one of those ideas where you thought: “No, this is too simple, someone must have already thought of it.”

And then found that no-one had?

And that it was a good idea after all?

Well, that’s what happened to us.

(Who is us? I’m going to use pronouns messily through the following almost-3000 words, but let the record show: ‘us’ and ‘we’ is Nick Brown and myself.)

The pre-print of this paper is HERE — “The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology”.

via:arthegall
statistics
looking-to-see
inverse-problems
rather-interesting
academic-culture
publishing
science
peer-review
And then found that no-one had?

And that it was a good idea after all?

Well, that’s what happened to us.

(Who is us? I’m going to use pronouns messily through the following almost-3000 words, but let the record show: ‘us’ and ‘we’ is Nick Brown and myself.)

The pre-print of this paper is HERE — “The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology”.

march 2017 by Vaguery

[1310.5971] Jammed lattice sphere packings

march 2017 by Vaguery

We generate and study an ensemble of isostatic jammed hard-sphere lattices. These lattices are obtained by compression of a periodic system with an adaptive unit cell containing a single sphere until the point of mechanical stability. We present detailed numerical data about the densities, pair correlations, force distributions, and structure factors of such lattices. We show that this model retains many of the crucial structural features of the classical hard-sphere model and propose it as a model for the jamming and glass transitions that enables exploration of much higher dimensions than are usually accessible.

statistical-mechanics
physics!
packing
simulation
looking-to-see
march 2017 by Vaguery

You Can’t Break Math – dy/dan

february 2017 by Vaguery

BTW

I haven’t found a way to generate these kinds of insights about math without surrounding myself with people learning math for the first time.

One of my most enduring shortcomings as a teacher is how much I plan and revise those plans, even if the lesson I have on file will suffice. I’ll chase a scintilla of an improvement for hours, which was never sustainable. I spent most of the previous day prepping this Desmos activity. We used 10% of it.

Language from the day that I’m still pondering: “We cancel the 2x’s because we want to get x by itself.” I’m trying to decide if those italicized expressions contribute to a student’s understanding of large ideas about mathematics or of small ideas about solving a particular kind of equation.

pedagogy
the-mangle-in-practice
learning-by-doing
looking-to-see
rather-interesting
mathematics
to-write-about
I haven’t found a way to generate these kinds of insights about math without surrounding myself with people learning math for the first time.

One of my most enduring shortcomings as a teacher is how much I plan and revise those plans, even if the lesson I have on file will suffice. I’ll chase a scintilla of an improvement for hours, which was never sustainable. I spent most of the previous day prepping this Desmos activity. We used 10% of it.

Language from the day that I’m still pondering: “We cancel the 2x’s because we want to get x by itself.” I’m trying to decide if those italicized expressions contribute to a student’s understanding of large ideas about mathematics or of small ideas about solving a particular kind of equation.

february 2017 by Vaguery

[1604.04722] Where is the global corporate elite? A large-scale network study of local and nonlocal interlocking directorates

february 2017 by Vaguery

Business elites reconfigure their locus of organization over time, from the city level, to the national level, and beyond. We ask what the current level of elite organization is and propose a novel theoretical and empirical approach to answer this question. Building on the universal distinction between local and nonlocal ties we use network analysis and community detection to dissect the global network of interlocking directorates among over five million firms. We find that elite orientation is indeed changing from the national to the transnational plane, but we register a considerable heterogeneity across different regions in the world. In some regions the business communities are organized along national borders, whereas in other areas the locus of organization is at the city level or international level. London dominates the global corporate elite network. Our findings underscore that the study of corporate elites requires an approach that is sensitive to levels of organization that go beyond the confines of nation states.

social-networks
rather-interesting
corporatism
globalism
looking-to-see
to-write-about
february 2017 by Vaguery

[1701.01329] Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks

february 2017 by Vaguery

In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active towards a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target.

Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.

pharmaceutical
engineering-design
machine-learning
neural-networks
feature-construction
nudge-targets
consider:representation
consider:looking-to-see
looking-to-see
cheminformatics
party-like-its-1999
Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.

february 2017 by Vaguery

[1606.03225] In a search for a shape maximizing packing fraction for two-dimensional random sequential adsorption

december 2016 by Vaguery

Random sequential adsorption (RSA) of various two dimensional objects is studied in order to find a shape which maximizes the saturated packing fraction. This investigation was begun in our previous paper [Cie\'sla et al., Phys. Chem. Chem. Phys. 17, 24376 (2015)], where the densest packing was studied for smoothed dimers. Here this shape is compared with a smoothed n-mers, spherocylinders and ellipses. It is found that the highest packing fraction out of the studied shapes is 0.58405±0.0001 and is obtained for ellipses having long-to-short axis ratio of 1.85, which is also the largest anisotropy among the investigated shapes.

packing
simulation
looking-to-see
granular-materials
rather-interesting
phase-transitions
nudge-targets
consider:looking-to-see
consider:symbolic-regression
consider:robustness
december 2016 by Vaguery

[1507.03323] Boolean Gossiping Networks

november 2016 by Vaguery

This paper proposes and investigates a Boolean gossiping model as a simplified but non-trivial probabilistic Boolean network. With positive node interactions, in view of standard theories from Markov chains, we prove that the node states asymptotically converge to an agreement at a binary random variable, whose distribution is characterized for large-scale networks by mean-field approximation. Using combinatorial analysis, we also successfully count the number of communication classes of the positive Boolean network explicitly in terms of the topology of the underlying interaction graph, where remarkably minor variation in local structures can drastically change the number of network communication classes. With general Boolean interaction rules, emergence of absorbing network Boolean dynamics is shown to be determined by the network structure with necessary and sufficient conditions established regarding when the Boolean gossiping process defines absorbing Markov chains. These results illustrate possibilities of relating detailed dynamical properties of Boolean networks to graphical properties of the underlying interactions.

boolean-networks
Kauffmania
network-theory
systems-biology
looking-to-see
rather-interesting
dynamical-systems
nudge-targets
consider:feature-discovery
consider:approximation
november 2016 by Vaguery

Image Synthesis from Yahoo's open_nsfw

october 2016 by Vaguery

I remember proposing almost this exact conjecture to Cosma Shalizi and D. Eric Smith, sitting in the garden at SFI in about 2000. It makes me happy to see it coming into existence. I certainly never got around to it.

image-processing
generative-art
generative-models
deep-learning
aesthetics
looking-to-see
the-posthuman-aesthetic
october 2016 by Vaguery

[1605.06396] Soft Covering with High Probability

july 2016 by Vaguery

Wyner's soft-covering lemma is the central analysis step for achievability proofs of information theoretic security, resolvability, and channel synthesis. It can also be used for simple achievability proofs in lossy source coding. This work sharpens the claim of soft-covering by moving away from an expected value analysis. Instead, a random codebook is shown to achieve the soft-covering phenomenon with high probability. The probability of failure is super-exponentially small in the block-length, enabling many applications through the union bound. This work gives bounds for both the exponential decay rate of total variation and the second-order codebook rate for soft covering.

information-theory
looking-to-see
rather-interesting
heuristics
approximation
nudge-targets
consider:looking-to-see
consider:feature-discovery
july 2016 by Vaguery

[1606.07262] On the Theoretical Capacity of Evolution Strategies to Statistically Learn the Landscape Hessian

july 2016 by Vaguery

We study the theoretical capacity to statistically learn local landscape information by Evolution Strategies (ESs). Specifically, we investigate the covariance matrix when constructed by ESs operating with the selection operator alone. We model continuous generation of candidate solutions about quadratic basins of attraction, with deterministic selection of the decision vectors that minimize the objective function values. Our goal is to rigorously show that accumulation of winning individuals carries the potential to reveal valuable information about the search landscape, e.g., as already practically utilized by derandomized ES variants. We first show that the statistically-constructed covariance matrix over such winning decision vectors shares the same eigenvectors with the Hessian matrix about the optimum. We then provide an analytic approximation of this covariance matrix for a non-elitist multi-child (1,λ)-strategy, which holds for a large population size λ. Finally, we also numerically corroborate our results.

evolutionary-algorithms
looking-to-see
machine-learning
optimization
formalization
july 2016 by Vaguery

[1604.08658] Dependence between External Path-Length and Size in Random Tries

june 2016 by Vaguery

We study the size and the external path length of random tries and show that they are asymptotically independent in the asymmetric case but strongly dependent with small periodic fluctuations in the symmetric case. Such an unexpected behavior is in sharp contrast to the previously known results that the internal path length is totally positively correlated to the size and that both tend to the same normal limit law. These two examples provide concrete instances of bivariate normal distributions (as limit laws) whose correlation is 0, 1 and periodically oscillating.

probability-theory
data-structures
looking-to-see
computer-science
pattern-formation
rather-odd
nudge-targets
consider:looking-to-see
june 2016 by Vaguery

Could a neuroscientist understand a microprocessor? | bioRxiv

may 2016 by Vaguery

There is a popular belief in neuroscience that we are primarily data limited, that producing large, multimodal, and complex datasets will, enabled by data analysis algorithms, lead to fundamental insights into the way the brain processes information. Microprocessors are among those artificial information processing systems that are both complex and that we understand at all levels, from the overall logical flow, via logical gates, to the dynamics of transistors. Here we take a simulated classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information. We show that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the processor. This suggests that current approaches in neuroscience may fall short of producing meaningful models of the brain.

via:numerous
neural-networks
inference
modeling
experiment
rather-interesting
big-data
bioinformatics
looking-to-see
may 2016 by Vaguery

[1605.03390] Variance of the Internal Profile in Suffix Trees

may 2016 by Vaguery

The precise analysis of the variance of the profile of a suffix tree has been a longstanding open problem. We analyze three regimes of the asymptotic growth of the variance of the profile of a suffix tree built from a randomly generated binary string, in the nonuniform case. We utilize combinatorics on words, singularity analysis, and the Mellin transform.

data-structures
algorithms
computer-science
combinatorics
looking-to-see
nudge-targets
consider:feature-discovery
may 2016 by Vaguery

[1201.6035] How Accurate is inv(A)*b?

may 2016 by Vaguery

Several widely-used textbooks lead the reader to believe that solving a linear system of equations Ax = b by multiplying the vector b by a computed inverse inv(A) is inaccurate. Virtually all other textbooks on numerical analysis and numerical linear algebra advise against using computed inverses without stating whether this is accurate or not. In fact, under reasonable assumptions on how the inverse is computed, x = inv(A)*b is as accurate as the solution computed by the best backward-stable solvers. This fact is not new, but obviously obscure. We review the literature on the accuracy of this computation and present a self-contained numerical analysis of it.

numerical-methods
looking-to-see
approximation
rather-interesting
error
algorithms
nudge-targets
consider:looking-to-see
consider:performance-measures
may 2016 by Vaguery

[1601.04273] Statistical-mechanical Analysis of Linear Programming Relaxation for Combinatorial Optimization Problems

april 2016 by Vaguery

Typical behavior of the linear programming (LP) problem is studied as a relaxation of the minimum vertex cover, a type of integer programming (IP) problem. A lattice-gas model on the Erd\"os-R\'enyi random graphs of α-uniform hyperedges is proposed to express both the LP and IP problems of the min-VC in the common statistical-mechanical model with a one-parameter family. Statistical-mechanical analyses reveal for α=2 that the LP optimal solution is typically equal to that given by the IP below the critical average degree c=e in the thermodynamic limit. The critical threshold for good accuracy of the relaxation extends the mathematical result c=1, and coincides with the replica symmetry-breaking threshold of the IP. The LP relaxation for the minimum hitting sets with α≥3, minimum vertex covers on α-uniform random graphs, is also studied. Analytic and numerical results strongly suggest that the LP relaxation fails to estimate optimal values above the critical average degree c=e/(α−1) where the replica symmetry is broken.

linear-programming
relaxation
approximation
looking-to-see
rather-interesting
consider:Nk-models
nudge-targets
consider:feature-discovery
april 2016 by Vaguery

[1509.03327] Optimal Strategy in "Guess Who?": Beyond Binary Search

february 2016 by Vaguery

"Guess Who?" is a popular two player game where players ask "Yes"/"No" questions to search for their opponent's secret identity from a pool of possible candidates. This is modeled as a simple stochastic game. Using this model, the optimal strategy is explicitly found. Contrary to popular belief, performing a binary search is \emph{not} always optimal. Instead, the optimal strategy for the player who trails is to make certain bold plays in an attempt catch up. This is discovered by first analyzing a continuous version of the game where players play indefinitely and the winner is never decided after finitely many rounds.

game-theory
probability-theory
planning
looking-to-see
nudge-targets
consider:strategies
february 2016 by Vaguery

[1601.03420] Critical fluctuations in proteins native states

february 2016 by Vaguery

We study a large data set of protein structure ensembles of very diverse sizes determined by nuclear magnetic resonance. By examining the distance-dependent correlations in the displacement of residues pairs and conducting finite size scaling analysis it was found that the correlations and susceptibility behave as in systems near a critical point implying that, at the native state, the motion of each amino acid residue is felt by every other residue up to the size of the protein molecule. Furthermore certain protein's shapes corresponding to maximum susceptibility were found to be more probable than others. Overall the results suggest that the protein's native state is critical, implying that despite being posed near the minimum of the energy landscape, they still preserve their dynamic flexibility.

biophysics
protein-folding
physics
phase-transitions
self-organization
looking-to-see
nonlinear-dynamics
february 2016 by Vaguery

[1507.08245] Epistasis and the structure of fitness landscapes: are experimental fitness landscapes compatible with Fisher's model?

february 2016 by Vaguery

The fitness landscape defines the relationship between genotypes and fitness in a given environment, and underlies fundamental quantities such as the distribution of selection coefficient, or the magnitude and type of epistasis. A better understanding of variation of landscape structure across species and environments is thus necessary to understand and predict how populations adapt. An increasing number of experiments access the properties of fitness landscapes by identifying mutations, constructing genotypes with combinations of these mutations, and measuring fitness of these genotypes. Yet these empirical landscapes represent a very small sample of the vast space of all possible genotypes, and this sample is biased by the protocol used to identify mutations. Here we develop a rigorous and flexible statistical framework based on Approximate Bayesian Computation to address these concerns, and use this framework to fit a broad class of phenotypic fitness models (including Fisher's model) to 24 empirical landscapes representing 9 diverse biological systems. In spite of uncertainty due to the small size of most published empirical landscapes, the inferred landscapes have similar structure in similar biological systems. Surprisingly, goodness of fit tests reveal that this class of phenotypic models, which has been successful so far in interpreting experimental data, is a plausible model in only 3 out of 9 biological systems. In most cases, including notably the landscapes of drug resistance, Fisher's model is not able to explain the structure of empirical landscapes and patterns of epistasis.

fitness-landscapes
looking-to-see
theory-and-practice-sitting-in-a-tree
theoretical-biology
population-biology
modeling-is-not-mathematics
february 2016 by Vaguery

[1509.04316] Sums of seven octahedral numbers

december 2015 by Vaguery

We show that for a large class of cubic polynomials f, every sufficiently large number can be written as a sum of seven positive values of f. As a special case, we show that every number greater than e107 is a sum of seven positive octahedral numbers, where an octahedral number is a number of the form 2x3+x3, reducing an open problem due to Pollock to a finite computation.

number-theory
existence-proof
looking-to-see
nudge-targets
consider:looking-to-see
december 2015 by Vaguery

[1512.00908] Lattice Surfaces and smallest triangles

december 2015 by Vaguery

We calculate the area of the smallest triangle and the area of the smallest virtual triangle for many known lattice surfaces. We show that our list of the lattice surfaces for which the area of the smallest virtual triangle greater than .05 is complete. In particular, this means that there are no new lattice surfaces for which the area of the smallest virtual triangle is greater than .05. Our method follows an algorithm described by Smillie and Weiss and improves on it in certain respects.

tiling
algebra
integer-lattices
computational-geometry
looking-to-see
rather-interesting
nudge-targets
consider:feature-discovery
december 2015 by Vaguery

[1507.04381] Complete Characterization of Stability of Cluster Synchronization in Complex Dynamical Networks

december 2015 by Vaguery

Synchronization is an important and prevalent phenomenon in natural and engineered systems. In many dynamical networks, the coupling is balanced or adjusted in order to admit global synchronization, a condition called Laplacian coupling. Many networks exhibit incomplete synchronization, where two or more clusters of synchronization persist, and computational group theory has recently proved to be valuable in discovering these cluster states based upon the topology of the network. In the important case of Laplacian coupling, additional synchronization patterns can exist that would not be predicted from the group theory analysis alone. The understanding of how and when clusters form, merge, and persist is essential for understanding collective dynamics, synchronization, and failure mechanisms of complex networks such as electric power grids, distributed control networks, and autonomous swarming vehicles. We describe here a method to find and analyze all of the possible cluster synchronization patterns in a Laplacian-coupled network, by applying methods of computational group theory to dynamically-equivalent networks. We present a general technique to evaluate the stability of each of the dynamically valid cluster synchronization patterns. Our results are validated in an electro-optic experiment on a 5 node network that confirms the synchronization patterns predicted by the theory.

complexology
network-theory
coupled-oscillators
nonlinear-dynamics
structure-and-function
looking-to-see
nudge-targets
consider:looking-to-see
consider:feature-discovery
december 2015 by Vaguery

Sustained fitness gains and variability in fitness trajectories in the long-term evolution experiment with Escherichia coli | bioRxiv

september 2015 by Vaguery

Many populations live in environments subject to frequent biotic and abiotic changes. Nonetheless, it is interesting to ask whether an evolving population's mean fitness can increase indefinitely, and potentially without any limit, even in a constant environment. A recent study showed that fitness trajectories of Escherichia coli populations over 50,000 generations were better described by a power-law model than by a hyperbolic model. According to the power-law model, the rate of fitness gain declines over time but fitness has no upper limit, whereas the hyperbolic model implies a hard limit. Here, we examine whether the previously estimated power-law model predicts the fitness trajectory for an additional 10,000 generations. To that end, we conducted more than 1100 new competitive fitness assays. Consistent with the previous study, the power-law model fits the new data better than the hyperbolic model. We also analysed the variability in fitness among populations, finding subtle, but significant, heterogeneity in mean fitness. Some, but not all, of this variation reflects differences in mutation rate that evolved over time. Taken together, our results imply that both adaptation and divergence can continue indefinitely-or at least for a long time-even in a constant environment.

evolutionary-biology
experiment
hey-I-know-this-guy
biology
looking-to-see
september 2015 by Vaguery

[1311.4030] New procedures controlling the false discovery proportion via Romano-Wolf's heuristic

august 2015 by Vaguery

The false discovery proportion (FDP) is a convenient way to account for false positives when a large number m of tests are performed simultaneously. Romano and Wolf [Ann. Statist. 35 (2007) 1378-1408] have proposed a general principle that builds FDP controlling procedures from k-family-wise error rate controlling procedures while incorporating dependencies in an appropriate manner; see Korn et al. [J. Statist. Plann. Inference 124 (2004) 379-398]; Romano and Wolf (2007). However, the theoretical validity of the latter is still largely unknown. This paper provides a careful study of this heuristic: first, we extend this approach by using a notion of "bounding device" that allows us to cover a wide range of critical values, including those that adapt to m_0, the number of true null hypotheses. Second, the theoretical validity of the latter is investigated both nonasymptotically and asymptotically. Third, we introduce suitable modifications of this heuristic that provide new methods, overcoming the existing procedures with a proven FDP control.

statistics
modeling-is-not-mathematics
confidence
algorithms
looking-to-see
rather-interesting
heuristics
consider:performance-measures
august 2015 by Vaguery

Metacommunities in Dynamic Landscapes | bioRxiv

august 2015 by Vaguery

Predictions from theory, field data, and experiments have shown that high landscape connectivity promotes higher species richness than low connectivity. However, examples demonstrating high diversity in low connected landscapes also exist. Here we describe the many factors that drive landscape connectivity at different spatiotemporal scales by varying the amplitude and frequency of changes in the dispersal radius of spatial networks. We found that the fluctuations of landscape connectivity support metacommunities with higher species richness than static landscapes. Our results also show a dispersal radius threshold below which species richness drops dramatically in static landscapes. Such a threshold is not observed in dynamic landscapes for a broad range of amplitude and frequency values determining landscape connectivity. We conclude that merging amplitude and frequency as drivers of landscape connectivity together with patch dynamics into metacommunity theory can provide new testable predictions about species diversity in rapidly changing landscapes.

against-fitness-landscapes
fitness-landscapes
evolutionary-biology
Kauffmania
theoretical-biology
looking-to-see
nudge-targets
low-hanging-fruit
august 2015 by Vaguery

[1408.3600] Uncovering the nutritional landscape of food

august 2015 by Vaguery

Recent progresses in data-driven analysis methods, including network-based approaches, are revolutionizing many classical disciplines. These techniques can also be applied to food and nutrition, which must be studied to design healthy diets. Using nutritional information from over 1,000 raw foods, we systematically evaluated the nutrient composition of each food in regards to satisfying daily nutritional requirements. The nutrient balance of a food was quantified herein as nutritional fitness, using the food's frequency of occurrence in nutritionally adequate food combinations. Nutritional fitness offers prioritization of recommendable foods within a global network of foods, in which foods are connected based on the similarities of their nutrient compositions. We identified a number of key nutrients, such as choline and alpha-linolenic acid, whose levels in foods can critically affect the foods' nutritional fitness. Analogously, pairs of nutrients can have the same effect. In fact, two nutrients can impact the nutritional fitness synergistically, although the individual nutrients alone may not. This result, involving the tendency among nutrients to show correlations in their abundances across foods, implies a hidden layer of complexity when exploring for foods whose balance of nutrients within pairs holistically helps meet nutritional requirements. Interestingly, foods with high nutritional fitness successfully maintain this nutrient balance. This effect expands our scope to a diverse repertoire of nutrient-nutrient correlations, integrated under a common network framework that yields unexpected yet coherent associations between nutrients. Our nutrient-profiling approach combined with a network-based analysis provides a more unbiased, global view of the relationships between foods and nutrients, and can be extended towards nutritional policies, food marketing, and personalized nutrition.

nutrition
data-analysis
rather-interesting
visualization
exploratory-data-analysis
looking-to-see
pattern-discovery
august 2015 by Vaguery

[1410.5447] Recognizing molecular patterns by machine learning: an agnostic structural definition of the hydrogen bond

july 2015 by Vaguery

The concept of chemical bonding can ultimately be seen as a rationalization of the recurring structural patterns observed in molecules and solids. Chemical intuition is nothing but the ability to recognize and predict such patterns, and how they transform into one another. Here we discuss how to use a computer to identify atomic patterns automatically, so as to provide an algorithmic definition of a bond based solely on structural information. We concentrate in particular on hydrogen bonding -- a central concept to our understanding of the physical chemistry of water, biological systems and many technologically important materials. Since the hydrogen bond is a somewhat fuzzy entity that covers a broad range of energies and distances, many different criteria have been proposed and used over the years, based either on sophisticate electronic structure calculations followed by an energy decomposition analysis, or on somewhat arbitrary choices of a range of structural parameters that is deemed to correspond to a hydrogen-bonded configuration. We introduce here a definition that is univocal, unbiased, and adaptive, based on our machine-learning analysis of an atomistic simulation. The strategy we propose could be easily adapted to similar scenarios, where one has to recognize or classify structural patterns in a material or chemical compound.

chemistry
machine-learning
looking-to-see
learning-by-doing
learning-by-watching
rather-interesting
nudge-targets
consider:feature-discovery
july 2015 by Vaguery

**related tags**

Copy this bookmark: