nhaliday + acm + atoms   45

Workshop Abstract | Identifying and Understanding Deep Learning Phenomena
ICML 2019 workshop, June 15th 2019, Long Beach, CA

We solicit contributions that view the behavior of deep nets as natural phenomena, to be investigated with methods inspired from the natural sciences like physics, astronomy, and biology.
unit  workshop  acm  machine-learning  science  empirical  nitty-gritty  atoms  deep-learning  model-class  icml  data-science  rigor  replication  examples  ben-recht  physics 
april 2019 by nhaliday
Theory of Self-Reproducing Automata - John von Neumann

Comparisons between computing machines and the nervous systems. Estimates of size for computing machines, present and near future.

Estimates for size for the human central nervous system. Excursus about the “mixed” character of living organisms. Analog and digital elements. Observations about the “mixed” character of all componentry, artificial as well as natural. Interpretation of the position to be taken with respect to these.

Evaluation of the discrepancy in size between artificial and natural automata. Interpretation of this discrepancy in terms of physical factors. Nature of the materials used.

The probability of the presence of other intellectual factors. The role of complication and the theoretical penetration that it requires.

Questions of reliability and errors reconsidered. Probability of individual errors and length of procedure. Typical lengths of procedure for computing machines and for living organisms--that is, for artificial and for natural automata. Upper limits on acceptable probability of error in individual operations. Compensation by checking and self-correcting features.

Differences of principle in the way in which errors are dealt with in artificial and in natural automata. The “single error” principle in artificial automata. Crudeness of our approach in this case, due to the lack of adequate theory. More sophisticated treatment of this problem in natural automata: The role of the autonomy of parts. Connections between this autonomy and evolution.

- 10^10 neurons in brain, 10^4 vacuum tubes in largest computer at time
- machines faster per step: ~5 ms from neuron potential to neuron potential vs ~10^-3 ms per vacuum-tube operation (roughly a 5000x speed advantage per component)
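The two bullet points above can be turned into a quick Fermi-style comparison, using the circa-1950s figures von Neumann quotes:

```python
# Fermi comparison of natural vs artificial automata,
# using von Neumann's circa-1950s figures quoted above.
neurons = 1e10       # components in the human brain
vacuum_tubes = 1e4   # components in the largest contemporary computer
neuron_time = 5e-3   # seconds per neuron-to-neuron step (~5 ms)
tube_time = 1e-6     # seconds per vacuum-tube switch (~10^-3 ms)

size_ratio = neurons / vacuum_tubes    # brain has ~10^6 x more components
speed_ratio = neuron_time / tube_time  # tubes switch ~5000x faster

print(f"size ratio (brain/machine): {size_ratio:.0e}")
print(f"speed ratio (tube vs neuron): {speed_ratio:.0f}x")
```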

pdf  article  papers  essay  nibble  math  cs  computation  bio  neuro  neuro-nitgrit  scale  magnitude  comparison  acm  von-neumann  giants  thermo  phys-energy  speed  performance  time  density  frequency  hardware  ems  efficiency  dirty-hands  street-fighting  fermi  estimate  retention  physics  interdisciplinary  multi  wiki  links  people  🔬  atoms  duplication  iteration-recursion  turing  complexity  measure  nature  technology  complex-systems  bits  information-theory  circuits  robust  structure  composition-decomposition  evolution  mutation  axioms  analogy  thinking  input-output  hi-order-bits  coding-theory  flexibility  rigidity  automata-languages 
april 2018 by nhaliday
Stein's example - Wikipedia
Stein's example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately. It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.[1]

An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent; this occurs in channel estimation in telecommunications, for instance (different factors affect overall channel performance). On the other hand, if one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.


Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James–Stein estimator, which works by starting at X and moving towards a particular point (such as the origin) by an amount inversely proportional to the distance of X from that point.
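A toy simulation of the shrink-toward-the-origin idea: for p >= 3, the James–Stein estimator (1 - (p-2)/||X||^2) X has lower total squared error than the observation X itself (parameter values here are assumed for illustration):

```python
import random

random.seed(0)
p, trials = 10, 2000
theta = [1.0] * p  # true parameter vector (assumed for illustration)

mse_mle = mse_js = 0.0
for _ in range(trials):
    # one observation X ~ N(theta, I)
    x = [t + random.gauss(0, 1) for t in theta]
    norm2 = sum(xi * xi for xi in x)
    shrink = 1 - (p - 2) / norm2  # James-Stein shrinkage toward the origin
    js = [shrink * xi for xi in x]
    mse_mle += sum((xi - t) ** 2 for xi, t in zip(x, theta)) / trials
    mse_js += sum((ji - t) ** 2 for ji, t in zip(js, theta)) / trials

print(mse_mle, mse_js)  # combined JS risk is smaller for p >= 3
```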
nibble  concept  levers  wiki  reference  acm  stats  probability  decision-theory  estimate  distribution  atoms 
february 2018 by nhaliday
Sequence Modeling with CTC
A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.
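One concrete piece of CTC that fits in a few lines: the many-to-one collapsing map from frame-level alignments to label sequences (the blank symbol is assumed to be "_" here). Distinct alignments collapse to the same output, and CTC training sums probability over all of them:

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol

def ctc_collapse(path):
    """Map a frame-level alignment to an output string:
    merge consecutive repeats, then drop blanks."""
    return "".join(ch for ch, _ in groupby(path) if ch != BLANK)

print(ctc_collapse("hh_e_ll_llo"))  # -> "hello"
```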
acmtariat  techtariat  org:bleg  nibble  better-explained  machine-learning  deep-learning  visual-understanding  visualization  analysis  let-me-see  research  sequential  audio  classification  model-class  exposition  language  acm  approximation  comparison  markov  iteration-recursion  concept  atoms  distribution  orders  DP  heuristic  optimization  trees  greedy  matching  gradient-descent 
december 2017 by nhaliday
Kelly criterion - Wikipedia
In probability theory and intertemporal portfolio choice, the Kelly criterion, Kelly strategy, Kelly formula, or Kelly bet, is a formula used to determine the optimal size of a series of bets. In most gambling scenarios, and some investing scenarios under some simplifying assumptions, the Kelly strategy will do better than any essentially different strategy in the long run (that is, over a span of time in which the observed fraction of bets that are successful equals the probability that any given bet will be successful). It was described by J. L. Kelly, Jr, a researcher at Bell Labs, in 1956.[1] The practical use of the formula has been demonstrated.[2][3][4]

The Kelly criterion prescribes betting a predetermined fraction of assets, and it can be counterintuitive. In one study,[5][6] each participant was given $25 and asked to bet on a coin that would land heads 60% of the time. Participants had 30 minutes to play, so could place about 300 bets, and the prizes were capped at $250. Behavior was far from optimal. "Remarkably, 28% of the participants went bust, and the average payout was just $91. Only 21% of the participants reached the maximum. 18 of the 61 participants bet everything on one toss, while two-thirds gambled on tails at some stage in the experiment." Using the Kelly criterion and the odds in the experiment, the right approach would be to bet 20% of the pot on each throw (see the first example in the Statement section below). If a bet loses, the stake is cut; if it wins, the stake increases.
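For a simple binary bet at net odds b:1 with win probability p, the Kelly fraction is f* = p - (1-p)/b. A minimal sketch, which reproduces the 20% figure for the coin-flip experiment above:

```python
def kelly_fraction(p, b=1.0):
    """Optimal bet fraction for win probability p at net odds b:1."""
    return p - (1 - p) / b

# The experiment above: 60% heads at even money -> bet ~20% of the pot.
print(kelly_fraction(0.60))  # ~0.2
print(kelly_fraction(0.50))  # fair coin -> bet nothing
```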
nibble  betting  investing  ORFE  acm  checklists  levers  probability  algorithms  wiki  reference  atoms  extrema  parsimony  tidbits  decision-theory  decision-making  street-fighting  mental-math  calculation 
august 2017 by nhaliday
Subgradients - S. Boyd and L. Vandenberghe
If f is convex and x ∈ int dom f, then ∂f(x) is nonempty and bounded. To establish that ∂f(x) ≠ ∅, we apply the supporting hyperplane theorem to the convex set epi f at the boundary point (x, f(x)), ...
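For reference, the defining inequality behind this result (standard in these notes): g is a subgradient of f at x when the affine lower bound through (x, f(x)) with slope g underestimates f everywhere,

```latex
g \in \partial f(x)
\iff
f(y) \ge f(x) + g^{T}(y - x) \quad \text{for all } y \in \operatorname{dom} f .
```

For differentiable convex f, the subdifferential is the singleton ∂f(x) = {∇f(x)}; for f(x) = |x|, ∂f(0) = [-1, 1].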
pdf  nibble  lecture-notes  acm  optimization  curvature  math.CA  estimate  linearity  differential  existence  proofs  exposition  atoms  math  marginal  convexity-curvature 
august 2017 by nhaliday
Beta function - Wikipedia
B(x, y) = ∫_0^1 t^{x-1} (1-t)^{y-1} dt = Γ(x)Γ(y)/Γ(x+y)
one misc. application: calculating pdf of Erlang distribution (sum of iid exponential r.v.s)
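The Gamma-function identity can be sanity-checked numerically against the integral definition (midpoint Riemann sum, fine for x, y >= 1):

```python
from math import gamma

def beta(x, y):
    """Beta function via the Gamma identity."""
    return gamma(x) * gamma(y) / gamma(x + y)

def beta_integral(x, y, n=100_000):
    """Integral definition, approximated by a midpoint Riemann sum."""
    h = 1.0 / n
    return h * sum(
        ((i + 0.5) * h) ** (x - 1) * (1 - (i + 0.5) * h) ** (y - 1)
        for i in range(n)
    )

print(beta(3, 4))           # 2! * 3! / 6! = 1/60 ~ 0.01667
print(beta_integral(3, 4))  # should agree
```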
concept  atoms  acm  math  calculation  integral  wiki  reference  identity  AMT  distribution  multiplicative 
march 2017 by nhaliday
Galton–Watson process - Wikipedia
The Galton–Watson process is a branching stochastic process arising from Francis Galton's statistical investigation of the extinction of family names. The process models family names as patrilineal (passed from father to son), while offspring are randomly either male or female, and names become extinct if the family name line dies out (holders of the family name die without male descendants). This is an accurate description of Y chromosome transmission in genetics, and the model is thus useful for understanding human Y-chromosome DNA haplogroups, and is also of use in understanding other processes (as described below); but its application to actual extinction of family names is fraught. In practice, family names change for many other reasons, and dying out of name line is only one factor, as discussed in examples, below; the Galton–Watson process is thus of limited applicability in understanding actual family name distributions.
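A minimal simulation sketch (offspring distribution assumed for illustration). The classical result is that the extinction probability is the smallest root of q = G(q), where G is the offspring probability generating function; for the pmf below that root is 1/2:

```python
import random

random.seed(0)

# Assumed offspring pmf for illustration: 0, 1, or 2 sons
# with probabilities 1/4, 1/4, 1/2 -> mean 1.25, so survival is possible.
def offspring():
    u = random.random()
    return 0 if u < 0.25 else 1 if u < 0.5 else 2

def line_dies_out(max_gen=200, cap=1000):
    """Simulate one family line; treat clear explosion as survival."""
    n = 1
    for _ in range(max_gen):
        if n == 0:
            return True
        if n > cap:
            return False
        n = sum(offspring() for _ in range(n))
    return n == 0

trials = 5000
q_hat = sum(line_dies_out() for _ in range(trials)) / trials
# Theory: q solves q = 1/4 + q/4 + q^2/2, smallest root q = 1/2.
print(q_hat)  # close to 0.5
```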
galton  history  stories  stats  stochastic-processes  acm  concept  wiki  reference  atoms  giants  early-modern  nibble  old-anglo  pre-ww2 
january 2017 by nhaliday
predictive models - Is this the state of art regression methodology? - Cross Validated
I've been following Kaggle competitions for a long time and I've come to realize that many winning strategies involve at least one of the "big three": bagging, boosting, and stacking.

For regression, rather than focusing on building one best possible model, building multiple regression models such as (generalized) linear regression, random forest, KNN, NN, and SVM regression models and blending the results into one in a reasonable way seems to outperform each individual method much of the time.
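A minimal sketch of the blending step (base-model predictions here are hypothetical toy numbers; in practice they come from fitted GBMs, random forests, NNs, etc.): fit blend weights by least squares on a holdout set, via the 2x2 normal equations in closed form.

```python
def fit_blend_weights(preds_a, preds_b, y):
    """Solve min_w ||w0*a + w1*b - y||^2 via the 2x2 normal equations."""
    aa = sum(a * a for a in preds_a)
    bb = sum(b * b for b in preds_b)
    ab = sum(a * b for a, b in zip(preds_a, preds_b))
    ay = sum(a * t for a, t in zip(preds_a, y))
    by = sum(b * t for b, t in zip(preds_b, y))
    det = aa * bb - ab * ab
    return (ay * bb - by * ab) / det, (aa * by - ab * ay) / det

# Hypothetical holdout predictions from two base models:
a = [1.0, 2.1, 2.9, 4.2]  # model A
b = [0.8, 2.0, 3.2, 3.9]  # model B
y = [1.0, 2.0, 3.0, 4.0]  # true targets
w0, w1 = fit_blend_weights(a, b, y)
blend = [w0 * ai + w1 * bi for ai, bi in zip(a, b)]
```

Since the weight pairs (1, 0) and (0, 1) recover each base model exactly, the least-squares blend can never do worse than either model on the holdout set.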
q-n-a  state-of-art  machine-learning  acm  data-science  atoms  overflow  soft-question  regression  ensembles  nibble  oly 
november 2016 by nhaliday
Kullback–Leibler divergence - Wikipedia, the free encyclopedia
see https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Motivation especially

Kullback-Leibler divergence has an enormous number of interpretations and uses: psychological, epistemic, thermodynamic, statistical, computational, geometrical... I am pretty sure I could teach an entire graduate seminar on it.
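The coding interpretation in a few lines: D(P||Q) is the expected extra code length (in bits) from using a code optimized for Q when samples actually come from P. A minimal sketch for discrete distributions, which also shows the asymmetry:

```python
from math import log2

def kl_divergence(p, q):
    """D(P||Q) in bits for discrete distributions given as lists."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # > 0
print(kl_divergence(q, p))  # different: KL is not symmetric
print(kl_divergence(p, p))  # 0 iff the distributions coincide
```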
information-theory  math  wiki  probability  concept  reference  acm  hmm  atoms  operational  characterization  metrics  bits  entropy-like  nibble  properties  multi  twitter  social  discussion  backup 
may 2016 by nhaliday
