nhaliday + atoms   121

oop - Functional programming vs Object Oriented programming - Stack Overflow
When you anticipate a different kind of software evolution:
- Object-oriented languages are good when you have a fixed set of operations on things, and as your code evolves, you primarily add new things. This can be accomplished by adding new classes which implement existing methods, and the existing classes are left alone.
- Functional languages are good when you have a fixed set of things, and as your code evolves, you primarily add new operations on existing things. This can be accomplished by adding new functions which compute with existing data types, and the existing functions are left alone.

When evolution goes the wrong way, you have problems:
- Adding a new operation to an object-oriented program may require editing many class definitions to add a new method.
- Adding a new kind of thing to a functional program may require editing many function definitions to add a new case.

This problem has been well known for many years; in 1998, Phil Wadler dubbed it the "expression problem". Although some researchers think that the expression problem can be addressed with such language features as mixins, a widely accepted solution has yet to hit the mainstream.

What are the typical problem definitions where functional programming is a better choice?

Functional languages excel at manipulating symbolic data in tree form. A favorite example is compilers, where source and intermediate languages change seldom (mostly the same things), but compiler writers are always adding new translations and code improvements or optimizations (new operations on things). Compilation and translation more generally are "killer apps" for functional languages.
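A minimal Python sketch of the two evolution directions (the Shape example and all names are hypothetical, not from the thread):

# Hypothetical illustration of the expression problem (not from the thread).

# OO style: each "thing" is a class. Adding a new thing (Square) touches no
# existing code, but adding a new operation (say, perimeter) means editing
# every class.
class Circle:
    def __init__(self, r): self.r = r
    def area(self): return 3.14159 * self.r ** 2

class Square:  # new thing: just add a class, existing classes untouched
    def __init__(self, s): self.s = s
    def area(self): return self.s ** 2

# Functional style: each "operation" is a function over a fixed set of things.
# Adding a new operation (perimeter) touches no existing code, but adding a
# new kind of thing means editing every function to handle the new case.
def area(shape):
    kind, dim = shape
    return 3.14159 * dim ** 2 if kind == "circle" else dim ** 2

def perimeter(shape):  # new operation: just add a function, area() untouched
    kind, dim = shape
    return 2 * 3.14159 * dim if kind == "circle" else 4 * dim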
q-n-a  stackex  programming  engineering  nitty-gritty  comparison  best-practices  cost-benefit  functional  data-structures  arrows  flux-stasis  atoms  compilers  examples  pls  plt  oop  types 
27 days ago by nhaliday
Workshop Abstract | Identifying and Understanding Deep Learning Phenomena
ICML 2019 workshop, June 15th 2019, Long Beach, CA

We solicit contributions that view the behavior of deep nets as natural phenomena, to be investigated with methods inspired from the natural sciences like physics, astronomy, and biology.
unit  workshop  acm  machine-learning  science  empirical  nitty-gritty  atoms  deep-learning  model-class  icml  data-science  rigor  replication  examples  ben-recht  physics 
7 weeks ago by nhaliday
Theory of Self-Reproducing Automata - John von Neumann
Fourth Lecture: THE ROLE OF HIGH AND OF EXTREMELY HIGH COMPLICATION

Comparisons between computing machines and the nervous systems. Estimates of size for computing machines, present and near future.

Estimates for size for the human central nervous system. Excursus about the “mixed” character of living organisms. Analog and digital elements. Observations about the “mixed” character of all componentry, artificial as well as natural. Interpretation of the position to be taken with respect to these.

Evaluation of the discrepancy in size between artificial and natural automata. Interpretation of this discrepancy in terms of physical factors. Nature of the materials used.

The probability of the presence of other intellectual factors. The role of complication and the theoretical penetration that it requires.

Questions of reliability and errors reconsidered. Probability of individual errors and length of procedure. Typical lengths of procedure for computing machines and for living organisms--that is, for artificial and for natural automata. Upper limits on acceptable probability of error in individual operations. Compensation by checking and self-correcting features.

Differences of principle in the way in which errors are dealt with in artificial and in natural automata. The “single error” principle in artificial automata. Crudeness of our approach in this case, due to the lack of adequate theory. More sophisticated treatment of this problem in natural automata: The role of the autonomy of parts. Connections between this autonomy and evolution.

- ~10^10 neurons in the brain vs. ~10^4 vacuum tubes in the largest computer at the time
- but machines are faster: ~5 ms from one neuron potential to the next vs. ~10^-3 ms for a vacuum tube

https://en.wikipedia.org/wiki/John_von_Neumann#Computing
pdf  article  papers  essay  nibble  math  cs  computation  bio  neuro  neuro-nitgrit  scale  magnitude  comparison  acm  von-neumann  giants  thermo  phys-energy  speed  performance  time  density  frequency  hardware  ems  efficiency  dirty-hands  street-fighting  fermi  estimate  retention  physics  interdisciplinary  multi  wiki  links  people  🔬  atoms  automata  duplication  iteration-recursion  turing  complexity  measure  nature  technology  complex-systems  bits  information-theory  circuits  robust  structure  composition-decomposition  evolution  mutation  axioms  analogy  thinking  input-output  hi-order-bits  coding-theory  flexibility  rigidity 
april 2018 by nhaliday
Stein's example - Wikipedia
Stein's example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately. It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.[1]

An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent; this occurs in channel estimation in telecommunications, for instance (different factors affect overall channel performance). On the other hand, if one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.

...

Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James–Stein estimator, which works by starting at X and moving towards a particular point (such as the origin) by an amount inversely proportional to the distance of X from that point.
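A quick Monte Carlo sketch of the effect (assumed setup: X ~ N(θ, I_d) with James–Stein shrinkage toward the origin; the dimension and trial count are arbitrary):

# James-Stein vs. the ordinary (MLE) estimator under combined squared error.
import numpy as np

rng = np.random.default_rng(0)
d, trials = 10, 20000
theta = rng.normal(size=d)                 # true parameters, fixed across trials

X = theta + rng.normal(size=(trials, d))   # one noisy observation per trial
norm2 = np.sum(X**2, axis=1, keepdims=True)
js = (1 - (d - 2) / norm2) * X             # shrink toward the origin

mse_mle = np.mean(np.sum((X - theta) ** 2, axis=1))
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))
print(mse_mle, mse_js)                     # combined error of JS is lower on average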
nibble  concept  levers  wiki  reference  acm  stats  probability  decision-theory  estimate  distribution  atoms 
february 2018 by nhaliday
Sequence Modeling with CTC
A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.
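A rough sketch of the CTC forward (alpha) recursion the guide visualizes, assuming per-frame log-probabilities of shape (T, V), a non-empty integer label sequence, and blank index 0:

# CTC forward pass: log P(labels | log_probs), summing over all alignments.
import numpy as np

def ctc_log_likelihood(log_probs, labels, blank=0):
    # log_probs: (T, V) per-frame log-probabilities; labels: target indices (non-empty)
    T, V = log_probs.shape
    ext = [blank]                      # extended sequence: blank, l1, blank, l2, ...
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            terms = [alpha[t - 1, s]]
            if s > 0:
                terms.append(alpha[t - 1, s - 1])
            # skipping a blank is only allowed between two different labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(terms) + log_probs[t, ext[s]]
    # valid alignments end in the final label or the final blank
    return np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])

Training descends the gradient of the negative of this score (usually with the matching backward recursion); the sketch only covers the forward pass.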
acmtariat  techtariat  org:bleg  nibble  better-explained  machine-learning  deep-learning  visual-understanding  visualization  analysis  let-me-see  research  sequential  audio  classification  model-class  exposition  language  acm  approximation  comparison  markov  iteration-recursion  concept  atoms  distribution  orders  DP  heuristic  optimization  trees  greedy  matching  gradient-descent 
december 2017 by nhaliday
Hyperbolic angle - Wikipedia
A unit circle x^2 + y^2 = 1 has a circular sector with an area half of the circular angle in radians. Analogously, a unit hyperbola x^2 - y^2 = 1 has a hyperbolic sector with an area half of the hyperbolic angle.
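Concretely (standard parametrization, not in the excerpt): a point on the unit hyperbola can be written (cosh u, sinh u), and the hyperbolic sector from (1, 0) to that point has area u/2, exactly as the circular sector of angle t on the unit circle has area t/2.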
nibble  math  trivia  wiki  reference  physics  relativity  concept  atoms  geometry  ground-up  characterization  measure  definition  plots  calculation  nitty-gritty  direction  metrics  manifolds 
november 2017 by nhaliday
Reynolds number - Wikipedia
The Reynolds number is the ratio of inertial forces to viscous forces within a fluid which is subjected to relative internal movement due to different fluid velocities, in what is known as a boundary layer in the case of a bounding surface such as the interior of a pipe. A similar effect is created by the introduction of a stream of higher velocity fluid, such as the hot gases from a flame in air. This relative movement generates fluid friction, which is a factor in developing turbulent flow. Counteracting this effect is the viscosity of the fluid, which as it increases, progressively inhibits turbulence, as more kinetic energy is absorbed by a more viscous fluid. The Reynolds number quantifies the relative importance of these two types of forces for given flow conditions, and is a guide to when turbulent flow will occur in a particular situation.[6]

Re = ρuL/μ

(inertial forces)/(viscous forces)
= (mass)(acceleration) / [(dynamic viscosity)(velocity/distance)(area)]
= (ρL^3)(u^2/L) / [μ(u/L)L^2]    (taking acceleration ~ u/t with the flow timescale t ~ L/u)
= ρuL/μ = Re

NB: viscous stress (force/area) ~ μ du/dy is the definition of viscosity
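A quick plug-in sketch; the fluid and pipe numbers below are illustrative assumptions (room-temperature water in a 5 cm pipe at 1 m/s), not from the article:

rho = 1000.0    # density of water, kg/m^3 (assumed)
mu = 1.0e-3     # dynamic viscosity of water at ~20 C, Pa*s (assumed)
u = 1.0         # bulk velocity, m/s (assumed)
L = 0.05        # pipe diameter, m (assumed)

Re = rho * u * L / mu
print(Re)       # ~5e4, well above the ~2300 pipe-flow transition, so turbulent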
nibble  concept  metrics  definition  physics  mechanics  fluid  street-fighting  wiki  reference  atoms  history  early-modern  europe  the-great-west-whale  britain  science  the-trenches  experiment 
september 2017 by nhaliday
Power of a point - Wikipedia
The power of a point P with respect to a circle can be defined equivalently as the product of the distances from P to the two points at which any ray emanating from P intersects the circle.
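In symbols (standard form, added for reference): if the circle has center O and radius r and d = |PO|, then pow(P) = d^2 - r^2, and for any ray from P meeting the circle at A and B, PA * PB = |d^2 - r^2| (equal to d^2 - r^2 itself when signed lengths are used).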
nibble  math  geometry  spatial  ground-up  concept  metrics  invariance  identity  atoms  wiki  reference  measure  yoga  calculation 
september 2017 by nhaliday
Centers of gravity in non-uniform fields - Wikipedia
In physics, a center of gravity of a material body is a point that may be used for a summary description of gravitational interactions. In a uniform gravitational field, the center of mass serves as the center of gravity. This is a very good approximation for smaller bodies near the surface of Earth, so there is no practical need to distinguish "center of gravity" from "center of mass" in most applications, such as engineering and medicine.

In a non-uniform field, gravitational effects such as potential energy, force, and torque can no longer be calculated using the center of mass alone. In particular, a non-uniform gravitational field can produce a torque on an object, even about an axis through the center of mass. The center of gravity seeks to explain this effect. Formally, a center of gravity is an application point of the resultant gravitational force on the body. Such a point may not exist, and if it exists, it is not unique. One can further define a unique center of gravity by approximating the field as either parallel or spherically symmetric.

The concept of a center of gravity as distinct from the center of mass is rarely used in applications, even in celestial mechanics, where non-uniform fields are important. Since the center of gravity depends on the external field, its motion is harder to determine than the motion of the center of mass. The common method to deal with gravitational torques is a field theory.
nibble  wiki  reference  physics  mechanics  intricacy  atoms  expectancy  spatial  direction  ground-up  concept  existence  uniqueness  homo-hetero  gravity  gotchas 
september 2017 by nhaliday
Drude model - Wikipedia
The Drude model of electrical conduction was proposed in 1900[1][2] by Paul Drude to explain the transport properties of electrons in materials (especially metals). The model, which is an application of kinetic theory, assumes that the microscopic behavior of electrons in a solid may be treated classically and looks much like _a pinball machine_, with a sea of constantly jittering electrons bouncing and re-bouncing off heavier, relatively immobile positive ions.

The two most significant results of the Drude model are an electronic equation of motion,

d<p(t)>/dt = q(E + 1/m <p(t)> x B) - <p(t)>/τ

and a linear relationship between current density J and electric field E,

J = (nq^2τ/m) E

latter is Ohm's law
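Plugging rough, assumed copper numbers into the DC conductivity σ = nq^2τ/m as an order-of-magnitude check:

n = 8.5e28      # conduction-electron density of copper, m^-3 (assumed)
q = 1.602e-19   # electron charge, C
tau = 2.5e-14   # relaxation time, s (assumed)
m = 9.109e-31   # electron mass, kg

sigma = n * q**2 * tau / m
print(sigma)    # ~6e7 S/m, the right order of magnitude for copper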
nibble  physics  electromag  models  local-global  stat-mech  identity  atoms  wiki  reference  ground-up  cartoons 
september 2017 by nhaliday
Flows With Friction
To see how the no-slip condition arises, and how the no-slip condition and the fluid viscosity lead to frictional stresses, we can examine the conditions at a solid surface on a molecular scale. When a fluid is stationary, its molecules are in a constant state of motion with a random velocity v. For a gas, v is equal to the speed of sound. When a fluid is in motion, there is superimposed on this random velocity a mean velocity V, sometimes called the bulk velocity, which is the velocity at which the fluid moves from one place to another. At the interface between the fluid and the surface, there exists an attraction between the molecules or atoms that make up the fluid and those that make up the solid. This attractive force is strong enough to reduce the bulk velocity of the fluid to zero. So the bulk velocity of the fluid must change from whatever its value is far away from the wall to a value of zero at the wall. This is called the no-slip condition.

http://www.engineeringarchives.com/les_fm_noslip.html
The fluid property responsible for the no-slip condition and the development of the boundary layer is viscosity.
https://www.quora.com/What-is-the-physics-behind-no-slip-condition-in-fluid-mechanics
https://www.reddit.com/r/AskEngineers/comments/348b1q/the_noslip_condition/
https://www.researchgate.net/post/Can_someone_explain_what_exactly_no_slip_condition_or_slip_condition_means_in_terms_of_momentum_transfer_of_the_molecules
https://en.wikipedia.org/wiki/Boundary_layer_thickness
http://www.fkm.utm.my/~ummi/SME1313/Chapter%201.pdf
org:junk  org:edu  physics  mechanics  h2o  identity  atoms  constraint-satisfaction  volo-avolo  flux-stasis  chemistry  stat-mech  nibble  multi  q-n-a  reddit  social  discussion  dirty-hands  pdf  slides  lectures  qra  fluid  local-global  explanation 
september 2017 by nhaliday
Kelly criterion - Wikipedia
In probability theory and intertemporal portfolio choice, the Kelly criterion, Kelly strategy, Kelly formula, or Kelly bet, is a formula used to determine the optimal size of a series of bets. In most gambling scenarios, and some investing scenarios under some simplifying assumptions, the Kelly strategy will do better than any essentially different strategy in the long run (that is, over a span of time in which the observed fraction of bets that are successful equals the probability that any given bet will be successful). It was described by J. L. Kelly, Jr, a researcher at Bell Labs, in 1956.[1] The practical use of the formula has been demonstrated.[2][3][4]

The Kelly Criterion is to bet a predetermined fraction of assets and can be counterintuitive. In one study,[5][6] each participant was given $25 and asked to bet on a coin that would land heads 60% of the time. Participants had 30 minutes to play, so could place about 300 bets, and the prizes were capped at $250. Behavior was far from optimal. "Remarkably, 28% of the participants went bust, and the average payout was just $91. Only 21% of the participants reached the maximum. 18 of the 61 participants bet everything on one toss, while two-thirds gambled on tails at some stage in the experiment." Using the Kelly criterion and based on the odds in the experiment, the right approach would be to bet 20% of the pot on each throw (see first example in Statement below). If losing, the size of the bet gets cut; if winning, the stake increases.
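A quick simulation of that coin game at the Kelly fraction f* = 2p - 1 = 0.2 (the game parameters are from the excerpt; the simulation itself is a sketch):

# 60/40 coin, $25 start, ~300 flips, payout capped at $250; bet a fixed fraction.
import random

def play(fraction, flips=300, bankroll=25.0, cap=250.0, p=0.6):
    for _ in range(flips):
        bet = fraction * bankroll
        bankroll += bet if random.random() < p else -bet
        bankroll = min(bankroll, cap)
    return bankroll

random.seed(0)
runs = 10000
avg = sum(play(0.2) for _ in range(runs)) / runs
print(avg)   # most runs end at or near the $250 cap, far above the study's $91 average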
nibble  betting  investing  ORFE  acm  checklists  levers  probability  algorithms  wiki  reference  atoms  extrema  parsimony  tidbits  decision-theory  decision-making  street-fighting  mental-math  calculation 
august 2017 by nhaliday
Subgradients - S. Boyd and L. Vandenberghe
If f is convex and x ∈ int dom f, then ∂f(x) is nonempty and bounded. To establish that ∂f(x) ≠ ∅, we apply the supporting hyperplane theorem to the convex set epi f at the boundary point (x, f(x)), ...
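The standard one-variable example (added for concreteness, not from the notes): for f(x) = |x|, ∂f(x) = {sign(x)} when x ≠ 0, while ∂f(0) = [-1, 1]; any g in [-1, 1] is a subgradient at 0 because |y| >= g*y for all y.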
pdf  nibble  lecture-notes  acm  optimization  curvature  math.CA  estimate  linearity  differential  existence  proofs  exposition  atoms  math  marginal  convexity-curvature 
august 2017 by nhaliday
Superintelligence Risk Project Update II
https://www.jefftk.com/p/superintelligence-risk-project-update

https://www.jefftk.com/p/conversation-with-michael-littman
For example, I asked him what he thought of the idea that we could get AGI with current techniques, primarily deep neural nets and reinforcement learning, without learning anything new about how intelligence works or how to implement it ("Prosaic AGI" [1]). He didn't think this was possible, and believes there are deep conceptual issues we still need to get a handle on. He's also less impressed with deep learning than he was before he started working in it: in his experience it's a much more brittle technology than he had been expecting. Specifically, when trying to replicate results, he's often found that they depend on a bunch of parameters being in just the right range, and without that the systems don't perform nearly as well.

The bottom line, to him, was that since we are still many breakthroughs away from getting to AGI, we can't productively work on reducing superintelligence risk now.

He told me that he worries that the AI risk community is not solving real problems: they're making deductions and inferences that are self-consistent but not being tested or verified in the world. Since we can't tell if that's progress, it probably isn't. I asked if he was referring to MIRI's work here, and he said their work was an example of the kind of approach he's skeptical about, though he wasn't trying to single them out. [2]

https://www.jefftk.com/p/conversation-with-an-ai-researcher
Earlier this week I had a conversation with an AI researcher [1] at one of the main industry labs as part of my project of assessing superintelligence risk. Here's what I got from them:

They see progress in ML as almost entirely constrained by hardware and data, to the point that if today's hardware and data had existed in the mid 1950s researchers would have gotten to approximately our current state within ten to twenty years. They gave the example of backprop: we saw how to train multi-layer neural nets decades before we had the computing power to actually train these nets to do useful things.

Similarly, people talk about AlphaGo as a big jump, where Go went from being "ten years away" to "done" within a couple years, but they said it wasn't like that. If Go work had stayed in academia, with academia-level budgets and resources, it probably would have taken nearly that long. What changed was a company seeing promising results, realizing what could be done, and putting way more engineers and hardware on the project than anyone had previously done. AlphaGo couldn't have happened earlier because the hardware wasn't there yet, and was only able to be brought forward by massive application of resources.

https://www.jefftk.com/p/superintelligence-risk-project-conclusion
Summary: I'm not convinced that AI risk should be highly prioritized, but I'm also not convinced that it shouldn't. Highly qualified researchers in a position to have a good sense of the field have massively different views on core questions like how capable ML systems are now, how capable they will be soon, and how we can influence their development. I do think these questions are possible to get a better handle on, but I think this would require much deeper ML knowledge than I have.
ratty  core-rats  ai  risk  ai-control  prediction  expert  machine-learning  deep-learning  speedometer  links  research  research-program  frontier  multi  interview  deepgoog  games  hardware  performance  roots  impetus  chart  big-picture  state-of-art  reinforcement  futurism  🤖  🖥  expert-experience  singularity  miri-cfar  empirical  evidence-based  speculation  volo-avolo  clever-rats  acmtariat  robust  ideas  crux  atoms  detail-architecture  software  gradient-descent 
july 2017 by nhaliday
Total factor productivity - Wikipedia
The equation below (in Cobb–Douglas form) represents total output (Y) as a function of total-factor productivity (A), capital input (K), labor input (L), and the two inputs' respective shares of output (α and β are the share of contribution for K and L respectively). An increase in either A, K or L will lead to an increase in output.

Y = A x K^α x L^β

Technology growth and efficiency are regarded as two of the biggest sub-sections of Total Factor Productivity, the former possessing "special" inherent features such as positive externalities and non-rivalness which enhance its position as a driver of economic growth.

Total Factor Productivity is often seen as the real driver of growth within an economy and studies reveal that whilst labour and investment are important contributors, Total Factor Productivity may account for up to 60% of growth within economies.[2]

It has been shown that there is a historical correlation between TFP and energy conversion efficiency.[3] Also, it has been found that integration (among firms, for example) has a causal positive impact on total factor productivity.[4]
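A sketch of backing A out of the Cobb–Douglas form, plus the growth-accounting residual; all numbers below are made up for illustration:

# Level: A = Y / (K^alpha * L^beta)
alpha, beta = 0.3, 0.7        # assumed factor shares
Y, K, L = 1000.0, 3000.0, 200.0
A = Y / (K ** alpha * L ** beta)
print(A)

# Growth accounting: dY/Y ~ dA/A + alpha*dK/K + beta*dL/L, so TFP growth is
# what is left of output growth after the share-weighted input growth.
gY, gK, gL = 0.03, 0.04, 0.01
gA = gY - alpha * gK - beta * gL
print(gA)                     # ~1.1% of growth attributed to TFP in this made-up case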
concept  economics  growth-econ  econ-productivity  econ-metrics  wiki  reference  energy-resources  biophysical-econ  the-world-is-just-atoms  efficiency  labor  capital  atoms  human-capital  innovation  technology  🎩  distribution  variance-components  input-output 
june 2017 by nhaliday
Merkle tree - Wikipedia
In cryptography and computer science, a hash tree or Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes.
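A minimal Python sketch (SHA-256; duplicating the last node of an odd level is an implementation choice borrowed from Bitcoin, not part of the definition):

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]          # leaf nodes: hashes of the values
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])               # pad odd levels by duplication
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

print(merkle_root([b"tx1", b"tx2", b"tx3"]).hex())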
concept  cs  data-structures  bitcoin  cryptocurrency  blockchain  atoms  protocol  wiki  reference  nibble  hashing  ideas  crypto  rigorous-crypto 
june 2017 by nhaliday
Strings, periods, and borders
A border of x is any proper prefix of x that equals a suffix of x.

...overlapping borders of a string imply that the string is periodic...

In the border array β[1..n] of x, entry β[i] is the length of the longest border of x[1..i].
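A sketch of computing the border array (this is the same recurrence as the KMP failure function; 0-indexed here, unlike the slides):

def border_array(x: str):
    # beta[i] = length of the longest border of x[:i+1]
    n = len(x)
    beta = [0] * n
    for i in range(1, n):
        b = beta[i - 1]
        while b > 0 and x[i] != x[b]:
            b = beta[b - 1]               # fall back to the next-longest border
        beta[i] = b + 1 if x[i] == x[b] else 0
    return beta

print(border_array("abaabab"))            # [0, 0, 1, 1, 2, 3, 2]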
pdf  nibble  slides  lectures  algorithms  strings  exposition  yoga  atoms  levers  tidbits  sequential 
may 2017 by nhaliday
Kaldor–Hicks efficiency - Wikipedia
A Kaldor–Hicks improvement, named for Nicholas Kaldor and John Hicks, also known as the Kaldor–Hicks criterion, is a way of judging economic re-allocations of resources among people that captures some of the intuitive appeal of Pareto efficiencies, but has less stringent criteria and is hence applicable to more circumstances. A re-allocation is a Kaldor–Hicks improvement if those that are made better off could hypothetically compensate those that are made worse off and lead to a Pareto-improving outcome. The compensation does not actually have to occur (there is no presumption in favor of status-quo) and thus, a Kaldor–Hicks improvement can in fact leave some people worse off.

A situation is said to be Kaldor–Hicks efficient if no potential Kaldor–Hicks improvement from that situation exists.
concept  atoms  economics  micro  GT-101  pareto  redistribution  policy  government  wiki  reference  jargon  methodology  efficiency  welfare-state  equilibrium 
april 2017 by nhaliday
Mean field theory - Wikipedia
In physics and probability theory, mean field theory (MFT, also known as self-consistent field theory) studies the behavior of large and complex stochastic models by studying a simpler model. Such models consider a large number of small individual components which interact with each other. The effect of all the other individuals on any given individual is approximated by a single averaged effect, thus reducing a many-body problem to a one-body problem.
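A tiny concrete case (the mean-field Ising model; the coupling J, coordination number z, and starting guess below are illustrative): each spin sees only the average of its z neighbors, which gives the self-consistency equation m = tanh(beta*J*z*m), solvable by fixed-point iteration:

import math

def mean_field_magnetization(beta, J=1.0, z=4, m0=0.5, iters=200):
    m = m0
    for _ in range(iters):
        m = math.tanh(beta * J * z * m)   # replace the many-body field by its average
    return m

# For beta*J*z < 1 the only solution is m = 0; above that a nonzero
# magnetization appears (mean-field phase transition at beta_c = 1/(J*z)).
print(mean_field_magnetization(beta=0.2))   # ~0
print(mean_field_magnetization(beta=0.3))   # ~0.66 (nonzero)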
concept  atoms  models  physics  stat-mech  ising  approximation  parsimony  wiki  reference  nibble 
march 2017 by nhaliday
Beta function - Wikipedia
B(x, y) = int_0^1 t^{x-1}(1-t)^{y-1} dt = Γ(x)Γ(y)/Γ(x+y)
one misc. application: calculating pdf of Erlang distribution (sum of iid exponential r.v.s)
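A quick numerical check of the identity (the x, y values are arbitrary):

from math import gamma
from scipy.integrate import quad

x, y = 2.5, 3.0
integral, _ = quad(lambda t: t**(x - 1) * (1 - t)**(y - 1), 0, 1)
via_gamma = gamma(x) * gamma(y) / gamma(x + y)
print(integral, via_gamma)    # both ~0.0508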
concept  atoms  acm  math  calculation  integral  wiki  reference  identity  AMT  distribution  multiplicative 
march 2017 by nhaliday
general topology - What should be the intuition when working with compactness? - Mathematics Stack Exchange
http://math.stackexchange.com/questions/485822/why-is-compactness-so-important

The situation with compactness is sort of like the above. It turns out that finiteness, which you think of as one concept (in the same way that you think of "Foo" as one concept above), is really two concepts: discreteness and compactness. You've never seen these concepts separated before, though. When people say that compactness is like finiteness, they mean that compactness captures part of what it means to be finite in the same way that shortness captures part of what it means to be Foo.

--

As many have said, compactness is sort of a topological generalization of finiteness. And this is true in a deep sense, because topology deals with open sets, and this means that we often "care about how something behaves on an open set", and for compact spaces this means that there are only finitely many possible behaviors.

--

Compactness does for continuous functions what finiteness does for functions in general.

If a set A is finite then every function f:A→R has a max and a min, and every function f:A→R^n is bounded. If A is compact, then every continuous function from A to R has a max and a min and every continuous function from A to R^n is bounded.

If A is finite then every sequence of members of A has a subsequence that is eventually constant, and "eventually constant" is the only kind of convergence you can talk about without talking about a topology on the set. If A is compact, then every sequence of members of A has a convergent subsequence.
q-n-a  overflow  math  topology  math.GN  concept  finiteness  atoms  intuition  oly  mathtariat  multi  discrete  gowers  motivation  synthesis  hi-order-bits  soft-question  limits  things  nibble  definition  convergence  abstraction  span-cover 
january 2017 by nhaliday
Ehrhart polynomial - Wikipedia
In mathematics, an integral polytope has an associated Ehrhart polynomial that encodes the relationship between the volume of a polytope and the number of integer points the polytope contains. The theory of Ehrhart polynomials can be seen as a higher-dimensional generalization of Pick's theorem in the Euclidean plane.
math  math.MG  trivia  polynomials  discrete  wiki  reference  atoms  geometry  spatial  nibble  curvature  convexity-curvature 
january 2017 by nhaliday
Galton–Watson process - Wikipedia
The Galton–Watson process is a branching stochastic process arising from Francis Galton's statistical investigation of the extinction of family names. The process models family names as patrilineal (passed from father to son), while offspring are randomly either male or female, and names become extinct if the family name line dies out (holders of the family name die without male descendants). This is an accurate description of Y chromosome transmission in genetics, and the model is thus useful for understanding human Y-chromosome DNA haplogroups, and is also of use in understanding other processes (as described below); but its application to actual extinction of family names is fraught. In practice, family names change for many other reasons, and dying out of name line is only one factor, as discussed in examples, below; the Galton–Watson process is thus of limited applicability in understanding actual family name distributions.
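A quick simulation sketch, taking the number of sons per father to be Poisson with an assumed mean (extinction is certain iff that mean is at most 1):

import numpy as np

def extinction_probability(mean_sons, trials=5000, generations=50, seed=0):
    rng = np.random.default_rng(seed)
    extinct = 0
    for _ in range(trials):
        pop = 1
        for _ in range(generations):
            if pop == 0 or pop > 10_000:   # line is dead, or effectively safe from extinction
                break
            pop = rng.poisson(mean_sons, size=pop).sum()
        extinct += int(pop == 0)
    return extinct / trials

print(extinction_probability(0.9))   # subcritical: ~1.0, the name always dies out
print(extinction_probability(1.5))   # supercritical: roughly 0.42, the root of q = exp(1.5*(q - 1))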
galton  history  stories  stats  stochastic-processes  acm  concept  wiki  reference  atoms  giants  early-modern  nibble  old-anglo  pre-ww2 
january 2017 by nhaliday
Cantor function - Wikipedia
- uniformly continuous but not absolutely continuous
- derivative zero almost everywhere but not constant
- see also: http://mathoverflow.net/questions/31603/why-do-probabilists-take-random-variables-to-be-borel-and-not-lebesgue-measura/31609#31609 (the exercise mentioned uses c(x)+x for c the Cantor function)
math  math.CA  counterexample  wiki  reference  multi  math.FA  atoms  measure  smoothness  singularity  nibble 
january 2017 by nhaliday
pr.probability - What is convolution intuitively? - MathOverflow
I remember as a graduate student that Ingrid Daubechies frequently referred to convolution by a bump function as "blurring" - its effect on images is similar to what a short-sighted person experiences when taking off his or her glasses (and, indeed, if one works through the geometric optics, convolution is not a bad first approximation for this effect). I found this to be very helpful, not just for understanding convolution per se, but as a lesson that one should try to use physical intuition to model mathematical concepts whenever one can.

More generally, if one thinks of functions as fuzzy versions of points, then convolution is the fuzzy version of addition (or sometimes multiplication, depending on the context). The probabilistic interpretation is one example of this (where the fuzz is a probability distribution), but one can also have signed, complex-valued, or vector-valued fuzz, of course.
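The blurring picture in one dimension, as a small numpy sketch (a sharp step convolved with a normalized bump):

import numpy as np

signal = np.concatenate([np.zeros(20), np.ones(20)])   # sharp edge
bump = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
bump /= bump.sum()                                     # normalize: acts like a probability

blurred = np.convolve(signal, bump, mode="same")
print(signal[17:24])                 # the edge jumps from 0 to 1 in one step
print(np.round(blurred[17:24], 2))   # the same edge, smeared over several samples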
q-n-a  overflow  math  concept  atoms  intuition  motivation  gowers  visual-understanding  aphorism  soft-question  tidbits  👳  mathtariat  cartoons  ground-up  metabuch  analogy  nibble  yoga  neurons  retrofit  optics  concrete  s:*  multiplicative  fourier 
january 2017 by nhaliday
Quarter-Turns | The n-Category Café
In other words, call an operator T a quarter-turn if ⟨Tx,x⟩=0 for all x. Then the real quarter-turns correspond to the skew symmetric matrices — but apart from the zero operator, there are no complex quarter turns at all.
tidbits  math  linear-algebra  hmm  mathtariat  characterization  atoms  inner-product  arrows  org:bleg  nibble 
december 2016 by nhaliday
predictive models - Is this the state of art regression methodology? - Cross Validated
I've been following Kaggle competitions for a long time and I have come to realize that many winning strategies involve using at least one of the "big three": bagging, boosting and stacking.

For regressions, rather than focusing on building one best possible regression model, building multiple regression models such as (Generalized) linear regression, random forest, KNN, NN, and SVM regression models and blending the results into one in a reasonable way seems to out-perform each individual method a lot of times.
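A minimal sketch of the blending idea with scikit-learn's stacking API (the dataset and estimator choices are illustrative, not the asker's):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

base = [
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=10)),
    ("svr", SVR(C=10.0)),
    ("ridge", Ridge()),
]
# The meta-model is fit on out-of-fold predictions of the base models.
stack = StackingRegressor(estimators=base, final_estimator=LinearRegression(), cv=5)

for name, model in base + [("stack", stack)]:
    print(name, round(cross_val_score(model, X, y, cv=5, scoring="r2").mean(), 3))

Whether the stack beats the best single model varies by dataset; the snippet only shows the mechanics of blending.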
q-n-a  state-of-art  machine-learning  acm  data-science  atoms  overflow  soft-question  regression  ensembles  nibble  oly 
november 2016 by nhaliday