nhaliday + machine-learning   444

Workshop Abstract | Identifying and Understanding Deep Learning Phenomena
ICML 2019 workshop, June 15th 2019, Long Beach, CA

We solicit contributions that view the behavior of deep nets as natural phenomena, to be investigated with methods inspired by the natural sciences, such as physics, astronomy, and biology.
unit  workshop  acm  machine-learning  science  empirical  nitty-gritty  atoms  deep-learning  model-class  icml  data-science  rigor  replication  examples  ben-recht  physics 
7 hours ago by nhaliday
Applications of computational learning theory in the cognitive sciences - Psychology & Neuroscience Stack Exchange
1. Gold's theorem on the unlearnability in the limit of certain sets of languages, among them context-free ones.

2. Ronald de Wolf's master's thesis on the impossibility of PAC-learning context-free languages.

The first made quite a stir in the poverty-of-the-stimulus debate, while the second has gone unnoticed by cognitive science.
q-n-a  stackex  psychology  cog-psych  learning  learning-theory  machine-learning  PAC  lower-bounds  no-go  language  linguistics  models  fall-2015 
8 weeks ago by nhaliday
Lateralization of brain function - Wikipedia
Language
Language functions such as grammar, vocabulary and literal meaning are typically lateralized to the left hemisphere, especially in right-handed individuals.[3] While language production is left-lateralized in up to 90% of right-handers, it is more bilateral, or even right-lateralized, in approximately 50% of left-handers.[4]

Broca's area and Wernicke's area, two areas associated with the production and comprehension of speech respectively, are located in the left cerebral hemisphere for about 95% of right-handers, but about 70% of left-handers.[5]:69

Auditory and visual processing
The processing of visual and auditory stimuli, spatial manipulation, facial perception, and artistic ability are represented bilaterally.[4] Numerical estimation, comparison and online calculation depend on bilateral parietal regions[6][7] while exact calculation and fact retrieval are associated with left parietal regions, perhaps due to their ties to linguistic processing.[6][7]

...

Depression is linked with a hyperactive right hemisphere, with evidence of selective involvement in "processing negative emotions, pessimistic thoughts and unconstructive thinking styles", as well as vigilance, arousal and self-reflection, and a relatively hypoactive left hemisphere, "specifically involved in processing pleasurable experiences" and "relatively more involved in decision-making processes".

Chaos and Order; the right and left hemispheres: https://orthosphere.wordpress.com/2018/05/23/chaos-and-order-the-right-and-left-hemispheres/
In The Master and His Emissary, Iain McGilchrist writes that a creature like a bird needs two types of consciousness simultaneously. It needs to be able to focus on something specific, such as pecking at food, while it also needs to keep an eye out for predators, which requires a more general awareness of the environment.

These are quite different activities. The Left Hemisphere (LH) is adapted for a narrow focus. The Right Hemisphere (RH) for the broad. The brains of human beings have the same division of function.

The LH governs the right side of the body, the RH, the left side. With birds, the left eye (RH) looks for predators, the right eye (LH) focuses on food and specifics. Since danger can take many forms and is unpredictable, the RH has to be very open-minded.

The LH is for narrow focus, the explicit, the familiar, the literal, tools, mechanism/machines and the man-made. The broad focus of the RH is necessarily more vague and intuitive and handles the anomalous, novel, metaphorical, the living and organic. The LH is high resolution but narrow, the RH low resolution but broad.

The LH exhibits unrealistic optimism and self-belief. The RH has a tendency towards depression and is much more realistic about a person’s own abilities. The LH has trouble following narratives because it has a poor sense of “wholes.” In art it favors flatness, abstract and conceptual art, black and white rather than color, simple geometric shapes and multiple perspectives all shoved together, e.g., cubism. RH paintings, by contrast, emphasize vistas with great depth of field and thus space and time,[1] emotion, figurative painting and scenes related to the life world. In music, the LH likes simple, repetitive rhythms. The RH favors melody, harmony and complex rhythms.

...

Schizophrenia is a disease of extreme LH emphasis. Since empathy, and the ability to notice emotional nuance expressed facially, vocally and bodily, are RH functions, schizophrenics tend to be paranoid and are often convinced that the real people they know have been replaced by robotic imposters. This is at least partly because they lose the ability to intuit what other people are thinking and feeling – hence other people seem robotic and suspicious.

Both Oswald Spengler’s The Decline of the West and McGilchrist characterize the West as awash in phenomena associated with an extreme LH emphasis. Spengler argues that Western civilization was originally much more RH (to use McGilchrist’s categories) and that all its most significant artistic (in the broadest sense) achievements were triumphs of RH accentuation.

The RH is where novel experiences and the anomalous are processed and where mathematical, and other, problems are solved. The RH is involved with the natural, the unfamiliar, the unique, emotions, the embodied, music, humor, understanding intonation and emotional nuance of speech, the metaphorical, nuance, and social relations. It has very little speech, but the RH is necessary for processing all the nonlinguistic aspects of speaking, including body language. Understanding what someone means by vocal inflection and facial expressions is an intuitive RH process rather than explicit.

...

RH is very much the center of lived experience; of the life world with all its depth and richness. The RH is “the master” from the title of McGilchrist’s book. The LH ought to be no more than the emissary; the valued servant of the RH. However, in the last few centuries, the LH, which has tyrannical tendencies, has tried to become the master. The LH is where the ego is predominantly located. In split-brain patients, in whom the LH and the RH are surgically divided (this is sometimes done in the case of epileptic patients), one hand will sometimes fight with the other. In one man’s case, one hand would reach out to hug his wife while the other pushed her away. One hand reached for one shirt, the other for another shirt. Or a patient will be driving a car and one hand will try to turn the steering wheel in the opposite direction. In these cases, the “naughty” hand is usually the left hand (RH), while the patient tends to identify herself with the right hand governed by the LH. The two hemispheres have quite different personalities.

The connection between LH and ego can also be seen in the fact that the LH is competitive, contentious, and agonistic. It wants to win. It is the part of you that hates to lose arguments.

Using the metaphor of Chaos and Order, the RH deals with Chaos – the unknown, the unfamiliar, the implicit, the emotional, the dark, danger, mystery. The LH is connected with Order – the known, the familiar, the rule-driven, the explicit, and the light of day. Learning something means taking something unfamiliar and making it familiar. Since the RH deals with the novel, it is the problem-solving part. Once understood, the results are dealt with by the LH. When learning a new piece on the piano, the RH is involved. Once mastered, the result becomes an LH affair. The muscle memory developed by repetition is processed by the LH. If errors are made, the activity returns to the RH to figure out what went wrong; the activity is repeated until the correct muscle memory is developed, at which point it becomes part of the familiar LH.

Science is an attempt to find Order. It would not be necessary if people lived in an entirely orderly, explicit, known world. The lived context of science implies Chaos. Theories are reductive and simplifying and help to pick out salient features of a phenomenon. They are always partial truths, though some are more partial than others. The alternative to a certain level of reductionism or partialness would be to simply reproduce the world which of course would be both impossible and unproductive. The test for whether a theory is sufficiently non-partial is whether it is fit for purpose and whether it contributes to human flourishing.

...

Analytic philosophers pride themselves on trying to do away with vagueness. To do so, they tend to jettison context, which cannot be brought into fine focus. However, in order to understand things and discern their meaning, it is necessary to have the big picture, the overview, as well as the details. There is no point in having details if the subject does not know what they are details of. Such philosophers also tend to leave themselves out of the picture even when what they are thinking about has reflexive implications. John Locke, for instance, tried to banish the RH from reality. All phenomena having to do with subjective experience he deemed unreal, and he once remarked about metaphors, an RH phenomenon, that they are “perfect cheats.” Analytic philosophers tend to check the logic of the words on the page and not to think about what those words might say about them. The trick is for them to recognize that they and their theories, which exist in minds, are part of reality too.

The RH test for whether someone actually believes something can be found by examining his actions. If he finds that he must regard his own actions as free, and, in order to get along with other people, must also attribute free will to them and treat them as free agents, then he effectively believes in free will – no matter his LH theoretical commitments.

...

We do not know the origin of life. We do not know how or even if consciousness can emerge from matter. We do not know the nature of 96% of the matter of the universe. Clearly all these things exist. They can provide the subject matter of theories but they continue to exist as theorizing ceases or theories change. Not knowing how something is possible is irrelevant to its actual existence. An inability to explain something is ultimately neither here nor there.

If thought begins and ends with the LH, then thinking has no content – content being provided by experience (RH), and skepticism and nihilism ensue. The LH spins its wheels self-referentially, never referring back to experience. Theory assumes such primacy that it will simply outlaw experiences and data inconsistent with it; a profoundly wrong-headed approach.

...

Gödel’s incompleteness theorem shows that not everything true can be proven to be true: any consistent, axiomatizable system rich enough to express arithmetic leaves some truths underivable. This means there is an ineradicable role for faith, hope and intuition in every moderately complex human intellectual endeavor. There is no one set of consistent axioms from which all other truths can be derived.
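
For precision, the formal statement that the gloss above compresses (the first incompleteness theorem in its standard modern form):

```latex
\text{If } T \text{ is consistent, recursively axiomatizable, and interprets arithmetic, then}
\quad \exists\, G_T :\; T \nvdash G_T \ \text{ and } \ T \nvdash \lnot G_T .
```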

Alan Turing’s proof that the halting problem is undecidable shows that there is no effective procedure for finding effective procedures. Without a mechanical decision procedure (LH), when it comes to … [more]
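
The diagonal argument behind this fits in a few lines of runnable code; a toy sketch showing that any candidate halting decider is refuted by a program built from it (`claims_never_halts` is an arbitrary example decider, not anything from the source):

```python
def paradox_factory(halts):
    """Build a program that does the opposite of what `halts` predicts."""
    def paradox(prog):
        if halts(prog, prog):   # decider says prog(prog) halts...
            while True:         # ...so loop forever
                pass
        # decider says prog(prog) loops, so halt immediately
    return paradox

# Any candidate decider fails on the paradox built from it.
# E.g. a decider that always answers "never halts":
claims_never_halts = lambda prog, arg: False
paradox = paradox_factory(claims_never_halts)
paradox(paradox)  # returns at once -- contradicting the decider's answer
```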
gnon  reflection  books  summary  review  neuro  neuro-nitgrit  things  thinking  metabuch  order-disorder  apollonian-dionysian  bio  examples  near-far  symmetry  homo-hetero  logic  inference  intuition  problem-solving  analytical-holistic  n-factor  europe  the-great-west-whale  occident  alien-character  detail-architecture  art  theory-practice  philosophy  being-becoming  essence-existence  language  psychology  cog-psych  egalitarianism-hierarchy  direction  reason  learning  novelty  science  anglo  anglosphere  coarse-fine  neurons  truth  contradiction  matching  empirical  volo-avolo  curiosity  uncertainty  theos  axioms  intricacy  computation  analogy  essay  rhetoric  deep-materialism  new-religion  knowledge  expert-experience  confidence  biases  optimism  pessimism  realness  whole-partial-many  theory-of-mind  values  competition  reduction  subjective-objective  communication  telos-atelos  ends-means  turing  fiction  increase-decrease  innovation  creative  thick-thin  spengler  multi  ratty  hanson  complex-systems  structure  concrete  abstraction  network-s 
september 2018 by nhaliday
The Hanson-Yudkowsky AI-Foom Debate - Machine Intelligence Research Institute
How Deviant Recent AI Progress Lumpiness?: http://www.overcomingbias.com/2018/03/how-deviant-recent-ai-progress-lumpiness.html
I seem to disagree with most people working on artificial intelligence (AI) risk. While, like them, I expect rapid change once AI is powerful enough to replace most or all human workers, I expect this change to be spread across the world, not concentrated in one main localized AI system. The efforts of AI risk folks to design AI systems whose values won’t drift might stop global AI value drift if there is just one main AI system. But doing so in a world of many AI systems at similar ability levels requires strong global governance of AI systems, which is a tall order anytime soon. Their continued focus on preventing single-system drift suggests that they expect a single main AI system.

The main reason I see to expect relatively local AI progress is if AI progress is unusually lumpy, i.e., arriving in fewer, larger packages than the usual many smaller ones. If one AI team finds a big lump, it might jump way ahead of the other teams.

However, we have a vast literature on the lumpiness of research and innovation more generally, which clearly says that usually most of the value in innovation is found in many small innovations. We have so far seen the same in computer science (CS) and AI, even if there have been historical examples where much value was found in particular big innovations, such as nuclear weapons or the origin of humans.

Apparently many people associated with AI risk, including the star machine learning (ML) researchers whom they often idolize, find it intuitively plausible that AI and ML progress is exceptionally lumpy. Such researchers often say, “My project is ‘huge’, and will soon do it all!” A decade ago my ex-co-blogger Eliezer Yudkowsky and I argued here on this blog about our differing estimates of AI progress lumpiness. He recently offered AlphaGo Zero as evidence of AI lumpiness:

...

In this post, let me give another example (beyond two big lumps in a row) of what could change my mind. I offer a clear observable indicator, for which data should be available now: deviant citation lumpiness in recent ML research. One standard measure of research impact is citations; bigger, lumpier developments gain more citations than smaller ones. And it turns out that the lumpiness of citations is remarkably constant across research fields! See this March 3 paper in Science:

I Still Don’t Get Foom: http://www.overcomingbias.com/2014/07/30855.html
All of which makes it look like I’m the one with the problem; everyone else gets it. Even so, I’m gonna try to explain my problem again, in the hope that someone can explain where I’m going wrong. Here goes.

“Intelligence” just means an ability to do mental/calculation tasks, averaged over many tasks. I’ve always found it plausible that machines will continue to do more kinds of mental tasks better, and eventually be better at pretty much all of them. But what I’ve found it hard to accept is a “local explosion.” This is where a single machine, built by a single project using only a tiny fraction of world resources, goes in a short time (e.g., weeks) from being so weak that it is usually beaten by a single human with the usual tools, to so powerful that it easily takes over the entire world. Yes, smarter machines may greatly increase overall economic growth rates, and yes such growth may be uneven. But this degree of unevenness seems implausibly extreme. Let me explain.

If we count by economic value, humans now do most of the mental tasks worth doing. Evolution has given us a brain chock-full of useful well-honed modules. And the fact that most mental tasks require the use of many modules is enough to explain why some of us are smarter than others. (There’d be a common “g” factor in task performance even with independent module variation.) Our modules aren’t that different from those of other primates, but because ours are different enough to allow lots of cultural transmission of innovation, we’ve out-competed other primates handily.
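
The parenthetical claim about “g” is easy to check numerically. A minimal simulation sketch (the module counts, task counts, and overlap sizes are arbitrary assumptions for illustration, not Hanson's):

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_modules, n_tasks = 2000, 50, 20

# Independent module abilities per person -- no built-in common factor.
modules = rng.normal(size=(n_people, n_modules))

# Each task averages a random subset of modules, so tasks overlap.
task_perf = np.empty((n_people, n_tasks))
for t in range(n_tasks):
    used = rng.choice(n_modules, size=15, replace=False)
    task_perf[:, t] = modules[:, used].mean(axis=1)

# Tasks correlate positively (a "positive manifold"), and the first
# principal component -- a "g"-like factor -- carries an outsized
# share of the variance.
corr = np.corrcoef(task_perf, rowvar=False)
off_diag = (corr.sum() - n_tasks) / (n_tasks * (n_tasks - 1))
eigvals = np.linalg.eigvalsh(corr)[::-1]
print("mean inter-task correlation:", round(off_diag, 3))
print("variance share of first PC:", round(eigvals[0] / eigvals.sum(), 3))
```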

We’ve had computers for over seventy years, and have slowly built up libraries of software modules for them. Like brains, computers do mental tasks by combining modules. An important mental task is software innovation: improving these modules, adding new ones, and finding new ways to combine them. Ideas for new modules are sometimes inspired by the modules we see in our brains. When an innovation team finds an improvement, they usually sell access to it, which gives them resources for new projects, and lets others take advantage of their innovation.

...

In Bostrom’s graph above, the line for an initially small project and system has a much higher slope, which means that it becomes in a short time vastly better at software innovation. Better than the entire rest of the world put together. And my key question is: how could it plausibly do that? Since the rest of the world is already trying the best it can to usefully innovate, and to abstract in order to promote such innovation, what exactly gives one small project such a huge advantage to let it innovate so much faster?

...

In fact, most software innovation seems to be driven by hardware advances, instead of innovator creativity. Apparently, good ideas are available but must usually wait until hardware is cheap enough to support them.

Yes, sometimes architectural choices have wider impacts. But I was an artificial intelligence researcher for nine years, ending twenty years ago, and I never saw an architecture choice make a huge difference, relative to other reasonable architecture choices. For most big systems, overall architecture matters a lot less than getting lots of detail right. Researchers have long wandered the space of architectures, mostly rediscovering variations on what others found before.

Some hope that a small project could be much better at innovation because it specializes in that topic, and much better understands new theoretical insights into the basic nature of innovation or intelligence. But I don’t think those are actually topics where one can usefully specialize much, or where we’ll find much useful new theory. To be much better at learning, the project would instead have to be much better at hundreds of specific kinds of learning. Which is very hard to do in a small project.

What does Bostrom say? Alas, not much. He distinguishes several advantages of digital over human minds, but all software shares those advantages. Bostrom also distinguishes five paths: better software, brain emulation (i.e., ems), biological enhancement of humans, brain-computer interfaces, and better human organizations. He doesn’t think interfaces would work, and sees organizations and better biology as only playing supporting roles.

...

Similarly, while you might imagine someday standing in awe in front of a super intelligence that embodies all the power of a new age, superintelligence just isn’t the sort of thing that one project could invent. As “intelligence” is just the name we give to being better at many mental tasks by using many good mental modules, there’s no one place to improve it. So I can’t see a plausible way one project could increase its intelligence vastly faster than could the rest of the world.

Takeoff speeds: https://sideways-view.com/2018/02/24/takeoff-speeds/
Futurists have argued for years about whether the development of AGI will look more like a breakthrough within a small group (“fast takeoff”), or a continuous acceleration distributed across the broader economy or a large firm (“slow takeoff”).

I currently think a slow takeoff is significantly more likely. This post explains some of my reasoning and why I think it matters. Mostly the post lists arguments I often hear for a fast takeoff and explains why I don’t find them compelling.

(Note: this is not a post about whether an intelligence explosion will occur. That seems very likely to me. Quantitatively I expect it to go along these lines. So e.g. while I disagree with many of the claims and assumptions in Intelligence Explosion Microeconomics, I don’t disagree with the central thesis or with most of the arguments.)
ratty  lesswrong  subculture  miri-cfar  ai  risk  ai-control  futurism  books  debate  hanson  big-yud  prediction  contrarianism  singularity  local-global  speed  speedometer  time  frontier  distribution  smoothness  shift  pdf  economics  track-record  abstraction  analogy  links  wiki  list  evolution  mutation  selection  optimization  search  iteration-recursion  intelligence  metameta  chart  analysis  number  ems  coordination  cooperate-defect  death  values  formal-values  flux-stasis  philosophy  farmers-and-foragers  malthus  scale  studying  innovation  insight  conceptual-vocab  growth-econ  egalitarianism-hierarchy  inequality  authoritarianism  wealth  near-far  rationality  epistemic  biases  cycles  competition  arms  zero-positive-sum  deterrence  war  peace-violence  winner-take-all  technology  moloch  multi  plots  research  science  publishing  humanity  labor  marginal  urban-rural  structure  composition-decomposition  complex-systems  gregory-clark  decentralized  heavy-industry  magnitude  multiplicative  endogenous-exogenous  models  uncertainty  decision-theory  time-prefer 
april 2018 by nhaliday
AI-complete - Wikipedia
In the field of artificial intelligence, the most difficult problems are informally known as AI-complete or AI-hard, implying that the difficulty of these computational problems is equivalent to that of solving the central artificial intelligence problem—making computers as intelligent as people, or strong AI.[1] To call a problem AI-complete reflects an attitude that it would not be solved by a simple specific algorithm.

AI-complete problems are hypothesised to include computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real world problem.[2]

Currently, AI-complete problems cannot be solved with modern computer technology alone, but would also require human computation. This property can be useful, for instance, to test for the presence of humans, as with CAPTCHAs, and for computer security to circumvent brute-force attacks.[3][4]

...

AI-complete problems are hypothesised to include:

Bongard problems
Computer vision (and subproblems such as object recognition)
Natural language understanding (and subproblems such as text mining, machine translation, and word sense disambiguation[8])
Dealing with unexpected circumstances while solving any real world problem, whether it's navigation or planning or even the kind of reasoning done by expert systems.

...

Current AI systems can solve very simple and/or restricted versions of AI-complete problems, but never in their full generality. When AI researchers attempt to "scale up" their systems to handle more complicated, real-world situations, the programs tend to become excessively brittle without commonsense knowledge or a rudimentary understanding of the situation: they fail as unexpected circumstances outside of their original problem context begin to appear. When human beings are dealing with new situations in the world, they are helped immensely by the fact that they know what to expect: they know what all things around them are, why they are there, what they are likely to do and so on. They can recognize unusual situations and adjust accordingly. A machine without strong AI has no other skills to fall back on.[9]
concept  reduction  cs  computation  complexity  wiki  reference  properties  computer-vision  ai  risk  ai-control  machine-learning  deep-learning  language  nlp  order-disorder  tactics  strategy  intelligence  humanity  speculation  crux 
march 2018 by nhaliday
Information Processing: Mathematical Theory of Deep Neural Networks (Princeton workshop)
"Recently, long-past-due theoretical results have begun to emerge. These results, and those that will follow in their wake, will begin to shed light on the properties of large, adaptive, distributed learning architectures, and stand to revolutionize how computer science and neuroscience understand these systems."
hsu  scitariat  commentary  links  research  research-program  workshop  events  princeton  sanjeev-arora  deep-learning  machine-learning  ai  generalization  explanans  off-convex  nibble  frontier  speedometer  state-of-art  big-surf  announcement 
january 2018 by nhaliday
Sequence Modeling with CTC
A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.
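
The heart of CTC is a forward dynamic program over the target labels interleaved with blanks. A minimal numpy sketch of that recurrence (not the article's code, and numerically naive; real implementations work in log space):

```python
import numpy as np

def ctc_forward(probs, labels, blank=0):
    """P(labels | probs) under CTC.

    probs: (T, C) per-frame softmax outputs; labels: non-empty class indices.
    """
    T = probs.shape[0]
    # Interleave blanks: [y1, y2] -> [blank, y1, blank, y2, blank]
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]      # start with a blank...
    alpha[0, 1] = probs[0, ext[1]]     # ...or the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                        # stay on same symbol
            if s > 0:
                a += alpha[t - 1, s - 1]               # advance one symbol
            # Skipping a blank is allowed unless it would merge repeats.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on the last label or the trailing blank.
    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]

# e.g. a uniform 3-class model over 4 frames, target sequence [1, 2]:
print(ctc_forward(np.full((4, 3), 1 / 3), [1, 2]))
```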
acmtariat  techtariat  org:bleg  nibble  better-explained  machine-learning  deep-learning  visual-understanding  visualization  analysis  let-me-see  research  sequential  audio  classification  model-class  exposition  language  acm  approximation  comparison  markov  iteration-recursion  concept  atoms  distribution  orders  DP  heuristic  optimization  trees  greedy  matching  gradient-descent 
december 2017 by nhaliday
[1709.06560] Deep Reinforcement Learning that Matters
https://twitter.com/WAWilsonIV/status/912505885565452288
I’ve been experimenting w/ various kinds of value function approaches to RL lately, and it’s striking how primitive and bad things seem to be
At first I thought it was just that my code sucks, but then I played with the OpenAI baselines and nope, it’s the children that are wrong.
And now, what comes across my desk but this fantastic paper: https://arxiv.org/abs/1709.06560 How long until the replication crisis hits AI?

https://twitter.com/WAWilsonIV/status/911318326504153088
Seriously I’m not blown away by the PhDs’ records over the last 30 years. I bet you’d get better payoff funding eccentrics and amateurs.
There are essentially zero fundamentally new ideas in AI, the papers are all grotesquely hyperparameter tuned, nobody knows why it works.

Deep Reinforcement Learning Doesn't Work Yet: https://www.alexirpan.com/2018/02/14/rl-hard.html
Once, on Facebook, I made the following claim.

Whenever someone asks me if reinforcement learning can solve their problem, I tell them it can’t. I think this is right at least 70% of the time.
papers  preprint  machine-learning  acm  frontier  speedometer  deep-learning  realness  replication  state-of-art  survey  reinforcement  multi  twitter  social  discussion  techtariat  ai  nibble  org:mat  unaffiliated  ratty  acmtariat  liner-notes  critique  sample-complexity  cost-benefit  todo 
september 2017 by nhaliday
New Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine
A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.

sounds like he's just talking about autoencoders?
news  org:mag  org:sci  popsci  announcement  research  deep-learning  machine-learning  acm  information-theory  bits  neuro  model-class  big-surf  frontier  nibble  hmm  signal-noise  deepgoog  expert  ideas  wild-ideas  summary  talks  video  israel  roots  physics  interdisciplinary  ai  intelligence  shannon  giants  arrows  preimage  lifts-projections  composition-decomposition  characterization  markov  gradient-descent  papers  liner-notes  experiment  hi-order-bits  generalization  expert-experience  explanans  org:inst  speedometer 
september 2017 by nhaliday
Rank aggregation basics: Local Kemeny optimisation | David R. MacIver
This turns our problem from a global search to a local one: basically, we can start from any point in the search space and search locally by swapping adjacent pairs until we hit a minimum. This turns out to be quite easy to do. _We basically run insertion sort_: At step n we have the first n items in a locally Kemeny optimal order. Swap the n+1th item backwards until the majority think its predecessor is < it. This ensures all adjacent pairs are in the majority order, so swapping them would result in a greater than or equal K. This is of course an O(n^2) algorithm. In fact, the problem of merely finding a locally Kemeny optimal solution can be solved in O(n log(n)) (for much the same reason as you can sort better than insertion sort): you just take the directed graph of majority votes and find a Hamiltonian path. The nice thing about the above version of the algorithm is that it gives you a lot of control over where you start your search.
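
A direct transcription of that procedure into Python (a sketch; the helper `majority_prefers` is hypothetical and stands in for tallying voters' pairwise preferences):

```python
def local_kemeny(items, majority_prefers):
    """Insertion sort under the majority relation.

    Returns an ordering in which every adjacent pair is in majority order,
    i.e. a locally Kemeny-optimal ranking as described in the post.
    `majority_prefers(a, b)` -> True iff a majority of voters rank a above b.
    """
    order = []
    for x in items:
        order.append(x)
        i = len(order) - 1
        # Swap x backwards until the majority prefers its predecessor.
        while i > 0 and majority_prefers(order[i], order[i - 1]):
            order[i], order[i - 1] = order[i - 1], order[i]
            i -= 1
    return order

# Toy example: three voters, each giving a full ranking.
votes = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
def majority_prefers(a, b):
    return sum(v.index(a) < v.index(b) for v in votes) * 2 > len(votes)

print(local_kemeny(["c", "b", "a"], majority_prefers))  # -> ['a', 'b', 'c']
```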
techtariat  liner-notes  papers  tcs  algorithms  machine-learning  acm  optimization  approximation  local-global  orders  graphs  graph-theory  explanation  iteration-recursion  time-complexity  nibble 
september 2017 by nhaliday
Accurate Genomic Prediction Of Human Height | bioRxiv
Stephen Hsu's compressed sensing application paper

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction.
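
(A quick consistency check on those numbers: for a linear predictor, the variance captured is the squared correlation between prediction and phenotype, so the two height figures agree up to rounding.)

```latex
R^2 \approx 0.40 \;\Longrightarrow\; r = \sqrt{R^2} \approx 0.63 \approx 0.65 .
```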

https://infoproc.blogspot.com/2017/09/accurate-genomic-prediction-of-human.html

http://infoproc.blogspot.com/2017/11/23andme.html
I'm in Mountain View to give a talk at 23andMe. Their latest funding round was $250M on a (reported) valuation of $1.5B. If I just add up the Crunchbase numbers it looks like almost half a billion invested at this point...

Slides: Genomic Prediction of Complex Traits

Here's how people + robots handle your spit sample to produce a SNP genotype:

https://drive.google.com/file/d/1e_zuIPJr1hgQupYAxkcbgEVxmrDHAYRj/view
study  bio  preprint  GWAS  state-of-art  embodied  genetics  genomics  compressed-sensing  high-dimension  machine-learning  missing-heritability  hsu  scitariat  education  🌞  frontier  britain  regression  data  visualization  correlation  phase-transition  multi  commentary  summary  pdf  slides  brands  skunkworks  hard-tech  presentation  talks  methodology  intricacy  bioinformatics  scaling-up  stat-power  sparsity  norms  nibble  speedometer  stats  linear-models  2017  biodet 
september 2017 by nhaliday
Stat 260/CS 294: Bayesian Modeling and Inference
Topics
- Priors (conjugate, noninformative, reference)
- Hierarchical models, spatial models, longitudinal models, dynamic models, survival models
- Testing
- Model choice
- Inference (importance sampling, MCMC, sequential Monte Carlo; see the sketch after this list)
- Nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random measures)
- Decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes)
- Experimental design
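
To make the MCMC item above concrete, a minimal random-walk Metropolis sampler (a generic sketch, not course material; the target density and step size are arbitrary):

```python
import numpy as np

def metropolis(logpdf, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis sampling from the density exp(logpdf)."""
    rng = np.random.default_rng(seed)
    x, lp = x0, logpdf(x0)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        prop = x + step * rng.normal()            # symmetric proposal
        lp_prop = logpdf(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept w.p. min(1, ratio)
            x, lp = prop, lp_prop
        samples[i] = x                            # keep current state either way
    return samples

# e.g. sample a standard normal "posterior":
s = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=10_000)
print(s.mean(), s.std())   # ~0 and ~1, up to Monte Carlo error
```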
unit  course  berkeley  expert  michael-jordan  machine-learning  acm  bayesian  probability  stats  lecture-notes  priors-posteriors  markov  monte-carlo  frequentist  latent-variables  decision-theory  expert-experience  confidence  sampling 
july 2017 by nhaliday
Superintelligence Risk Project Update II
https://www.jefftk.com/p/superintelligence-risk-project-update

https://www.jefftk.com/p/conversation-with-michael-littman
For example, I asked him what he thought of the idea that we could get AGI with current techniques, primarily deep neural nets and reinforcement learning, without learning anything new about how intelligence works or how to implement it ("Prosaic AGI" [1]). He didn't think this was possible, and believes there are deep conceptual issues we still need to get a handle on. He's also less impressed with deep learning than he was before he started working in it: in his experience it's a much more brittle technology than he had been expecting. Specifically, when trying to replicate results, he's often found that they depend on a bunch of parameters being in just the right range, and without that the systems don't perform nearly as well.

The bottom line, to him, was that since we are still many breakthroughs away from getting to AGI, we can't productively work on reducing superintelligence risk now.

He told me that he worries that the AI risk community is not solving real problems: they're making deductions and inferences that are self-consistent but not being tested or verified in the world. Since we can't tell if that's progress, it probably isn't. I asked if he was referring to MIRI's work here, and he said their work was an example of the kind of approach he's skeptical about, though he wasn't trying to single them out. [2]

https://www.jefftk.com/p/conversation-with-an-ai-researcher
Earlier this week I had a conversation with an AI researcher [1] at one of the main industry labs as part of my project of assessing superintelligence risk. Here's what I got from them:

They see progress in ML as almost entirely constrained by hardware and data, to the point that if today's hardware and data had existed in the mid-1950s, researchers would have gotten to approximately our current state within ten to twenty years. They gave the example of backprop: we saw how to train multi-layer neural nets decades before we had the computing power to actually train these nets to do useful things.

Similarly, people talk about AlphaGo as a big jump, where Go went from being "ten years away" to "done" within a couple years, but they said it wasn't like that. If Go work had stayed in academia, with academia-level budgets and resources, it probably would have taken nearly that long. What changed was a company seeing promising results, realizing what could be done, and putting way more engineers and hardware on the project than anyone had previously done. AlphaGo couldn't have happened earlier because the hardware wasn't there yet, and was only able to be brought forward by massive application of resources.

https://www.jefftk.com/p/superintelligence-risk-project-conclusion
Summary: I'm not convinced that AI risk should be highly prioritized, but I'm also not convinced that it shouldn't be. Highly qualified researchers in a position to have a good sense of the field have massively different views on core questions like how capable ML systems are now, how capable they will be soon, and how we can influence their development. I do think these questions are possible to get a better handle on, but I think this would require much deeper ML knowledge than I have.
ratty  core-rats  ai  risk  ai-control  prediction  expert  machine-learning  deep-learning  speedometer  links  research  research-program  frontier  multi  interview  deepgoog  games  hardware  performance  roots  impetus  chart  big-picture  state-of-art  reinforcement  futurism  🤖  🖥  expert-experience  singularity  miri-cfar  empirical  evidence-based  speculation  volo-avolo  clever-rats  acmtariat  robust  ideas  crux  atoms  detail-architecture  software  gradient-descent 
july 2017 by nhaliday
Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? | Scientific Reports
As machine learning/artificial intelligence algorithms are defeating chess masters and, most recently, GO champions, there is interest – and hope – that they will prove equally useful in assisting chemists in predicting outcomes of organic reactions. This paper demonstrates, however, that the applicability of machine learning to the problems of chemical reactivity over diverse types of chemistries remains limited – in particular, with the currently available chemical descriptors, fundamental mathematical theorems impose upper bounds on the accuracy with which reaction yields and times can be predicted. Improving the performance of machine-learning methods calls for the development of fundamentally new chemical descriptors.
study  org:nat  papers  machine-learning  chemistry  measurement  volo-avolo  lower-bounds  analysis  realness  speedometer  nibble  🔬  applications  frontier  state-of-art  no-go  accuracy  interdisciplinary 
july 2017 by nhaliday
How to Escape Saddle Points Efficiently – Off the convex path
A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge’s and Ben Recht’s blog posts), the critical open problem is one of efficiency — is GD able to move past saddle points quickly, or can it be slowed down significantly? How does the rate of escape scale with the ambient dimensionality? In this post, we describe our recent work with Rong Ge, Praneeth Netrapalli and Sham Kakade, that provides the first provable positive answer to the efficiency question, showing that, rather surprisingly, GD augmented with suitable perturbations escapes saddle points efficiently; indeed, in terms of rate and dimension dependence it is almost as if the saddle points aren’t there!
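
A toy version of the perturbation idea (an illustrative sketch only; the actual algorithm of Jin et al. prescribes when to perturb and how to set the parameters much more carefully):

```python
import numpy as np

def perturbed_gd(grad, x0, lr=0.1, n_steps=500, g_thresh=1e-3,
                 radius=1e-2, seed=0):
    """Gradient descent that adds a small random jump near critical points.

    When the gradient is tiny (a candidate saddle), jump to a random
    nearby point so descent can pick up a negative-curvature direction.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh:
            x = x + radius * rng.normal(size=x.shape)   # perturb
        else:
            x = x - lr * g                              # ordinary GD step
    return x

# f(x, y) = x^2 + y^4/4 - y^2/2 has a saddle at the origin and minima at
# y = +/-1.  Plain GD started at the origin never moves; this escapes.
grad = lambda v: np.array([2 * v[0], v[1] ** 3 - v[1]])
print(perturbed_gd(grad, [0.0, 0.0]))   # ends near (0, +/-1), up to jitter
```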
acmtariat  org:bleg  nibble  liner-notes  machine-learning  acm  optimization  gradient-descent  local-global  off-convex  time-complexity  random  perturbation  michael-jordan  iterative-methods  research  learning-theory  math.DS  iteration-recursion 
july 2017 by nhaliday
Unsupervised learning, one notion or many? – Off the convex path
(Task A) Learning a distribution from samples. (Examples: gaussian mixtures, topic models, variational autoencoders,..)

(Task B) Understanding latent structure in the data. This is not the same as (Task A); for example, principal component analysis, clustering, manifold learning etc. identify latent structure but don’t learn a distribution per se.

(Task C) Feature Learning. Learn a mapping from datapoint → feature vector such that classification tasks are easier to carry out on feature vectors rather than datapoints. For example, unsupervised feature learning could help lower the number of labeled samples needed for learning a classifier, or be useful for domain adaptation.

Task B is often a subcase of Task C, as the intended users of “structure found in data” are humans (scientists) who pore over the representation of data to gain some intuition about its properties, and these “properties” can often be phrased as a classification task.

This post explains the relationship between Tasks A and C, and why they get mixed up in students’ minds. We hope there is also some food for thought here for experts, namely, our discussion about the fragility of the usual “perplexity” definition of unsupervised learning. It explains why Task A doesn’t in practice lead to a good enough solution for Task C. For example, it has been believed for many years that for deep learning, unsupervised pretraining should help supervised training, but this has been hard to show in practice.
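
To make the Task A / Task C distinction concrete, a small sketch (scikit-learn is an assumption here; the post itself contains no code): fit a distribution (Task A), then reuse the same model as a feature map (Task C):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated clusters in 2D.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(+2, 1, (200, 2))])

# Task A: learn the distribution itself (density estimation).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("avg log-likelihood:", gmm.score(X))            # Task A's yardstick

# Task C: reuse the fitted model as a feature map for downstream tasks.
features = gmm.predict_proba(X)                       # datapoint -> feature vector
print("feature vector for one point:", features[0])   # Task C's raw material
```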
acmtariat  org:bleg  nibble  machine-learning  acm  thinking  clarity  unsupervised  conceptual-vocab  concept  explanation  features  bayesian  off-convex  deep-learning  latent-variables  generative  intricacy  distribution  sampling 
june 2017 by nhaliday
Edge.org: 2017 : WHAT SCIENTIFIC TERM OR CONCEPT OUGHT TO BE MORE WIDELY KNOWN?
highlights:
- the genetic book of the dead [Dawkins]
- complementarity [Frank Wilczek]
- relative information
- effective theory [Lisa Randall]
- affordances [Dennett]
- spontaneous symmetry breaking
- relatedly, equipoise [Nicholas Christakis]
- case-based reasoning
- population reasoning (eg, common law)
- criticality [Cesar Hidalgo]
- Haldane's law of the right size (!SCALE!)
- polygenic scores
- non-ergodic
- ansatz
- state [Aaronson]: http://www.scottaaronson.com/blog/?p=3075
- transfer learning
- effect size
- satisficing
- scaling
- the breeder's equation [Greg Cochran]
- impedance matching

soft:
- reciprocal altruism
- life history [Plomin]
- intellectual honesty [Sam Harris]
- coalitional instinct (interesting claim: building coalitions around "rationality" actually makes it more difficult to update on new evidence as it makes you look like a bad person, eg, the Cathedral)
basically same: https://twitter.com/ortoiseortoise/status/903682354367143936

more: https://www.edge.org/conversation/john_tooby-coalitional-instincts

interesting timing. how woke is this dude?
org:edge  2017  technology  discussion  trends  list  expert  science  top-n  frontier  multi  big-picture  links  the-world-is-just-atoms  metameta  🔬  scitariat  conceptual-vocab  coalitions  q-n-a  psychology  social-psych  anthropology  instinct  coordination  duty  power  status  info-dynamics  cultural-dynamics  being-right  realness  cooperate-defect  westminster  chart  zeitgeist  rot  roots  epistemic  rationality  meta:science  analogy  physics  electromag  geoengineering  environment  atmosphere  climate-change  waves  information-theory  bits  marginal  quantum  metabuch  homo-hetero  thinking  sapiens  genetics  genomics  evolution  bio  GT-101  low-hanging  minimum-viable  dennett  philosophy  cog-psych  neurons  symmetry  humility  life-history  social-structure  GWAS  behavioral-gen  biodet  missing-heritability  ergodic  machine-learning  generalization  west-hunter  population-genetics  methodology  blowhards  spearhead  group-level  scale  magnitude  business  scaling-tech  tech  business-models  optimization  effect-size  aaronson  state  bare-hands  problem-solving  politics 
may 2017 by nhaliday