nhaliday + acm + reinforcement   10

Prisoner's dilemma - Wikipedia
caveat to result below:
An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).[14]

Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is bigger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents.[8]
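
A rough numerical sketch of the intuition quoted above, not taken from the article: it assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1), the Press-Dyson form of an extortionate strategy with illustrative parameters chi = 3 and phi = 1/26, and that both players cooperate in the first round. The names `extortion_strategy`, `long_run_payoffs` and `WSLS` are hypothetical helpers introduced here.

```python
R, S, T, P = 3.0, 0.0, 5.0, 1.0  # conventional IPD payoffs (an assumption)

def extortion_strategy(chi, phi):
    """Cooperation probabilities after (CC, CD, DC, DD), own move listed first.

    Press-Dyson extortionate form; chi is the extortion factor and phi a
    scale parameter that must keep every entry inside [0, 1].
    """
    return (
        1.0 - phi * (chi - 1.0) * (R - P),
        1.0 - phi * ((P - S) + chi * (T - P)),
        phi * ((T - P) + chi * (P - S)),
        0.0,
    )

# Win-stay, lose-switch: repeat the last move after R or T, switch after S or P.
WSLS = (1.0, 0.0, 0.0, 1.0)

def long_run_payoffs(p, q, rounds=5000):
    """Limiting per-round payoffs (X, Y) for memory-one strategies p and q.

    Iterates the 4-state Markov chain over outcomes (CC, CD, DC, DD) as seen
    by X. Assumption: play starts from mutual cooperation and the chain has
    converged after `rounds` steps.
    """
    swap = (0, 2, 1, 3)  # Y sees X's CD as DC and vice versa
    dist = [1.0, 0.0, 0.0, 0.0]
    for _ in range(rounds):
        nxt = [0.0, 0.0, 0.0, 0.0]
        for s, mass in enumerate(dist):
            x, y = p[s], q[swap[s]]  # cooperation probabilities of X and Y
            nxt[0] += mass * x * y
            nxt[1] += mass * x * (1.0 - y)
            nxt[2] += mass * (1.0 - x) * y
            nxt[3] += mass * (1.0 - x) * (1.0 - y)
        dist = nxt
    pay_x = sum(d * v for d, v in zip(dist, (R, S, T, P)))
    pay_y = sum(d * v for d, v in zip(dist, (R, T, S, P)))
    return pay_x, pay_y

zd = extortion_strategy(chi=3.0, phi=1.0 / 26.0)
head_to_head = long_run_payoffs(zd, WSLS)   # extortioner comes out ahead
zd_self_play = long_run_payoffs(zd, zd)     # both sink to P
wsls_self_play = long_run_payoffs(WSLS, WSLS)  # both keep R
```

Under these assumptions the extortioner out-earns a win-stay, lose-switch opponent head to head (about 2.36 vs 1.45), but two extortioners end up at the mutual-defection payoff of 1 each while two win-stay, lose-switch players keep 3 each. This is only the pairwise picture; it does not model population dynamics.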

https://alfanl.com/2018/04/12/defection/
Nature boils down to a few simple concepts.

Haters will point out that I oversimplify. The haters are wrong. I am good at saying a lot with few words. Nature indeed boils down to a few simple concepts.

In life, you can either cooperate or defect.

Used to be that defection was the dominant strategy, say in the time when the Roman empire started to crumble. Everybody complained about everybody and in the end nothing got done. Then came Jesus, who told people to be loving and cooperative, and boom: 1800 years later we get the industrial revolution.

Because of Jesus we now find ourselves in a situation where cooperation is the dominant strategy. A normie engages in a ton of cooperation: with the tax collector who wants more and more of his money, with schools who want more and more of his kid’s time, with media who wants him to repeat more and more party lines, with the Zeitgeist of the Collective Spirit of the People’s Progress Towards a New Utopia. Essentially, our normie is cooperating himself into a crumbling Western empire.

Turns out that if everyone blindly cooperates, parasites sprout up like weeds until defection once again becomes the standard.

The point of a post-Christian religion is to once again create conditions for the kind of cooperation that led to the industrial revolution. This necessitates throwing out undead Christianity: you do not blindly cooperate. You cooperate with people that cooperate with you, you defect on people that defect on you. Christianity mixed with Darwinism. God and Gnon meet.

This also means we re-establish spiritual hierarchy, which, like regular hierarchy, is a prerequisite for cooperation. It is this hierarchical cooperation that turns a household into a force to be reckoned with, that allows a group of men to unite as a front against their enemies, that allows a tribe to conquer the world. Remember: Scientology bullied the Cathedral’s tax department into submission.

With a functioning hierarchy, men still gossip, lie and scheme, but they will do so in whispers behind closed doors. In your face they cooperate and contribute to the group’s wellbeing because incentives are thus that contributing to group wellbeing heightens status.

Without a functioning hierarchy, men gossip, lie and scheme, but they do so in your face, and they tell you that you are positively deluded for accusing them of gossiping, lying and scheming. Seeds will not sprout in such ground.

Spiritual dominance is established in the same way any sort of dominance is established: fought for, taken. But the fight is ritualistic. You can’t force spiritual dominance if no one listens, or if you are silenced the ritual is not allowed to happen.

If one of our priests is forbidden from establishing spiritual dominance, that is a sure sign an enemy priest is in better control and has vested interest in preventing you from establishing spiritual dominance.

They defect on you, you defect on them. Let them suffer the consequences of enemy priesthood, among others characterized by the annoying tendency that very little is said with very many words.

https://contingentnotarbitrary.com/2018/04/14/rederiving-christianity/
To recap, we started with a secular definition of Logos and noted that its telos is existence. Given human nature, game theory and the power of cooperation, the highest expression of that telos is freely chosen universal love, tempered by constant vigilance against defection while maintaining compassion for the defectors and forgiving those who repent. In addition, we must know the telos in order to fulfill it.

In Christian terms, looks like we got over half of the Ten Commandments (know Logos for the First, don’t defect or tempt yourself to defect for the rest), the importance of free will, the indestructibility of evil (group cooperation vs individual defection), loving the sinner and hating the sin (with defection as the sin), forgiveness (with conditions), and love and compassion toward all, assuming only secular knowledge and that it’s good to exist.

Iterated Prisoner's Dilemma is an Ultimatum Game: http://infoproc.blogspot.com/2012/07/iterated-prisoners-dilemma-is-ultimatum.html
The history of IPD shows that bounded cognition prevented the dominant strategies from being discovered for over 60 years, despite significant attention from game theorists, computer scientists, economists, evolutionary biologists, etc. Press and Dyson have shown that IPD is effectively an ultimatum game, which is very different from the Tit for Tat stories told by generations of people who worked on IPD (Axelrod, Dawkins, etc., etc.).
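
One way to see the ultimatum-game reading numerically is sketched below. It is not from the post: it assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1), an illustrative extortionate memory-one strategy with extortion factor 3, an opponent Y who cooperates unconditionally with probability c, and mutual cooperation in round one. `EXTORT`, `CHI`, `limit_payoffs` and `offers` are hypothetical names.

```python
R, S, T, P = 3.0, 0.0, 5.0, 1.0  # conventional IPD payoffs (an assumption)
CHI = 3.0  # extortion factor chosen for illustration

# X's cooperation probabilities after (CC, CD, DC, DD), X's move listed first.
# Built from the Press-Dyson extortionate form with chi = 3 and phi = 1/26.
EXTORT = (11 / 13, 1 / 2, 7 / 26, 0.0)

def limit_payoffs(p, q, rounds=5000):
    """Limiting per-round payoffs (X, Y) from iterating the outcome chain.

    Assumption: the chain started at CC has converged after `rounds` steps.
    """
    swap = (0, 2, 1, 3)  # Y sees X's CD as DC and vice versa
    dist = [1.0, 0.0, 0.0, 0.0]
    for _ in range(rounds):
        nxt = [0.0, 0.0, 0.0, 0.0]
        for s, mass in enumerate(dist):
            x, y = p[s], q[swap[s]]
            nxt[0] += mass * x * y
            nxt[1] += mass * x * (1.0 - y)
            nxt[2] += mass * (1.0 - x) * y
            nxt[3] += mass * (1.0 - x) * (1.0 - y)
        dist = nxt
    pay_x = sum(d * v for d, v in zip(dist, (R, S, T, P)))
    pay_y = sum(d * v for d, v in zip(dist, (R, T, S, P)))
    return pay_x, pay_y

def offers(levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """(c, X payoff, Y payoff) when Y cooperates with fixed probability c."""
    return [(c,) + limit_payoffs(EXTORT, (c, c, c, c)) for c in levels]

for c, pay_x, pay_y in offers():
    # Every outcome lies on the line (pay_x - P) = CHI * (pay_y - P).
    print(f"c={c:.2f}  X={pay_x:.3f}  Y={pay_y:.3f}")
```

In this sketch Y's own payoff rises as c goes from 0 to 1, so Y does best by cooperating fully, yet X's surplus over P stays three times Y's at every c. That is the take-it-or-leave-it flavour: Y can accept the unequal split or refuse and drag both players down to P.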

...

For evolutionary biologists: Dyson clearly thinks this result has implications for multilevel (group vs individual) selection:
... Cooperation loses and defection wins. The ZD strategies confirm this conclusion and make it sharper. ... The system evolved to give cooperative tribes an advantage over non-cooperative tribes, using punishment to give cooperation an evolutionary advantage within the tribe. This double selection of tribes and individuals goes way beyond the Prisoners' Dilemma model.

implications for fractionalized Europe vis-a-vis unified China?

and more broadly does this just imply we're doomed in the long run RE: cooperation, morality, the "good society", so on...? war and group-selection is the only way to get a non-crab bucket civilization?

Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent:
http://www.pnas.org/content/109/26/10409.full
http://www.pnas.org/content/109/26/10409.full.pdf
https://www.edge.org/conversation/william_h_press-freeman_dyson-on-iterated-prisoners-dilemma-contains-strategies-that

https://en.wikipedia.org/wiki/Ultimatum_game

analogy for ultimatum game: the state gives the demos a bargain take-it-or-leave-it, and...if the demos refuses...violence?

The nature of human altruism: http://sci-hub.tw/https://www.nature.com/articles/nature02043
- Ernst Fehr & Urs Fischbacher

Some of the most fundamental questions concerning our evolutionary origins, our social relations, and the organization of society are centred around issues of altruism and selfishness. Experimental evidence indicates that human altruism is a powerful force and is unique in the animal world. However, there is much individual heterogeneity and the interaction between altruists and selfish individuals is vital to human cooperation. Depending on the environment, a minority of altruists can force a majority of selfish individuals to cooperate or, conversely, a few egoists can induce a large number of altruists to defect. Current gene-based evolutionary theories cannot explain important patterns of human altruism, pointing towards the importance of both theories of cultural evolution as well as gene–culture co-evolution.

...

Why are humans so unusual among animals in this respect? We propose that quantitatively, and probably even qualitatively, unique patterns of human altruism provide the answer to this question. Human altruism goes far beyond that which has been observed in the animal world. Among animals, fitness-reducing acts that confer fitness benefits on other individuals are largely restricted to kin groups; despite several decades of research, evidence for reciprocal altruism in pair-wise repeated encounters [4,5] remains scarce [6–8]. Likewise, there is little evidence so far that individual reputation building affects cooperation in animals, which contrasts strongly with what we find in humans. If we randomly pick two human strangers from a modern society and give them the chance to engage in repeated anonymous exchanges in a laboratory experiment, there is a high probability that reciprocally altruistic behaviour will emerge spontaneously [9,10].

However, human altruism extends far beyond reciprocal altruism and reputation-based cooperation, taking the form of strong reciprocity [11,12]. Strong reciprocity is a combination of altruistic rewarding, which is a predisposition to reward others for cooperative, norm-abiding behaviours, and altruistic punishment, which is a propensity to impose sanctions on others for norm violations. Strong reciprocators bear the cost of rewarding or punishing even if they gain no individual economic benefit whatsoever from their acts. In contrast, reciprocal altruists, as they have been defined in the biological literature [4,5], reward and punish only if this is in their long-term self-interest. Strong reciprocity thus constitutes a powerful incentive for cooperation even in non-repeated interactions and when reputation gains are absent, because strong reciprocators will reward those who cooperate and punish those who defect.

...

We will show that the interaction between selfish and strongly reciprocal …
concept  conceptual-vocab  wiki  reference  article  models  GT-101  game-theory  anthropology  cultural-dynamics  trust  cooperate-defect  coordination  iteration-recursion  sequential  axelrod  discrete  smoothness  evolution  evopsych  EGT  economics  behavioral-econ  sociology  new-religion  deep-materialism  volo-avolo  characterization  hsu  scitariat  altruism  justice  group-selection  decision-making  tribalism  organizing  hari-seldon  theory-practice  applicability-prereqs  bio  finiteness  multi  history  science  social-science  decision-theory  commentary  study  summary  giants  the-trenches  zero-positive-sum  🔬  bounded-cognition  info-dynamics  org:edge  explanation  exposition  org:nat  eden  retention  long-short-run  darwinian  markov  equilibrium  linear-algebra  nitty-gritty  competition  war  explanans  n-factor  europe  the-great-west-whale  occident  china  asia  sinosphere  orient  decentralized  markets  market-failure  cohesion  metabuch  stylized-facts  interdisciplinary  physics  pdf  pessimism  time  insight  the-basilisk  noblesse-oblige  the-watchers  ideas  l 
march 2018 by nhaliday
[1709.06560] Deep Reinforcement Learning that Matters
https://twitter.com/WAWilsonIV/status/912505885565452288
I’ve been experimenting w/ various kinds of value function approaches to RL lately, and it’s striking how primitive and bad things seem to be
At first I thought it was just that my code sucks, but then I played with the OpenAI baselines and nope, it’s the children that are wrong.
And now, what comes across my desk but this fantastic paper: https://arxiv.org/abs/1709.06560 How long until the replication crisis hits AI?

https://twitter.com/WAWilsonIV/status/911318326504153088
Seriously I’m not blown away by the PhDs’ records over the last 30 years. I bet you’d get better payoff funding eccentrics and amateurs.
There are essentially zero fundamentally new ideas in AI, the papers are all grotesquely hyperparameter tuned, nobody knows why it works.

Deep Reinforcement Learning Doesn't Work Yet: https://www.alexirpan.com/2018/02/14/rl-hard.html
Once, on Facebook, I made the following claim.

Whenever someone asks me if reinforcement learning can solve their problem, I tell them it can’t. I think this is right at least 70% of the time.
papers  preprint  machine-learning  acm  frontier  speedometer  deep-learning  realness  replication  state-of-art  survey  reinforcement  multi  twitter  social  discussion  techtariat  ai  nibble  org:mat  unaffiliated  ratty  acmtariat  liner-notes  critique  sample-complexity  cost-benefit  todo 
september 2017 by nhaliday
Difference between off-policy and on-policy learning - Cross Validated
The reason that Q-learning is off-policy is that it updates its Q-values using the Q-value of the next state s′ and the greedy action a′. In other words, it estimates the return (total discounted future reward) for state-action pairs assuming a greedy policy were followed despite the fact that it's not following a greedy policy.

The reason that SARSA is on-policy is that it updates its Q-values using the Q-value of the next state s′ and the current policy's action a″. It estimates the return for state-action pairs assuming the current policy continues to be followed.

The distinction disappears if the current policy is a greedy policy. However, such an agent would not be good since it never explores.
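
The two update targets described in the answer can be sketched side by side. This is a minimal tabular illustration, not code from the answer: it assumes Q-values stored in a dict keyed by (state, action), and the function names, the learning rate `alpha` and the discount `gamma` are choices made here.

```python
import random

def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Behaviour policy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def q_learning_target(q, reward, next_state, actions, gamma):
    """Off-policy target: bootstrap from the greedy action in the next state,
    whatever action the behaviour policy actually takes there."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)

def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action the current policy
    actually chose in the next state."""
    return reward + gamma * q[(next_state, next_action)]

def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the given target."""
    q[(state, action)] += alpha * (target - q[(state, action)])

# Toy table: in state "s1", "right" looks better than "left".
actions = ["left", "right"]
q = {("s0", "right"): 0.0, ("s1", "left"): 1.0, ("s1", "right"): 5.0}

greedy = q_learning_target(q, 0.0, "s1", actions, gamma=0.9)      # uses "right"
explored = sarsa_target(q, 0.0, "s1", "left", gamma=0.9)          # policy explored
same_as_greedy = sarsa_target(q, 0.0, "s1", "right", gamma=0.9)   # policy was greedy
```

Here the Q-learning target is 4.5 regardless of what the agent does next, while the SARSA target is 0.9 when the policy explores "left" and 4.5 when it happens to pick the greedy action, which is the case where the distinction disappears.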
q-n-a  overflow  machine-learning  acm  reinforcement  confusion  jargon  generalization  nibble  definition  greedy  comparison 
february 2017 by nhaliday
