nhaliday + acm + definition   15

Correlated Equilibria in Game Theory | Azimuth
Given this, it’s not surprising that Nash equilibria can be hard to find. Last September a paper came out making this precise, in a strong way:

• Yakov Babichenko and Aviad Rubinstein, Communication complexity of approximate Nash equilibria.

The authors show there’s no guaranteed method for players to find even an approximate Nash equilibrium unless they tell each other almost everything about their preferences. This makes the Nash equilibrium prohibitively difficult to find when there are lots of players… in general. There are particular games where it’s not difficult, and that makes these games important: for example, if you’re trying to run a government well. (A laughable notion these days, but still one can hope.)

Klarreich’s article in Quanta gives a nice readable account of this work and also a more practical alternative to the concept of Nash equilibrium. It’s called a ‘correlated equilibrium’, and it was invented by the mathematician Robert Aumann in 1974. You can see an attempt to define it here:
baez  org:bleg  nibble  mathtariat  commentary  summary  news  org:mag  org:sci  popsci  equilibrium  GT-101  game-theory  acm  conceptual-vocab  concept  definition  thinking  signaling  coordination  tcs  complexity  communication-complexity  lower-bounds  no-go  liner-notes  big-surf  papers  research  algorithmic-econ  volo-avolo 
july 2017 by nhaliday
Mixing (mathematics) - Wikipedia
One way to describe this is that strong mixing implies that for any two possible states of the system (realizations of the random variable), when given a sufficient amount of time between the two states, the occurrence of the states is independent.

Mixing coefficient is
α(n) = sup{|P(A∩B) - P(A)P(B)| : A in σ(X_0, ..., X_{t-1}), B in σ(X_{t+n}, ...), t >= 0}
for σ(...) the sigma algebra generated by those r.v.s.

So it's a notion of total variation distance between the true joint distribution of (past, future) and the product of their marginal distributions.
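
A minimal sketch of how such a coefficient decays, under simplifying assumptions: it uses a stationary two-state Markov chain (the transition matrix `P` and stationary distribution `PI` are made up for illustration) and restricts the supremum to events that look at just one past time and one future time, `gap` steps apart. That restriction can only under-estimate the full coefficient for the same separation. The function name `single_time_alpha` is hypothetical.

```python
from itertools import combinations


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(p, n):
    """n-step transition matrix p^n (n >= 0) by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


def subsets(states):
    """All subsets of the state space, i.e. all events generated by one variable."""
    for r in range(len(states) + 1):
        yield from combinations(states, r)


def single_time_alpha(p, pi, gap):
    """sup over events A about X_s and B about X_{s+gap} of |P(A ∩ B) - P(A)P(B)|.

    Assumes the chain is started from its stationary distribution pi,
    so the value does not depend on s.
    """
    pn = mat_pow(p, gap)
    states = range(len(pi))
    best = 0.0
    for a in subsets(states):
        for b in subsets(states):
            joint = sum(pi[i] * pn[i][j] for i in a for j in b)
            product = sum(pi[i] for i in a) * sum(pi[j] for j in b)
            best = max(best, abs(joint - product))
    return best


# Illustrative chain: PI satisfies PI = PI * P.
P = [[0.8, 0.2], [0.3, 0.7]]
PI = [0.6, 0.4]

for gap in (1, 2, 4, 8):
    print(gap, single_time_alpha(P, PI, gap))
```

For this particular chain the printed values halve with each extra step (the second eigenvalue of `P` is 0.5), which is the sense in which far-apart states become nearly independent.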
concept  math  acm  physics  probability  stochastic-processes  definition  mixing  iidness  wiki  reference  nibble  limits  ergodic  math.DS  measure  dependence-independence 
february 2017 by nhaliday
Difference between off-policy and on-policy learning - Cross Validated
The reason that Q-learning is off-policy is that it updates its Q-values using the Q-value of the next state s′ and the greedy action a′. In other words, it estimates the return (total discounted future reward) for state-action pairs assuming a greedy policy were followed despite the fact that it's not following a greedy policy.

The reason that SARSA is on-policy is that it updates its Q-values using the Q-value of the next state s′ and the current policy's action a″. It estimates the return for state-action pairs assuming the current policy continues to be followed.

The distinction disappears if the current policy is a greedy policy. However, such an agent would not be good since it never explores.
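
The two update rules described above can be sketched side by side. This is an illustrative tabular version, not code from the answer: the function names, the list-of-lists Q-table, and the values of `alpha` (step size) and `gamma` (discount) are all assumptions chosen to keep the arithmetic exact.

```python
import copy


def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.5):
    """Off-policy: bootstrap from the greedy (highest-valued) action in s_next,
    regardless of which action the agent will actually take there."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.5):
    """On-policy: bootstrap from a_next, the action the current policy
    actually selected in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Q[state][action]; in state 1, action 1 is greedy (value 3.0 vs 1.0).
Q0 = [[0.0, 0.0], [1.0, 3.0]]

# One transition: state 0, action 0, reward 1.0, next state 1.
q_off = copy.deepcopy(Q0)
q_learning_update(q_off, 0, 0, 1.0, 1)

# The behaviour policy explores and picks the non-greedy action 0 next.
q_on = copy.deepcopy(Q0)
sarsa_update(q_on, 0, 0, 1.0, 1, a_next=0)

# If the policy happens to pick the greedy action, the two updates agree.
q_on_greedy = copy.deepcopy(Q0)
sarsa_update(q_on_greedy, 0, 0, 1.0, 1, a_next=1)

print(q_off[0][0], q_on[0][0], q_on_greedy[0][0])  # 1.25 0.75 1.25
```

The only difference between the two functions is the bootstrap term: `max(Q[s_next])` versus `Q[s_next][a_next]`.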
q-n-a  overflow  machine-learning  acm  reinforcement  confusion  jargon  generalization  nibble  definition  greedy  comparison 
february 2017 by nhaliday

bundles : academe  acm  frame  math  meta
