nhaliday + acm + definition   15

Correlated Equilibria in Game Theory | Azimuth
Given this, it’s not surprising that Nash equilibria can be hard to find. Last September a paper came out making this precise, in a strong way:

• Yakov Babichenko and Aviad Rubinstein, Communication complexity of approximate Nash equilibria.

The authors show there’s no guaranteed method for players to find even an approximate Nash equilibrium unless they tell each other almost everything about their preferences. This makes a Nash equilibrium prohibitively difficult to find when there are lots of players… in general. There are particular games where it’s not difficult, and that makes these games important: for example, if you’re trying to run a government well. (A laughable notion these days, but still one can hope.)

Klarreich’s article in Quanta gives a nice readable account of this work and also a more practical alternative to the concept of Nash equilibrium. It’s called a ‘correlated equilibrium’, and it was invented by the mathematician Robert Aumann in 1974. You can see an attempt to define it here:
baez  org:bleg  nibble  mathtariat  commentary  summary  news  org:mag  org:sci  popsci  equilibrium  GT-101  game-theory  acm  conceptual-vocab  concept  definition  thinking  signaling  coordination  tcs  complexity  communication-complexity  lower-bounds  no-go  liner-notes  big-surf  papers  research  algorithmic-econ  volo-avolo
july 2017 by nhaliday
Mixing (mathematics) - Wikipedia
One way to describe this is that strong mixing implies that for any two possible states of the system (realizations of the random variable), when given a sufficient amount of time between the two states, the occurrence of the states is independent.

The mixing coefficient is
α(n) = sup{|P(A∩B) - P(A)P(B)| : A in σ(X_0, ..., X_{t-1}), B in σ(X_{t+n}, ...), t >= 0}
for σ(...) the sigma algebra generated by those r.v.s.

So it's a notion of total variation distance between the true joint distribution of past and future and the product of their marginal distributions.
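
A minimal sketch of how that dependence gap decays, assuming a stationary two-state Markov chain with hypothetical flip probabilities a and b (none of this is from the Wikipedia article). It only compares events about X_0 with events about X_n, so in general it gives at most a lower bound on α(n) rather than the coefficient itself. The helper names mat_mul and dependence_gap are made up for illustration.

```python
def mat_mul(p, q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dependence_gap(a, b, n):
    """Max over states i, j of |P(X_0=i, X_n=j) - P(X_0=i) P(X_n=j)|.

    Assumes a stationary two-state chain that leaves state 0 with
    probability a and leaves state 1 with probability b.
    """
    step = [[1 - a, a], [b, 1 - b]]
    pi = [b / (a + b), a / (a + b)]      # stationary distribution
    power = [[1.0, 0.0], [0.0, 1.0]]     # identity, i.e. step ** 0
    for _ in range(n):
        power = mat_mul(power, step)     # power == step ** n afterwards
    return max(abs(pi[i] * power[i][j] - pi[i] * pi[j])
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, dependence_gap(0.3, 0.2, n))
```

For this chain the gap shrinks geometrically in n, which is the "independent given enough time between the two states" behaviour described above.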
concept  math  acm  physics  probability  stochastic-processes  definition  mixing  iidness  wiki  reference  nibble  limits  ergodic  math.DS  measure  dependence-independence
february 2017 by nhaliday
Difference between off-policy and on-policy learning - Cross Validated
The reason that Q-learning is off-policy is that it updates its Q-values using the Q-value of the next state s′ and the greedy action a′. In other words, it estimates the return (total discounted future reward) for state-action pairs assuming a greedy policy were followed despite the fact that it's not following a greedy policy.

The reason that SARSA is on-policy is that it updates its Q-values using the Q-value of the next state s′ and the current policy's action a″. It estimates the return for state-action pairs assuming the current policy continues to be followed.

The distinction disappears if the current policy is a greedy policy. However, such an agent would not be good since it never explores.
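
A minimal sketch of the two update targets, assuming a tabular Q stored as a dict of dicts, a discount factor gamma and a step size alpha (these names and the function names are illustrative, not from the answer). The only difference is which next-state value is bootstrapped from: the greedy maximum for Q-learning, or the value of the action the current policy actually picked for SARSA.

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the greedy action in next_state."""
    return reward + gamma * max(q[next_state].values())


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action the policy actually chose."""
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha of the way toward the target."""
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    # Hypothetical two-state table, for illustration only.
    q = {"s": {"left": 0.0, "right": 0.0},
         "s2": {"left": 1.0, "right": 3.0}}
    print(q_learning_target(q, 1.0, "s2", gamma=0.5))
    print(sarsa_target(q, 1.0, "s2", "left", gamma=0.5))
```

If the policy's chosen next action happens to be the greedy one, the two targets coincide, which is the case where the distinction disappears.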
q-n-a  overflow  machine-learning  acm  reinforcement  confusion  jargon  generalization  nibble  definition  greedy  comparison
february 2017 by nhaliday
