nhaliday + acm + jargon   10

Difference between off-policy and on-policy learning - Cross Validated
The reason that Q-learning is off-policy is that it updates its Q-values using the Q-value of the next state s′ and the greedy action a′. In other words, it estimates the return (total discounted future reward) for state-action pairs assuming a greedy policy were followed despite the fact that it's not following a greedy policy.

The reason that SARSA is on-policy is that it updates its Q-values using the Q-value of the next state s′ and the current policy's action a″. It estimates the return for state-action pairs assuming the current policy continues to be followed.

The distinction disappears if the current policy is itself greedy, because the action it takes in s′ is then the greedy action. However, such an agent would learn poorly, since it never explores.
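A minimal sketch of the two update targets described above, assuming a small tabular Q stored as a list of lists. The function names, the toy values, and the step size `alpha` are illustrative assumptions, not taken from the linked answer.

```python
def q_learning_target(Q, r, s_next, gamma):
    # Off-policy: bootstrap from the greedy action in s_next,
    # regardless of which action the agent will actually take there.
    return r + gamma * max(Q[s_next])

def sarsa_target(Q, r, s_next, a_next, gamma):
    # On-policy: bootstrap from the action a_next that the current
    # policy actually selected in s_next.
    return r + gamma * Q[s_next][a_next]

def td_update(Q, s, a, target, alpha):
    # Shared update step: move Q[s][a] part of the way toward the target.
    Q[s][a] += alpha * (target - Q[s][a])

# Toy table (hypothetical values): 2 states x 2 actions.
Q = [[0.0, 0.0], [1.0, 5.0]]
r, gamma, alpha = 1.0, 0.9, 0.5
s, a, s_next = 0, 0, 1

a_explore = 0  # a non-greedy action, e.g. picked by epsilon-greedy exploration
a_greedy = max(range(len(Q[s_next])), key=lambda i: Q[s_next][i])

q_target = q_learning_target(Q, r, s_next, gamma)
sarsa_explore = sarsa_target(Q, r, s_next, a_explore, gamma)
sarsa_greedy = sarsa_target(Q, r, s_next, a_greedy, gamma)

# Apply one Q-learning step to the entry for (s, a).
td_update(Q, s, a, q_target, alpha)
```

With an exploratory next action the two targets differ; when the next action happens to be the greedy one, the SARSA target coincides with the Q-learning target, which is the case where the distinction disappears.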
q-n-a  overflow  machine-learning  acm  reinforcement  confusion  jargon  generalization  nibble  definition  greedy  comparison 
february 2017 by nhaliday
What is the difference between inference and learning? - Quora
- basically boils down to latent variables vs. (hyper-)parameters
- so computing p(x_h|x_v,θ) vs. computing p(θ|X_v)
- from a completely Bayesian perspective, no real difference
- described in more detail in [Kevin Murphy, 10.4]
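
The two computations above can be sketched on a toy model. Everything here is an assumption made for illustration: a binary latent x_h with prior weight `pi` (playing the role of θ), a binary visible x_v with known emission probabilities `EMIT`, a uniform prior over `pi`, and a grid approximation of the posterior.

```python
EMIT = (0.2, 0.9)  # assumed known: p(x_v = 1 | x_h = 0), p(x_v = 1 | x_h = 1)

def p_visible_given_hidden(v, h):
    # Bernoulli emission probability p(x_v = v | x_h = h).
    return EMIT[h] if v == 1 else 1.0 - EMIT[h]

def infer_hidden(v, pi):
    # "Inference": p(x_h = 1 | x_v = v, theta) by Bayes' rule, theta = pi fixed.
    joint1 = pi * p_visible_given_hidden(v, 1)
    joint0 = (1.0 - pi) * p_visible_given_hidden(v, 0)
    return joint1 / (joint1 + joint0)

def learn_theta(data, grid):
    # "Learning": p(theta | X_v) on a grid, uniform prior, with the latent
    # variable summed out of each observation's likelihood.
    weights = []
    for pi in grid:
        lik = 1.0
        for v in data:
            lik *= (pi * p_visible_given_hidden(v, 1)
                    + (1.0 - pi) * p_visible_given_hidden(v, 0))
        weights.append(lik)
    total = sum(weights)
    return [w / total for w in weights]

grid = [i / 100 for i in range(101)]
data = [1, 1, 1, 0, 1, 1]  # made-up observations of x_v

post_hidden = infer_hidden(1, 0.5)     # posterior over the latent variable
post_theta = learn_theta(data, grid)   # posterior over the parameter
map_pi = grid[max(range(len(grid)), key=lambda i: post_theta[i])]
```

Both functions compute a posterior over an unobserved quantity with the same Bayes-rule machinery, which is one way to read the remark that a fully Bayesian view sees no real difference.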
q-n-a  qra  jargon  machine-learning  stats  acm  bayesian  graphical-models  latent-variables  confusion  comparison  nibble 
january 2017 by nhaliday

