bandits   259


[1904.10040] A Survey on Practical Applications of Multi-Armed and Contextual Bandits
"In recent years, multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize state-of-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this exciting and fast-growing field."
surveys  bandits 
april 2019 by arsyed
[1505.00369] Batched bandit problems
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
bandits  batched  batch-mode 
april 2019 by arsyed
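The batching constraint is easiest to picture with the simplest two-batch policy, explore-then-commit: spend one batch pulling every arm uniformly, then commit to the empirical winner for the rest of the horizon. A minimal Python sketch of that idea (my own illustration, not the paper's near-minimax-optimal policy; function and parameter names are assumptions):

```python
import random

def explore_then_commit(arm_means, horizon, explore_per_arm, rng):
    """Two-batch bandit policy: batch 1 pulls each arm explore_per_arm
    times, batch 2 commits to the empirically best arm."""
    k = len(arm_means)
    pull = lambda i: 1.0 if rng.random() < arm_means[i] else 0.0

    # Batch 1: uniform exploration.
    totals = [0.0] * k
    for i in range(k):
        for _ in range(explore_per_arm):
            totals[i] += pull(i)
    reward = sum(totals)

    # Batch 2: commit to the arm with the highest empirical mean.
    best = max(range(k), key=lambda i: totals[i])
    for _ in range(horizon - k * explore_per_arm):
        reward += pull(best)
    return best, reward

rng = random.Random(0)
best, total = explore_then_commit([0.3, 0.7], horizon=1000,
                                  explore_per_arm=50, rng=rng)
```

Note this policy switches arms at most k times, which is the "low switching cost" byproduct the abstract mentions; the paper's contribution is showing how few batches suffice to approach minimax regret.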
Structural Causal Bandits: Where to Intervene?
We study the problem of identifying the best action in a sequential decision-making setting when the reward distributions of the arms exhibit a non-trivial dependence structure, which is governed by the underlying causal model of the domain where the agent is deployed. In this setting, playing an arm corresponds to intervening on a set of variables and setting them to specific values. In this paper, we show that whenever the underlying causal model is not taken into account during the decision-making process, the standard strategies of simultaneously intervening on all variables or on all the subsets of the variables may, in general, lead to suboptimal policies, regardless of the number of interventions performed by the agent in the environment. We formally acknowledge this phenomenon and investigate structural properties implied by the underlying causal model, which lead to a complete characterization of the relationships between the arms' distributions. We leverage this characterization to build a new algorithm that takes as input a causal structure and finds a minimal, sound, and complete set of qualified arms that an agent should play to maximize its expected reward. We empirically demonstrate that the new strategy learns an optimal policy and leads to orders of magnitude faster convergence rates when compared with its causal-insensitive counterparts.
bandits  causality  intervention 
november 2018 by arsyed
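The paper's central warning — that intervening without regard to the causal structure can be suboptimal — already shows up in a toy structural causal model with a hidden common cause. The sketch below (my own illustration, not the paper's algorithm; the SCM and names are assumptions) compares an observational "do-nothing" arm against both hard interventions on X:

```python
import random

def pull_arm(intervention, rng):
    """One round of a toy SCM: U ~ Bern(0.5), X := U, reward Y := 1{X == U}.
    intervention is None (just observe) or a forced value for X (do(X=x))."""
    u = rng.random() < 0.5
    x = u if intervention is None else intervention
    return 1.0 if x == u else 0.0

def mean_reward(intervention, rounds, rng):
    return sum(pull_arm(intervention, rng) for _ in range(rounds)) / rounds

rng = random.Random(42)
obs = mean_reward(None, 10_000, rng)   # observational arm: X tracks U, so Y is always 1
do0 = mean_reward(False, 10_000, rng)  # do(X=0): Y = 1 only when U = 0
do1 = mean_reward(True, 10_000, rng)   # do(X=1): Y = 1 only when U = 1
```

Here every intervention on X earns expected reward 1/2, while leaving X alone earns 1 — exactly the phenomenon that makes the paper's characterization of which arms are worth playing non-trivial.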
RT : In case you missed it: this blog post is about and , but also and
science  nonsense  bandits  poetry  from twitter
november 2018 by abolibibelot
Multi-Armed Bandits, Conjugate Models and Bayesian Reinforcement Learning | Eigenfoo
Let’s talk about Bayesianism. It’s developed a reputation (not entirely justified, but not entirely unjustified either) for being too mathematically sophisticated…
rl  bandits 
september 2018 by yizhexu
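The conjugacy the post leans on is the Beta-Bernoulli pair, which makes Thompson sampling a few lines of code: keep a Beta posterior per arm, draw one plausible mean from each posterior, and pull the argmax. A minimal sketch (my own, not the post's code; names are assumptions):

```python
import random

def thompson_sampling(arm_means, horizon, rng):
    """Beta-Bernoulli Thompson sampling: each arm gets a
    Beta(wins + 1, losses + 1) posterior over its success rate."""
    k = len(arm_means)
    wins, losses, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(horizon):
        # Sample one candidate mean per arm from its posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if rng.random() < arm_means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        pulls[arm] += 1
    return pulls

rng = random.Random(0)
pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000, rng=rng)
```

Because the posterior updates are just integer counts, this is the "learning from less feedback" setting at its cheapest: no gradients, no replay buffer, one Beta draw per arm per round.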
