Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
Published in
Adv. Appl. Prob. 27, 1054-1078, 1995