Title
Strategy-Driven Limit Theorems Associated with Bandit Problems
Authors
Abstract
Motivated by the study of the asymptotic behaviour of bandit problems, we obtain several strategy-driven limit theorems, including a law of large numbers, a large deviation principle, and a central limit theorem. Unlike the classical limit theorems, we develop sampling-strategy-driven limit theorems under strategies that generate the maximal or minimal average reward. The law of large numbers identifies all possible limits achievable under the various strategies. The large deviation principle provides the maximal decay rate of the probabilities of deviations from the limiting domain. To describe the fluctuations around the averages, we obtain strategy-driven central limit theorems under optimal strategies. The limits in these theorems are identified explicitly and depend heavily on the structure of the events or the integrating functions, as well as on the strategies; this reveals the key signature of the learning structure. Our results can be used to estimate the maximal (minimal) rewards and to identify conditions for avoiding Parrondo's paradox in the two-armed bandit problem. They also lay the theoretical foundation for statistical inference in determining the arm that offers the higher mean reward.
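To make the strategy-driven law of large numbers concrete, the following minimal Python sketch simulates a two-armed Bernoulli bandit and compares the limiting average reward under different sampling strategies. It is an illustration only, not the paper's construction: the arm means 0.4 and 0.6, the greedy rule, the horizon, and all function names (pull, run, always, greedy) are assumptions made for this sketch.

```python
import random

def pull(arm):
    """Bernoulli reward; the arm means 0.4 and 0.6 are illustrative assumptions."""
    means = (0.4, 0.6)
    return 1.0 if random.random() < means[arm] else 0.0

def run(strategy, n=200_000):
    """Average reward of `strategy` over n rounds.

    `strategy(counts, sums)` chooses the next arm from the history so far,
    mirroring the idea that the limiting average depends on the strategy.
    """
    counts, sums = [0, 0], [0.0, 0.0]
    total = 0.0
    for _ in range(n):
        arm = strategy(counts, sums)
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total / n

def always(arm):
    """Non-adaptive strategy: play the same arm every round."""
    return lambda counts, sums: arm

def greedy(counts, sums):
    """Play each arm once, then the arm with the higher empirical mean."""
    for a in (0, 1):
        if counts[a] == 0:
            return a
    return max((0, 1), key=lambda a: sums[a] / counts[a])

if __name__ == "__main__":
    random.seed(0)
    print("always arm 0:", run(always(0)))  # ~0.4, the minimal limiting average
    print("always arm 1:", run(always(1)))  # ~0.6, the maximal limiting average
    print("greedy      :", run(greedy))     # near 0.6 on typical runs
```

The two non-adaptive strategies realise the endpoints of the interval of achievable limits, while an adaptive rule such as greedy typically approaches the maximal one; note that pure greedy can lock onto the inferior arm with positive probability, which is one reason the choice of strategy matters in the limit theorems described above.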