Paper title
Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
Paper authors
Paper abstract
When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, missing data also affects how these algorithms are implemented: the simplest approach is to continue sampling according to the original bandit algorithm, ignoring missing outcomes. We investigate how this approach affects the performance of several bandit algorithms through an extensive simulation study, assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward, and consider different probabilities of missingness in both arms. The key finding of our work is that, when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses (which, being perceived as the arm with less observed information, is deemed more appealing by the algorithm than it would otherwise be). In contrast, algorithms that are geared towards exploitation rapidly assign a high value to samples from the arm with a currently high mean, irrespective of the number of observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
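The setting described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' code, and every name in it (`run_trial`, `choose_arm`, the parameter names) is hypothetical. It assumes a two-armed Bernoulli bandit, missingness that depends only on the arm pulled and not on the outcome, UCB1 as a stand-in for an exploration-oriented algorithm, and a purely greedy rule as a stand-in for an exploitation-oriented one. The abstract does not specify the form of mean imputation, so the sketch assumes a missing reward is replaced by the arm's current observed mean (0.5 if nothing has been observed yet).

```python
import math
import random


def choose_arm(policy, n_used, s_used, t, rng):
    """Pick arm 0 or 1 from the data the algorithm currently counts."""
    # Any arm without counted data is tried first (ties broken at random).
    untried = [a for a in (0, 1) if n_used[a] == 0]
    if untried:
        return rng.choice(untried)
    means = [s_used[a] / n_used[a] for a in (0, 1)]
    if policy == "greedy":
        index = means  # pure exploitation: highest current mean wins
    else:
        # UCB1 index: the bonus grows when an arm has few counted observations.
        index = [means[a] + math.sqrt(2 * math.log(t) / n_used[a]) for a in (0, 1)]
    if index[0] == index[1]:
        return rng.choice((0, 1))
    return 0 if index[0] > index[1] else 1


def run_trial(p_success, p_missing, n_patients, policy="ucb", impute=False, seed=0):
    """Simulate one trial; return (pulls per arm, total true successes)."""
    rng = random.Random(seed)
    n_used = [0.0, 0.0]  # observations the algorithm counts per arm
    s_used = [0.0, 0.0]  # reward totals the algorithm counts per arm
    pulls = [0, 0]
    successes = 0
    for t in range(1, n_patients + 1):
        arm = choose_arm(policy, n_used, s_used, t, rng)
        reward = 1 if rng.random() < p_success[arm] else 0
        pulls[arm] += 1
        successes += reward  # true outcome, whether or not it is observed
        missing = rng.random() < p_missing[arm]
        if not missing:
            n_used[arm] += 1
            s_used[arm] += reward
        elif impute:
            # Assumed imputation rule: fill in the arm's current observed mean.
            mean = s_used[arm] / n_used[arm] if n_used[arm] > 0 else 0.5
            n_used[arm] += 1
            s_used[arm] += mean
        # Otherwise the missing outcome is ignored and nothing is updated.
    return pulls, successes


if __name__ == "__main__":
    # Illustrative values only: arm 0 is inferior and loses half its outcomes.
    for policy in ("ucb", "greedy"):
        for impute in (False, True):
            runs = [run_trial((0.3, 0.5), (0.5, 0.0), 100, policy, impute, s)
                    for s in range(200)]
            avg_arm0 = sum(r[0][0] for r in runs) / len(runs)
            print(policy, "impute" if impute else "ignore", round(avg_arm0, 1))
```

Under the "ignore" strategy the UCB1 bonus depends on the number of observed outcomes, so an arm with more missing responses keeps a larger bonus. This is the mechanism the abstract describes for exploration-oriented algorithms. The success and missingness probabilities in the usage section are arbitrary choices for illustration, not values from the paper.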