Paper Title
Fast Learning for Renewal Optimization in Online Task Scheduling
Paper Authors
Paper Abstract
This paper considers online optimization of a renewal-reward system. A controller performs a sequence of tasks back-to-back. Each task has a random vector of parameters, called the task type vector, that affects the task processing options and also affects the resulting reward and time duration of the task. The probability distribution for the task type vector is unknown and the controller must learn to make efficient decisions so that time average reward converges to optimality. Prior work on such renewal optimization problems leaves open the question of optimal convergence time. This paper develops an algorithm with an optimality gap that decays like $O(1/\sqrt{k})$, where $k$ is the number of tasks processed. The same algorithm is shown to have faster $O(\log(k)/k)$ performance when the system satisfies a strong concavity property. The proposed algorithm uses an auxiliary variable that is updated according to a classic Robbins-Monro iteration. It makes online scheduling decisions at the start of each renewal frame based on this variable and on the observed task type. A matching converse is obtained for the strongly concave case by constructing an example system for which all algorithms have performance at best $\Omega(\log(k)/k)$. A matching $\Omega(1/\sqrt{k})$ converse is also shown for the general case without strong concavity.
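The abstract describes the algorithm only at a high level. The following is a minimal sketch of the general scheme it outlines: an auxiliary variable tracking the optimal time-average reward is updated by a Robbins-Monro iteration, and each frame's decision greedily maximizes reward minus the variable times duration. The task-type model (three random (reward, duration) options per frame), the uniform distributions, and the $1/k$ step size are illustrative assumptions, not the paper's specification.

```python
import random

def run(num_frames=10000, seed=0):
    """Sketch of a Robbins-Monro renewal scheduler (assumed model)."""
    rng = random.Random(seed)
    theta = 0.0                       # auxiliary variable (Robbins-Monro iterate)
    total_reward = total_time = 0.0
    for k in range(1, num_frames + 1):
        # Observe the random task type; here it is modeled (as an
        # assumption) by three candidate (reward, duration) options.
        options = [(rng.uniform(0.0, 2.0), rng.uniform(0.5, 1.5))
                   for _ in range(3)]
        # Per-frame decision: maximize reward - theta * duration,
        # which targets the ratio objective E[reward] / E[duration].
        reward, duration = max(options, key=lambda o: o[0] - theta * o[1])
        # Robbins-Monro update with vanishing step size ~ 1/k; theta
        # converges to the value where E[reward - theta * duration] = 0.
        theta += (reward - theta * duration) / k
        total_reward += reward
        total_time += duration
    return total_reward / total_time, theta

if __name__ == "__main__":
    avg, theta = run()
    print(f"empirical time-average reward: {avg:.4f}, theta: {theta:.4f}")
```

In this sketch the empirical time-average reward and the iterate `theta` approach the same limit, mirroring how the paper's auxiliary variable drives the time-average reward toward optimality; the convergence-rate guarantees ($O(1/\sqrt{k})$, or $O(\log(k)/k)$ under strong concavity) are properties of the paper's analysis, not of this toy model.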