Paper Title

Minimax Strikes Back

Authors

Quentin Cohen-Solal, Tristan Cazenave

Abstract

Deep reinforcement learning reaches a superhuman level of play in many complete-information games. The state-of-the-art algorithm for learning with zero knowledge is AlphaZero. We take another approach, Athénan, which uses a different, minimax-based search algorithm called Descent, together with different learning targets, and which does not use a policy. We show that for multiple games it is much more efficient than Polygames, a reimplementation of AlphaZero. It is even competitive with Polygames when Polygames uses 100 times more GPUs (at least for some games). One of the keys to this superior performance is that the cost of generating state data for training is approximately 296 times lower with Athénan. With the same reasonable resources, Athénan without the reinforcement heuristic is at least 7 times faster than Polygames, and well over 30 times faster with the reinforcement heuristic.
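
For intuition, below is a minimal, runnable sketch of the general family the abstract refers to: a minimax (negamax) search that uses a learned value function at the horizon and ranks moves purely by search value, with no policy network. It is only a generic illustration, not the paper's Descent algorithm (whose selection and backup rules differ), and the toy Nim game and `toy_value` heuristic are illustrative assumptions, not from the paper.

```python
# A minimal, runnable negamax (minimax) sketch with a learned value
# function at the search horizon, shown on the toy game of Nim.
# NOTE: generic illustration only; NOT the paper's Descent algorithm.

from typing import Callable, List, Optional

State = int  # stones remaining; the player to move takes 1-3 stones

def legal_moves(s: State) -> List[int]:
    return [m for m in (1, 2, 3) if m <= s]

def negamax(s: State, depth: int, value_net: Callable[[State], float]) -> float:
    """Value of state `s` from the perspective of the player to move."""
    if s == 0:
        return -1.0  # the previous player took the last stone and won
    if depth == 0:
        return value_net(s)  # learned evaluation at the search horizon
    return max(-negamax(s - m, depth - 1, value_net) for m in legal_moves(s))

def best_move(s: State, depth: int,
              value_net: Callable[[State], float]) -> Optional[int]:
    """Rank moves purely by search value: no policy network, consistent
    with the abstract's claim that Athénan does not use a policy."""
    moves = legal_moves(s)
    if not moves:
        return None
    return max(moves, key=lambda m: -negamax(s - m, depth - 1, value_net))

def toy_value(s: State) -> float:
    # Stand-in for a trained network: in this Nim variant, positions
    # that are multiples of 4 are lost for the player to move.
    return -1.0 if s % 4 == 0 else 1.0

print(best_move(10, 4, toy_value))  # -> 2 (leaves the opponent on 8)
```

In this style of search, the value network replaces a hand-crafted leaf evaluation, and move choice falls out of the backed-up minimax values rather than a separately trained policy head, which is one way to read the abstract's contrast with AlphaZero-style methods.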
