Towards QD-suite: developing a set of benchmarks for Quality-Diversity algorithms

Paper title

Towards QD-suite: developing a set of benchmarks for Quality-Diversity algorithms

Authors

Salehi, Achkan; Doncieux, Stephane

Abstract

While the field of Quality-Diversity (QD) has grown into a distinct branch of stochastic optimization, a few problems, in particular locomotion and navigation tasks, have become de facto standards. Are such benchmarks sufficient? Are they representative of the key challenges faced by QD algorithms? Do they provide the ability to focus on one particular challenge by properly disentangling it from others? Do they have much predictive power in terms of scalability and generalization? Existing benchmarks are not standardized, and there is currently no MNIST equivalent for QD. Inspired by recent works on Reinforcement Learning benchmarks, we argue that the identification of challenges faced by QD methods and the development of targeted, challenging, scalable but affordable benchmarks is an important step. As an initial effort, we identify three problems that are challenging in sparse reward settings, and propose associated benchmarks: (1) Behavior metric bias, which can result from the use of metrics that do not match the structure of the behavior space. (2) Behavioral Plateaus, with varying characteristics, such that escaping them would require adaptive QD algorithms and (3) Evolvability Traps, where small variations in genotype result in large behavioral changes. The environments that we propose satisfy the properties listed above.
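As a rough illustration of challenge (1), behavior metric bias, the sketch below compares two distances on a toy behavior space. Everything in it is an assumption made for illustration: the ring-shaped space, the helper names `ring_point` and `arc_length`, and the chosen angles are hypothetical and are not taken from the proposed benchmarks. Reachable behaviors lie on an almost closed ring, so two behaviors on either side of the gap are close in Euclidean terms but far apart along the ring.

```python
import math

# Hypothetical toy behavior space: behaviors lie on a unit ring with a small gap,
# so the only path between two behaviors runs along the ring itself.

def ring_point(theta, radius=1.0):
    """Map an angle (radians) to a 2D behavior descriptor on the ring."""
    return (radius * math.cos(theta), radius * math.sin(theta))

def arc_length(theta_a, theta_b, radius=1.0):
    """Distance along the ring, i.e. the metric that matches the space's structure."""
    return radius * abs(theta_b - theta_a)

# Two behaviors sitting on either side of the gap.
theta_start, theta_end = 0.05, 2 * math.pi - 0.05
a, b = ring_point(theta_start), ring_point(theta_end)

d_euclid = math.dist(a, b)                   # straight-line distance across the gap
d_arc = arc_length(theta_start, theta_end)   # distance an agent would actually cover

print(f"Euclidean: {d_euclid:.3f}, along the ring: {d_arc:.3f}")
```

A novelty or diversity score based on the Euclidean metric would treat these two behaviors as near-duplicates, even though they sit at opposite ends of the reachable space. This is one simple way a metric can mismatch the structure of the behavior space.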
