Discovering Evolution Strategies via Meta-Black-Box Optimization

Paper title

Discovering Evolution Strategies via Meta-Black-Box Optimization

Authors

Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dallibard, Chris Lu, Satinder Singh, Sebastian Flennerhag

Abstract

Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are often heuristic and inflexible, which are exactly the limitations that meta-learning can address. Hence, we propose to discover effective update rules for evolution strategies via meta-learning. Concretely, our approach employs a search strategy parametrized by a self-attention-based architecture, which guarantees that the update rule is invariant to the ordering of the candidate solutions. We show that meta-evolving this system on a small set of representative low-dimensional analytic optimization problems is sufficient to discover new evolution strategies capable of generalizing to unseen optimization problems, population sizes and optimization horizons. Furthermore, the same learned evolution strategy can outperform established neuroevolution baselines on supervised and continuous control tasks. As additional contributions, we ablate the individual neural network components of our method; reverse engineer the learned strategy into an explicit heuristic form, which remains highly competitive; and show that it is possible to self-referentially train an evolution strategy from scratch, with the learned update rule used to drive the outer meta-learning loop.
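
The ordering-invariance property mentioned in the abstract can be illustrated with a small sketch. It is not the authors' architecture. It assumes a single self-attention head over two per-candidate fitness features (z-score and centered rank) whose outputs are turned into recombination weights for the search mean. All names (`fitness_features`, `recombination_weights`, `update_mean`, `Wq`, `Wk`, `Wv`) and the random, untrained parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fitness_features(fitness):
    """Per-candidate features: z-scored fitness and centered rank in [-0.5, 0.5].

    Assumes at least two candidates. The rank counts strictly smaller fitness
    values, so it does not depend on the order in which candidates are listed.
    """
    n = len(fitness)
    mean = sum(fitness) / n
    std = math.sqrt(sum((f - mean) ** 2 for f in fitness) / n) + 1e-8
    feats = []
    for f in fitness:
        rank = sum(1 for g in fitness if g < f)
        feats.append([(f - mean) / std, rank / (n - 1) - 0.5])
    return feats


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def recombination_weights(fitness, Wq, Wk, Wv):
    """One self-attention head over candidates, then a softmax over candidates.

    No positional information is used, so permuting the candidates permutes
    the returned weights in the same way (permutation equivariance).
    """
    feats = fitness_features(fitness)
    Q = [matvec(Wq, f) for f in feats]
    K = [matvec(Wk, f) for f in feats]
    V = [matvec(Wv, f)[0] for f in feats]  # Wv has one row: scalar values
    scale = math.sqrt(len(Q[0]))
    scores = []
    for q in Q:
        attn = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        scores.append(sum(a * v for a, v in zip(attn, V)))
    return softmax(scores)


def update_mean(candidates, weights):
    """New search mean as the weighted sum of candidate solutions."""
    dim = len(candidates[0])
    return [sum(w * x[j] for w, x in zip(weights, candidates)) for j in range(dim)]


# Hypothetical, untrained parameters; in the paper's setting these would be meta-learned.
rng = random.Random(0)
Wq = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(4)]
Wk = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(4)]
Wv = [[rng.gauss(0, 1) for _ in range(2)]]

# Six candidates in three dimensions, scored on the sphere function.
candidates = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(6)]
fitness = [sum(v * v for v in x) for x in candidates]

weights = recombination_weights(fitness, Wq, Wk, Wv)
new_mean = update_mean(candidates, weights)

# Shuffling the population leaves the updated mean unchanged up to rounding.
perm = list(range(len(candidates)))
rng.shuffle(perm)
shuffled_weights = recombination_weights([fitness[i] for i in perm], Wq, Wk, Wv)
shuffled_mean = update_mean([candidates[i] for i in perm], shuffled_weights)
print(max(abs(a - b) for a, b in zip(new_mean, shuffled_mean)))
```

Because the weights are equivariant to the candidate order and the mean update sums over candidates, the update itself is invariant to that order, which is the property the abstract attributes to the self-attention parametrization.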
