Paper Title

A Generalist Neural Algorithmic Learner

Authors

Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, Andreea Deac, Beatrice Bevilacqua, Yaroslav Ganin, Charles Blundell, Petar Veličković

Abstract

The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.
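The core idea of the abstract, one shared graph-network processor serving many algorithmic tasks through task-specific encoders, can be illustrated with a minimal sketch. This is not the paper's implementation (which builds on the CLRS benchmark codebase); the layer sizes, the single-linear-layer "MLPs", and the encoder inputs here are illustrative assumptions. It shows one message-passing step with max aggregation, with the same processor weights applied to latents produced by two different task encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(d_in, d_out):
    # Random weights for a single linear layer (placeholder for learned MLPs).
    return rng.normal(scale=0.1, size=(d_in, d_out))

class SharedProcessor:
    """One message-passing step shared across all tasks (max aggregation)."""
    def __init__(self, hidden=16):
        self.W_msg = mlp_params(2 * hidden, hidden)   # message function
        self.W_upd = mlp_params(2 * hidden, hidden)   # update function

    def step(self, h, adj):
        n = h.shape[0]
        # Pairwise messages: concatenate receiver and sender node states.
        pairs = np.concatenate(
            [np.repeat(h[:, None, :], n, axis=1),     # receiver i
             np.repeat(h[None, :, :], n, axis=0)],    # sender j
            axis=-1)
        msgs = np.maximum(pairs @ self.W_msg, 0.0)    # ReLU messages
        msgs = np.where(adj[..., None] > 0, msgs, -np.inf)
        agg = msgs.max(axis=1)                        # max over incoming edges
        agg = np.where(np.isfinite(agg), agg, 0.0)    # isolated nodes get zeros
        return np.maximum(np.concatenate([h, agg], axis=-1) @ self.W_upd, 0.0)

# Task-specific encoders map raw inputs into the shared latent space
# (feature choices here are hypothetical, for illustration only).
enc_sorting = mlp_params(1, 16)   # sorting: scalar key per node
enc_paths = mlp_params(2, 16)     # path-finding: two features per node

proc = SharedProcessor(hidden=16)
adj = np.ones((4, 4))             # fully-connected graph of 4 nodes

h_sort = np.array([[0.3], [0.1], [0.9], [0.5]]) @ enc_sorting
h_path = rng.normal(size=(4, 2)) @ enc_paths

# The same processor weights serve both tasks.
out_sort = proc.step(h_sort, adj)
out_path = proc.step(h_path, adj)
print(out_sort.shape, out_path.shape)  # (4, 16) (4, 16)
```

In multi-task training, gradients from every task would flow into the one set of processor weights, which is what lets the generalist "incorporate" the specialists' knowledge.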
