GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

Paper title

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

Authors

Cao, Yushi; Li, Zhiming; Yang, Tianpei; Zhang, Hao; Zheng, Yan; Li, Yi; Hao, Jianye; Liu, Yang

Abstract

Despite achieving superior performance in human-level control problems, deep reinforcement learning (DRL), unlike humans, lacks high-order intelligence (e.g., logic deduction and reuse), and thus learns and generalizes less effectively than humans in complex problems. Previous works attempt to directly synthesize a white-box logic program as the DRL policy, manifesting logic-driven behaviors. However, most synthesis methods are built on either imperative or declarative programming, and each has a distinct limitation. The former ignores cause-effect logic during synthesis, resulting in low generalizability across tasks. The latter is strictly proof-based and thus fails to synthesize programs with complex hierarchical logic. In this paper, we combine the two paradigms and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs. GALOIS leverages the program sketch and defines a new sketch-based hybrid program language for guiding the synthesis. Based on that, GALOIS proposes a sketch-based program synthesis method to automatically generate white-box programs with generalizable and interpretable cause-effect logic. Extensive evaluations on various decision-making tasks with complex logic demonstrate the superiority of GALOIS over mainstream baselines in asymptotic performance, generalizability, and knowledge reusability across different environments.
