Paper Title

Reinforcement Learning Approach for Mapping Applications to Dataflow-Based Coarse-Grained Reconfigurable Array

Authors

Chang, Andre Xian Ming, Khopkar, Parth, Romanous, Bashar, Chaurasia, Abhishek, Estep, Patrick, Windh, Skyler, Vanesko, Doug, Mohideen, Sheik Dawood Beer, Culurciello, Eugenio

Abstract

The Streaming Engine (SE) is a Coarse-Grained Reconfigurable Array that provides programming flexibility and high performance with energy efficiency. An application to be executed on the SE is represented as a combination of Synchronous Data Flow (SDF) graphs, where every instruction is represented as a node. Each node must be mapped to the right slot and array in the SE to ensure correct execution of the program. This creates an optimization problem over a vast and sparse search space, in which finding a mapping manually is impractical because it requires expertise and knowledge of the SE micro-architecture. In this work we propose a Reinforcement Learning framework with a Global Graph Attention (GGA) module and output masking of invalid placements to find and optimize instruction schedules. We use Proximal Policy Optimization to train a model that places operations onto the SE tiles based on a reward function that models the SE device and its constraints. The GGA module consists of a graph neural network and an attention module: the graph neural network creates embeddings of the SDFs, and the attention block models sequential operation placement. We show how certain workloads are mapped to the SE and which factors affect mapping quality. We find that adding the GGA module yields, on average, 10% better instruction schedules in terms of total clock cycles taken, and that masking improves the reward obtained by 20%.
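The "output masking of invalid placements" mentioned in the abstract is a standard trick in RL for constrained action spaces: before sampling an action, the policy's logits for infeasible choices are forced to negative infinity so those actions receive zero probability. The sketch below illustrates the idea in plain Python; the function and variable names are hypothetical and this is not the authors' implementation.

```python
import math

def masked_softmax(logits, valid_mask):
    """Softmax over action logits where invalid actions are treated as
    having a logit of -inf, i.e. they receive probability exactly 0."""
    exps = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: 6 candidate tile slots, slots 2 and 5 are invalid placements.
logits = [0.0] * 6
mask = [True, True, False, True, True, False]
probs = masked_softmax(logits, mask)
# Invalid slots get probability 0; the 4 valid slots share the mass equally.
```

In a PPO setup such as the one described, this masking is typically applied to the policy head's output at every placement step, so the agent never wastes samples on placements that violate the SE's constraints.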
