Paper Title

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

Authors

Geng, Tong, Wu, Chunshu, Zhang, Yongan, Tan, Cheng, Xie, Chenhao, You, Haoran, Herbordt, Martin C., Lin, Yingyan, Li, Ang

Abstract

Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is as critical but even more challenging. The hurdles arise from the poor data locality and redundant computation due to the large size, high sparsity, and irregular non-zero distribution of real-world graphs. In this paper, we propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation. The mechanism is a new online graph restructuring algorithm we refer to as islandization. The proposed algorithm finds clusters of nodes with strong internal but weak external connections. The islandization process yields two major benefits. First, by processing islands rather than individual nodes, there is better on-chip data reuse and fewer off-chip memory accesses. Second, there is less redundant computation, as aggregation for common/shared neighbors in an island can be reused. The parallel search, identification, and leverage of graph islands are all handled purely in hardware at runtime, working in an incremental pipeline. This is done without any preprocessing of the graph data or adjustment of the GCN model structure. Experimental results show that I-GCN can significantly reduce off-chip accesses and prune 38% of aggregation operations, leading to average performance speedups of 5549x, 403x, and 5.7x over CPUs, GPUs, and prior-art GCN accelerators, respectively.
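The second benefit named in the abstract, reusing the aggregation of neighbors shared within an island, can be illustrated with a small software sketch. This is not the I-GCN hardware algorithm, which the abstract does not specify. It is a toy model under stated assumptions: features are scalars rather than vectors, the island is given rather than discovered at runtime, and the names `aggregate_naive`, `aggregate_with_reuse`, `neighbors`, `features`, and `island` are hypothetical.

```python
def aggregate_naive(neighbors, features):
    """Sum neighbor features per node. Returns (sums, number of additions)."""
    sums, adds = {}, 0
    for node, nbrs in neighbors.items():
        total = 0.0
        for n in nbrs:
            total += features[n]
            adds += 1
        sums[node] = total
    return sums, adds


def aggregate_with_reuse(neighbors, features, island):
    """Sum the neighbors common to every island node once, then reuse it.

    Assumption: `island` is a precomputed list of node ids; how islands
    are found is outside the scope of this sketch.
    """
    shared = set.intersection(*(set(neighbors[v]) for v in island))
    adds = 0
    shared_sum = 0.0
    for n in sorted(shared):
        shared_sum += features[n]
        adds += 1

    sums = {}
    for node, nbrs in neighbors.items():
        if node in island:
            total = shared_sum  # reuse the cached partial sum
            adds += 1           # one addition to fold it into this node
            rest = [n for n in nbrs if n not in shared]
        else:
            total = 0.0
            rest = nbrs
        for n in rest:
            total += features[n]
            adds += 1
        sums[node] = total
    return sums, adds


# Toy graph: nodes 0, 1, 2 form an island and share neighbors 3, 4, 5.
neighbors = {0: [3, 4, 5, 6], 1: [3, 4, 5, 7], 2: [3, 4, 5, 8]}
features = {3: 1.0, 4: 2.0, 5: 3.0, 6: 10.0, 7: 20.0, 8: 30.0}

naive_sums, naive_adds = aggregate_naive(neighbors, features)
reuse_sums, reuse_adds = aggregate_with_reuse(neighbors, features, [0, 1, 2])
print(naive_sums == reuse_sums, naive_adds, reuse_adds)
```

In this toy graph both functions return the same sums, while the reuse variant performs fewer additions because the three shared neighbors are summed once instead of three times. The saving grows with the number of shared neighbors and the island size, which is the intuition behind the pruned aggregation operations the abstract reports.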
